Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Handle Tied Distances in k-Nearest Neighbor Search #143

Open
rw9018 opened this issue Nov 20, 2024 · 1 comment
Open

Feature Request: Handle Tied Distances in k-Nearest Neighbor Search #143

rw9018 opened this issue Nov 20, 2024 · 1 comment

Comments

@rw9018
Copy link

rw9018 commented Nov 20, 2024

In scenarios where multiple neighbors have the same distance to a query point (ties), the current implementation deterministically returns the same k neighbors based on the tree traversal or data ordering. This is problematic for datasets where ties are common, such as those with duplicate points or rounded values.

Problem:
When ties occur and the number of tied neighbors exceeds k, the library always selects the same k neighbors. This can lead to biased results and limits diversity in downstream applications like stochastic modeling or simulations.

For example:
A query point with 1000 equidistant neighbors (distance 0) but k = 10 will always return the same 10 neighbors.
Users cannot randomize neighbor selection among tied points, which reduces flexibility and fairness.

Proposed Solution:
Introduce an option to handle ties during k-NN queries:

  1. Detect tied distances among neighbors.
  2. Randomly select k neighbors from the tied group when ties occur.
  3. Add an optional parameter (e.g., resolve_ties = TRUE) to enable or disable this behavior.

Why This Matters:
Tied distances are common in real-world datasets due to:

  1. Exact matches or duplicate points.
  2. Rounded or discretized data.
  3. Without tie handling, deterministic selection introduces bias and limits the applicability of libnabo for use cases requiring diverse or randomized neighbor selection.

Request:
Would it be possible to add support for resolving ties as an optional feature in libnabo? This would improve the library’s flexibility and usability for datasets where ties frequently occur.

Thank you for your work on this library!

@boxanm
Copy link
Contributor

boxanm commented Dec 9, 2024

Hello @rw9018 !
This is an interesting request! Would you mind sharing what field you are in and what type of datasets you work with? We mainly employ libnabo to find nearest neighbors for ICP for mobile robot SLAM, and from my experience, it is very unlikely to have tie distances in any real-life data. At the same time, we need the correspondence search to be as fast as possible to support real-time applications, and adding additional complexity could introduce new delays.
We are open to discussions about the proposed feature, but unfortunately, we won't be able to support its development. Nonetheless, if you decide to implement it yourself, I'll happily process any PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants