Feature Request: Handle Tied Distances in k-Nearest Neighbor Search #143

rw9018 · 2024-11-20T09:37:15Z

In scenarios where multiple neighbors have the same distance to a query point (ties), the current implementation deterministically returns the same k neighbors based on the tree traversal or data ordering. This is problematic for datasets where ties are common, such as those with duplicate points or rounded values.

Problem:
When ties occur and the number of tied neighbors exceeds k, the library always selects the same k neighbors. This can lead to biased results and limits diversity in downstream applications like stochastic modeling or simulations.

For example:
A query point with 1000 equidistant neighbors (distance 0) but k = 10 will always return the same 10 neighbors.
Users cannot randomize neighbor selection among tied points, which reduces flexibility and fairness.

Proposed Solution:
Introduce an option to handle ties during k-NN queries:

Detect tied distances among neighbors.
Randomly select k neighbors from the tied group when ties occur.
Add an optional parameter (e.g., resolve_ties = TRUE) to enable or disable this behavior.

Why This Matters:
Tied distances are common in real-world datasets due to:

Exact matches or duplicate points.
Rounded or discretized data.
Without tie handling, deterministic selection introduces bias and limits the applicability of libnabo for use cases requiring diverse or randomized neighbor selection.

Request:
Would it be possible to add support for resolving ties as an optional feature in libnabo? This would improve the library’s flexibility and usability for datasets where ties frequently occur.

Thank you for your work on this library!

boxanm · 2024-12-09T16:02:56Z

Hello @rw9018 !
This is an interesting request! Would you mind sharing what field you are in and what type of datasets you work with? We mainly employ libnabo to find nearest neighbors for ICP for mobile robot SLAM, and from my experience, it is very unlikely to have tie distances in any real-life data. At the same time, we need the correspondence search to be as fast as possible to support real-time applications, and adding additional complexity could introduce new delays.
We are open to discussions about the proposed feature, but unfortunately, we won't be able to support its development. Nonetheless, if you decide to implement it yourself, I'll happily process any PR!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Request: Handle Tied Distances in k-Nearest Neighbor Search #143

Feature Request: Handle Tied Distances in k-Nearest Neighbor Search #143

rw9018 commented Nov 20, 2024

boxanm commented Dec 9, 2024

Feature Request: Handle Tied Distances in k-Nearest Neighbor Search #143

Feature Request: Handle Tied Distances in k-Nearest Neighbor Search #143

Comments

rw9018 commented Nov 20, 2024

boxanm commented Dec 9, 2024