Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Parallelize Exact Search for vector indices in a segment #2326

Open
navneet1v opened this issue Dec 11, 2024 · 5 comments
Open

[FEATURE] Parallelize Exact Search for vector indices in a segment #2326

navneet1v opened this issue Dec 11, 2024 · 5 comments

Comments

@navneet1v
Copy link
Collaborator

navneet1v commented Dec 11, 2024

Description

Currently at a segment level when vector search is happening there are multiple cases when we move from ANN search to exact search. Cases like when filter threshold hit or if there are no graphs present on the segment due to graph creation threshold is not met for a segment. In all these cases we do exact search but while doing exact search we serially iterate over valid docIds, read the vectors and then compute the score.

Possible Optimizations

One possible optimization we can do is to actually parallelize this, which can give a reduce the latency.

Some ideas can be:

  1. We can actually gather all the docIds in an array and then let multiple threads consume those docIds with their VectorValues to read the vector and compute the scores in parallel.
  2. We can also first read the vectors and docsIds in heap and then compute scores in parallel.
  3. We can have a VectorValuesIterator that is thread safe and each thread can read from single iterator till the iterator reaches NO_MORE_DOCS. Thanks to @heemin32 for suggesting this.

I think 1 and 3, should be better over 2 as in that case we don't need to read all vectors in heap and put more stress on heap. Looking for feedbacks here.

But during implementation we can see which one we should choose.

Similar thing can be done for the script score based exact search but that will need some more thinking. Will try to update add that too.

@heemin32
Copy link
Collaborator

  1. We could have custom thread safe DocIdSetsIterator and let multiple thread to share the iterator to process.

@navneet1v
Copy link
Collaborator Author

We could have custom thread safe DocIdSetsIterator and let multiple thread to share the iterator to process.

Yes we can but we need a way to tell threads from which docId to which Id it needs to do the scoring. Hence I was thinking if we just accumulate DocIds in batches and then keep on handing over to threads for scoring while we iterate over the docIds that might be easy to implement. But I like the suggestion.

@heemin32
Copy link
Collaborator

We could have custom thread safe DocIdSetsIterator and let multiple thread to share the iterator to process.

Yes we can but we need a way to tell threads from which docId to which Id it needs to do the scoring. Hence I was thinking if we just accumulate DocIds in batches and then keep on handing over to threads for scoring while we iterate over the docIds that might be easy to implement. But I like the suggestion.

The thread will process whatever available next doc id from the iterator until they meet no_more_doc.

@navneet1v
Copy link
Collaborator Author

navneet1v commented Dec 11, 2024

We could have custom thread safe DocIdSetsIterator and let multiple thread to share the iterator to process.

Yes we can but we need a way to tell threads from which docId to which Id it needs to do the scoring. Hence I was thinking if we just accumulate DocIds in batches and then keep on handing over to threads for scoring while we iterate over the docIds that might be easy to implement. But I like the suggestion.

The thread will process whatever available next doc id from the iterator until they meet no_more_doc.

Ahhh, now i know what you saying. This is pretty interesting. I think we can build such iterator with some locks and synchronized blocks. I think this will be more easy to build and validate. Thanks @heemin32 for the suggestion. May be there can be other ways to optimize the whole code but I like what you are thinking.

@shatejas
Copy link
Collaborator

shatejas commented Jan 15, 2025

@navneet1v @heemin32 One of the ideas I have is to use IndexInput#prefetch. Current exact search uses FlatVectors which are configured to have random access, this increases the cost of disk IO. Before attempting multithreading we should try to use prefetch and benchmark it. It needs changes in Lucene as FloatVectorValues will need a prefetch API. We can reconsider parallelizing after we benchmark prefetch. Prefetch is much more simpler to implement and maintain than multi-threading.

I don't expect parallel searches to be affected heavily if its a controlled prefetch. Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Backlog
Development

No branches or pull requests

3 participants