-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Parallelize Exact Search for vector indices in a segment #2326
Comments
|
Yes we can but we need a way to tell threads from which docId to which Id it needs to do the scoring. Hence I was thinking if we just accumulate DocIds in batches and then keep on handing over to threads for scoring while we iterate over the docIds that might be easy to implement. But I like the suggestion. |
The thread will process whatever available next doc id from the iterator until they meet no_more_doc. |
Ahhh, now i know what you saying. This is pretty interesting. I think we can build such iterator with some locks and synchronized blocks. I think this will be more easy to build and validate. Thanks @heemin32 for the suggestion. May be there can be other ways to optimize the whole code but I like what you are thinking. |
@navneet1v @heemin32 One of the ideas I have is to use IndexInput#prefetch. Current exact search uses FlatVectors which are configured to have random access, this increases the cost of disk IO. Before attempting multithreading we should try to use prefetch and benchmark it. It needs changes in Lucene as FloatVectorValues will need a prefetch API. We can reconsider parallelizing after we benchmark prefetch. Prefetch is much more simpler to implement and maintain than multi-threading. I don't expect parallel searches to be affected heavily if its a controlled prefetch. Thoughts? |
Description
Currently at a segment level when vector search is happening there are multiple cases when we move from ANN search to exact search. Cases like when filter threshold hit or if there are no graphs present on the segment due to graph creation threshold is not met for a segment. In all these cases we do exact search but while doing exact search we serially iterate over valid docIds, read the vectors and then compute the score.
Possible Optimizations
One possible optimization we can do is to actually parallelize this, which can give a reduce the latency.
Some ideas can be:
I think 1 and 3, should be better over 2 as in that case we don't need to read all vectors in heap and put more stress on heap. Looking for feedbacks here.
But during implementation we can see which one we should choose.
Similar thing can be done for the script score based exact search but that will need some more thinking. Will try to update add that too.
The text was updated successfully, but these errors were encountered: