Faiss indexing file 'pyserini/indexes/dindex-wikipedia-dpr_multi-bf-20200127-f403c3.29eb39fe0b00a03c36c0eeae4c24f775' not found #2

Open
HiddenAlaska opened this issue Jul 27, 2024 · 5 comments

Comments

@HiddenAlaska

Hi, really brilliant work on accelerating the retrieval process in retrieval-augmented language models!

I have been trying to run your open-source code, but have recently run into some problems. It seems that the index files must be prebuilt before running the commands listed in Readme.md. Looking into build_hnsw_index.py, I see that it requires a prebuilt index at 'pyserini/indexes/dindex-wikipedia-dpr_multi-bf-20200127-f403c3.29eb39fe0b00a03c36c0eeae4c24f775'.

However, as a newcomer to this field, I could not find a prebuilt index with that name in https://github.com/castorini/pyserini/blob/master/docs/usage-search.md#learned-dense-retrieval-models. The closest match is faiss.wikipedia-dpr-100w.dpr_multi.20200127.f403c3.tar.gz, but I am not sure it is the right one. Could you describe more specifically which prebuilt index is needed, and the whole workflow for running this project? Thanks a lot!

@HiddenAlaska
Author

HiddenAlaska commented Jul 27, 2024

BTW, the memory footprint seems to be large. What is the minimum memory requirement for running this project?

@JackFram
Owner

Thanks for your interest in our work! Pyserini seems to have updated its indexes since we released our initial code base, so the index names may have changed. I checked their updated index list, and the one you mention, faiss.wikipedia-dpr-100w.dpr_multi.20200127.f403c3.tar.gz, should be the one you are looking for, since its date and timestamp match those of the original version. You can try using this one and replacing the file name and the path in build_hnsw_index.py to build the HNSW index.
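Before pointing build_hnsw_index.py at the downloaded archive, it may help to confirm that the extracted directory looks like a Pyserini Faiss index. The following is a minimal sketch, not part of the project: the helper name `missing_index_files` is hypothetical, and the assumption that the directory contains files named `index` and `docid` should be checked against the installed Pyserini version.

```python
import os

# Assumption: an extracted Pyserini Faiss index directory holds two files,
# "index" (the serialized Faiss index) and "docid" (one document id per line).
EXPECTED_FILES = ("index", "docid")


def missing_index_files(index_dir):
    """Return the expected file names that are absent from index_dir."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(index_dir, name))
    ]
```

An empty list from `missing_index_files(path_to_extracted_archive)` suggests the path is usable; a non-empty list names what is missing, which is easier to act on than a generic "not found" error.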

Also, the memory requirement is indeed fairly large. In our setup, we used a host with ~500GB of memory to store the index files, and the memory requirement for the kNN experiments might be even larger.
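For a rough sense of scale, the raw vector storage of a single flat index can be estimated with simple arithmetic. This is a back-of-the-envelope sketch under stated assumptions, not a measurement of this project: it assumes about 21 million passages (the DPR Wikipedia 100-word split) and 768-dimensional float32 embeddings, and it uses the approximate HNSW link cost of `M * 2 * 4` bytes per vector given in the Faiss documentation. The value of `M` used by the project is not stated in this thread, so it is left as a parameter.

```python
# Assumptions (not taken from this thread): corpus size and embedding width
# of the DPR Wikipedia index.
NUM_PASSAGES = 21_015_324
DIM = 768


def flat_index_bytes(num_vectors, dim, bytes_per_float=4):
    """Bytes needed to store the raw float32 vectors of a flat index."""
    return num_vectors * dim * bytes_per_float


def hnsw_link_bytes(num_vectors, m):
    """Approximate bytes for HNSW graph links (about M * 2 * 4 per vector)."""
    return num_vectors * m * 2 * 4


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30
```

Under these assumptions, `to_gib(flat_index_bytes(NUM_PASSAGES, DIM))` comes to roughly 60 GiB for the vectors alone. The ~500GB figure quoted above covers the authors' full setup, so this estimate is only a lower bound for one index and does not explain the whole footprint.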

@HiddenAlaska
Author

Thanks for your reply. Could you please provide a file such as requirements.txt for configuring the necessary environment? I suspect there is a version conflict in Faiss. Thanks again.

 /ralm/retrievers/dense_retrieval.py", line 95, in retrieve
    hit = self.searcher.doc(dsr.docid)
  File "/root/miniconda3/envs/ralmspec/lib/python3.10/site-packages/pyserini/search/faiss/_searcher.py", line 611, in doc
    return self.ssearcher.doc(docid) if self.ssearcher else None
AttributeError: 'FaissSearcher' object has no attribute 'ssearcher'. Did you mean: 'search'?

@JackFram
Owner

Sorry, since we are not actively maintaining the code base right now, it is hard to provide a compatible requirements.txt. But can you verify that prebuilt_index_name is specified when you build the index? As shown here, the FaissSearcher object should have the ssearcher attribute if prebuilt_index_name has been specified. In our code base, the index_name is specified here. If the current version turns out to have deviated a lot, I can also try later to find the pyserini version closest to our implementation.

@JackFram
Owner

JackFram commented Jul 30, 2024

I checked the pyserini commit from a year ago: the ssearcher attribute was not added recently, and it is initialized from prebuilt_index_name. Also, this error log indicates that the error originates in the pyserini package rather than in the Faiss package itself.
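One way to make the failure easier to diagnose is to guard the document lookup on the caller's side. This is a minimal sketch assuming only what the traceback above shows: that the searcher may or may not carry an `ssearcher` attribute, and that `ssearcher.doc(docid)` returns the stored document. The helper name `lookup_doc` is hypothetical and is not part of pyserini or of this code base.

```python
def lookup_doc(searcher, docid):
    """Return the stored document for docid, or None if it cannot be fetched.

    getattr with a default avoids the AttributeError raised when the searcher
    was constructed without a prebuilt index name and therefore never set
    its ssearcher attribute.
    """
    ssearcher = getattr(searcher, "ssearcher", None)
    if ssearcher is None:
        return None
    return ssearcher.doc(docid)
```

A `None` result here points to a searcher built from a local index directory without a prebuilt index name, which matches the explanation above. It does not fix the underlying setup, since the document text is still unavailable until the searcher is created with the prebuilt index name.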
