This is a generative question answering chatbot.
Its workflow is as follows:
- The user sends a message (ideally a query)
- The relevant pre-indexed documents are fetched from Pinecone
- The transformer model generates a long-form answer based on the question and context.
- The model used for text generation is vblagoje/bart_lfqa from Hugging Face.
- The model used for computing embeddings for indexing is flax-sentence-embeddings/all_datasets_v3_mpnet-base from Hugging Face.
- Pinecone is used for storing and searching through embeddings.
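The generation step above conditions the model on both the question and the retrieved documents. A minimal sketch of how that input might be assembled, assuming the `question: ... context: ...` format with `<P>` document separators described on the vblagoje/bart_lfqa model card (the helper name `build_prompt` is illustrative, not taken from this repo):

```python
def build_prompt(question: str, documents: list[str]) -> str:
    """Join retrieved documents and format the bart_lfqa model input.

    Assumed format from the model card: documents separated by <P>,
    prefixed with "question:" and "context:" markers.
    """
    context = "<P> " + " <P> ".join(documents)
    return f"question: {question} context: {context}"
```

For example, `build_prompt("What is Pinecone?", ["Pinecone is a vector database."])` yields a single string the generator can consume directly.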
Create a virtual environment (highly recommended)
python -m venv venv
Activate the environment
- Linux
source venv/bin/activate
- Windows
.\venv\Scripts\activate
Install the required libraries
pip install -r requirements.txt
- Create an account with Pinecone and get a Pinecone API key
- Assign your Pinecone API key to PINECONE_API_KEY in indexing.py
- Put all your documents in documents.txt. There should be an empty line between each document.
- Then run
python indexing.py
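Since documents.txt separates documents with empty lines, the loading logic can be sketched as below. The splitting helper is pure Python; the embedding and upsert steps are hinted in comments with illustrative names, as the actual indexing.py code is not shown here:

```python
def load_documents(raw: str) -> list[str]:
    """Split the raw contents of documents.txt into documents on blank lines."""
    return [block.strip() for block in raw.split("\n\n") if block.strip()]

# indexing.py is then assumed to do roughly the following (names illustrative):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("flax-sentence-embeddings/all_datasets_v3_mpnet-base")
#   docs = load_documents(open("documents.txt").read())
#   vectors = model.encode(docs)
#   index.upsert(vectors=[(str(i), v.tolist()) for i, v in enumerate(vectors)])
```

Stray extra blank lines are harmless here: empty blocks are filtered out after splitting.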
- Assign your Pinecone API key to PINECONE_API_KEY in bot.py
- Assign your Telegram bot token to TELEGRAM_BOT_TOKEN in bot.py
- Then run
python bot.py
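End to end, the bot's message handler can be sketched as glue code over the workflow described above. Here `embed`, `retrieve`, and `generate` are injected stand-ins for the embedding model, the Pinecone query, and the BART generator; they are assumptions for illustration, not the actual bot.py API:

```python
def answer(question, embed, retrieve, generate, top_k=5):
    """Pipeline assumed in bot.py: embed the query, fetch documents, generate."""
    query_vector = embed(question)              # sentence embedding of the query
    documents = retrieve(query_vector, top_k)   # Pinecone similarity search
    context = "<P> " + " <P> ".join(documents)  # bart_lfqa context format
    prompt = f"question: {question} context: {context}"
    return generate(prompt)                     # long-form generated answer
```

Keeping the three steps as separate callables makes the flow easy to test with stubs before wiring in the real model and index clients.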
Note: The documents in my documents.txt were generated by ChatGPT. Hence, they might not be factual.