A demo of Cache-Augmented Generation (CAG) using Mistral-7B.
CAG preloads relevant knowledge into a language model's context, allowing for faster and more efficient question-answering without real-time document retrieval.
Google Colab: https://colab.research.google.com/drive/1-0eKIu6cGAZ47ROKQaF6EU-mHtvJBILV?usp=sharing
The notebook `cag_demo.ipynb` showcases the core steps of CAG:
- Loading the Mistral model and tokenizer
- Reading a local `document.txt` file containing information about you (in this case, me / Ronan Takizawa)
- Preloading that knowledge into the model's context using a `DynamicCache` (see the sketch below)
- Answering user queries by referencing the cached knowledge, without real-time retrieval
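The preloading step is the heart of CAG. Below is a minimal sketch of how it can look with the Hugging Face `transformers` API; the prompt format, file handling, and variable names (`preamble`, `origin_len`) are illustrative choices, not the notebook's exact code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.cache_utils import DynamicCache

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)

# Read the local knowledge file that will be cached.
with open("document.txt", "r", encoding="utf-8") as f:
    knowledge = f.read()

# Wrap the knowledge in an instruction-style preamble (illustrative format);
# questions will be appended after this prefix later.
preamble = f"[INST] Answer questions using only this context:\n{knowledge}\n"
inputs = tokenizer(preamble, return_tensors="pt").to(model.device)

# A single forward pass fills the KV cache with the document's keys and
# values; nothing is generated here, we only keep the populated cache.
cache = DynamicCache()
with torch.no_grad():
    cache = model(**inputs, past_key_values=cache, use_cache=True).past_key_values

# Record how many tokens the knowledge occupies, so the cache can be
# truncated back to this length between questions.
origin_len = cache.get_seq_length()
```

Because the forward pass over the document happens only once, every subsequent question pays only for its own tokens, which is where the speedup over retrieval-at-query-time comes from.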
Requirements:

- Python 3.8+
- PyTorch 1.13+
- Transformers 4.36+ (`DynamicCache` was introduced in v4.36)
- A Hugging Face account with access to the `mistralai/Mistral-7B-Instruct-v0.1` model
Installation:

- Clone this repository:

      git clone https://github.com/yourusername/cache-augmented-generation.git
      cd cache-augmented-generation

- Install the required packages:

      pip install torch transformers

- Create a `document.txt` file in the project directory containing the knowledge you want to preload
Usage:

- Open the `cag_demo.ipynb` notebook in Jupyter, VS Code, or Google Colab.
- Run the cells in order. The notebook will:
  - Load the Mistral model and tokenizer
  - Read `document.txt` and preload its content into a `DynamicCache`
  - Ask two example questions about Ronan Takizawa, answering them using the cached knowledge
- Observe the model's responses, which are generated without real-time document retrieval (a sketch of this query-answering step follows below).
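For the answering step, one way to reuse the cache is to pass the prefilled `past_key_values` to `generate` and trim the cache back after each question. This sketch builds on the variables from the preloading sketch above; `truncate_cache` and `answer` are illustrative helpers, not library functions:

```python
def truncate_cache(kv_cache, length):
    # Illustrative helper: drop cached keys/values beyond the preloaded
    # knowledge so each question starts from a knowledge-only cache.
    for layer in range(len(kv_cache.key_cache)):
        kv_cache.key_cache[layer] = kv_cache.key_cache[layer][:, :, :length, :]
        kv_cache.value_cache[layer] = kv_cache.value_cache[layer][:, :, :length, :]

def answer(question, max_new_tokens=100):
    q_ids = tokenizer(
        f"Question: {question} [/INST]",
        return_tensors="pt", add_special_tokens=False,
    ).input_ids.to(model.device)
    # generate() receives the full sequence, but tokens already covered by
    # the cache are not recomputed: only the question tokens get prefilled.
    full_ids = torch.cat([inputs.input_ids, q_ids], dim=-1)
    out = model.generate(
        full_ids, past_key_values=cache,
        max_new_tokens=max_new_tokens, do_sample=False,
    )
    truncate_cache(cache, origin_len)  # reset the cache for the next question
    return tokenizer.decode(out[0, full_ids.shape[-1]:], skip_special_tokens=True)

print(answer("Who is Ronan Takizawa?"))
```

Trimming the cache between questions keeps answers independent; without it, earlier questions and answers would accumulate in the cache and leak into later ones.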
To use your own knowledge base:
- Replace the content of `document.txt` with your desired information.
- Adjust the example questions in the notebook to match your knowledge domain.
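With the sketches above, that adaptation is only a file swap and a new question (the question below is a hypothetical placeholder):

```python
# After replacing document.txt, re-run the preloading step, then ask
# questions about your own material (placeholder question shown):
print(answer("What does Acme Corp sell?"))
```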