ai-rag-template


ai-rag-template is a template meant to be a base for the implementation of a RAG (retrieval augmented generation) system.
This repository contains the backend code, which consists of a web server that provides REST APIs to primarily support one type of operation:

  • Chat: Provides a conversation feature, allowing users to ask questions and get responses from the chatbot.

The backend was developed using the LangChain framework, which enables creating sequences of complex interactions using Large Language Models. The web server was implemented using the FastAPI framework.

More information on how the service works can be found in the Overview and Usage page.

Main Features

When running the service, the application exposes a Swagger UI at the /docs endpoint.

Chat Endpoint (/chat/completions)

The /chat/completions endpoint generates responses to user queries based on provided context and chat history. It leverages information from the configured Vector Store to formulate relevant responses, enhancing the conversational experience.

Example:

Request
curl 'http://localhost:3000/chat/completions' \
  -H 'content-type: application/json' \
  --data-raw '{"chat_query":"Design a CRUD schema for an online store selling merchandise items","chat_history":[]}'
Response
{
    "message": "For an online store selling merchandise items, we can design a CRUD schema for a `Product` entity with the following properties: ...",
    "references": [
        {
            "content": "### Create CRUD to Read and Write Table Data  \n...",
            "url": "https://docs.mia-platform.eu/docs/microfrontend-composer/tutorials/basics"
        },
        {
            "content": "### Create CRUD to Read and Write Table Data  \n...",
            "url": "https://docs.mia-platform.eu/docs/microfrontend-composer/tutorials/basics"
        },
        {
            "content": "### Create a CRUD for persistency  \n...",
            "url": "https://docs.mia-platform.eu/docs/console/tutorials/configure-marketplace-components/flow-manager"
        },
        {
            "content": "### Create a CRUD for persistency  \n...",
            "url": "https://docs.mia-platform.eu/docs/console/tutorials/configure-marketplace-components/flow-manager"
        }
    ]
}
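
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the service is reachable at http://localhost:3000 as in the curl example, and the helper names (build_chat_request, parse_chat_response, ask) are hypothetical. Since the response above can list the same URL more than once, the parser keeps each URL only once.

```python
import json
import urllib.request

# Assumption: the service listens on localhost:3000, as in the curl example above.
BASE_URL = "http://localhost:3000"


def build_chat_request(chat_query, chat_history=None, base_url=BASE_URL):
    """Build the POST request for /chat/completions without sending it."""
    payload = {"chat_query": chat_query, "chat_history": chat_history or []}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def parse_chat_response(body):
    """Return the answer and the reference URLs, each URL listed once."""
    parsed = json.loads(body)
    urls = []
    for reference in parsed.get("references", []):
        url = reference.get("url")
        if url and url not in urls:
            urls.append(url)
    return parsed["message"], urls


def ask(chat_query, chat_history=None):
    """Send the question to a running service and return (message, urls)."""
    request = build_chat_request(chat_query, chat_history)
    with urllib.request.urlopen(request) as response:
        return parse_chat_response(response.read().decode("utf-8"))
```

The shape of the chat_history items is not shown in this README, so the sketch passes the list through unchanged.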

Embedding Endpoints

Generate from website (/embeddings/generate)

The /embeddings/generate endpoint is an HTTP POST method that takes as input:

  • url (string, required), a web URL used as a starting point

  • filterPath (string, not required), a more specific web URL than the one specified above

Starting from the given url, the endpoint will:

  • crawl the webpage

  • collect the links on the webpage that belong to the same domain (and, if filterPath is included, that begin with it) and store them in a list

  • scrape the page for text

  • generate the embeddings using the configured embedding model

  • start again from every link still in the list
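
The link-selection rule in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the service's actual crawler: should_follow and crawl are hypothetical names, and get_links stands in for the fetch-and-extract step so the loop can run without network access.

```python
from collections import deque
from urllib.parse import urlparse


def should_follow(link, start_url, filter_path=None):
    """Keep links on the start URL's domain that, if a filter is set, begin with it."""
    if urlparse(link).netloc != urlparse(start_url).netloc:
        return False
    return filter_path is None or link.startswith(filter_path)


def crawl(start_url, get_links, filter_path=None):
    """Visit pages breadth-first; get_links(url) is a hypothetical helper."""
    queue, visited = deque([start_url]), []
    while queue:
        url = queue.popleft()
        if url in visited:
            continue
        # The real service would scrape the text and generate embeddings here.
        visited.append(url)
        for link in get_links(url):
            if link not in visited and should_follow(link, start_url, filter_path):
                queue.append(link)
    return visited
```

In this sketch the starting url is always visited, even when it does not begin with filterPath; whether the service behaves the same way is an assumption.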

NOTE: Only one run of this method can be in progress at a time: a lock prevents multiple requests from starting the process concurrently.

No information is returned when the process ends, whether it completed or stopped because of an error.

Example:

Request
curl 'http://localhost:3000/embeddings/generate' \
  -H 'content-type: application/json' \
  --data-raw '{"url":"https://docs.mia-platform.eu/", "filterPath": "https://docs.mia-platform.eu/docs/runtime_suite_templates" }'
Response in case the runner is idle
200 OK
{
    "statusOk": "true"
}
Response in case the runner is running
409 Conflict
{
    "detail": "A process to generate embeddings is already in progress." 
}
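
The idle/running behavior shown in the two responses can be modeled with a non-blocking lock. This is a sketch only: the EmbeddingsRunner class is hypothetical, and running the job in a background thread is an assumption based on the note that nothing is returned when the process ends.

```python
import threading


class EmbeddingsRunner:
    """Runs at most one generation job at a time, guarded by a non-blocking lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.worker = None  # thread of the most recently started job

    @property
    def status(self):
        return "running" if self._lock.locked() else "idle"

    def try_start(self, job):
        """Return (200, body) if the job started, (409, body) if one is in progress."""
        if not self._lock.acquire(blocking=False):
            return 409, {"detail": "A process to generate embeddings is already in progress."}

        def run():
            try:
                job()
            finally:
                self._lock.release()  # released on success and on error alike

        self.worker = threading.Thread(target=run, daemon=True)
        self.worker.start()
        return 200, {"statusOk": "true"}
```

Acquiring with blocking=False is what turns a concurrent request into an immediate 409 instead of a request that waits for the running job to finish.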

Generate from file (/embeddings/generateFromFile)

The /embeddings/generateFromFile endpoint is an HTTP POST method that takes as input:

  • file (binary, required), a file to be uploaded containing the text that will be transformed into embeddings.

The file must be in one of the following formats:

  • a text file (.txt)
  • a markdown file (.md)
  • a PDF file (.pdf)
  • an archive (formats available: .zip, .tar, .gz) containing files of the same formats as above (folders and other files will be skipped).

For this file, or for each file inside the archive, the text will be retrieved and chunked, and the embeddings generated.
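
The selection rule for uploaded files can be sketched as follows. This is an illustration, not the service's code: list_processable_files is a hypothetical helper, only .zip archives are unpacked for brevity, and matching extensions case-insensitively is an assumption.

```python
import io
import zipfile
from pathlib import PurePosixPath

SUPPORTED_SUFFIXES = {".txt", ".md", ".pdf"}


def is_supported(name):
    """True for text, Markdown, and PDF files (case-insensitive by assumption)."""
    return PurePosixPath(name).suffix.lower() in SUPPORTED_SUFFIXES


def list_processable_files(filename, data):
    """Return the names whose text would be extracted from an uploaded file.

    Only .zip archives are unpacked in this sketch; .tar and .gz are left out.
    """
    if filename.lower().endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return [
                info.filename
                for info in archive.infolist()
                if not info.is_dir() and is_supported(info.filename)
            ]
    return [filename] if is_supported(filename) else []
```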

NOTE: Only one run of this method can be in progress at a time: a lock prevents multiple requests from starting the process concurrently.

No information is returned when the process ends, whether it completed or stopped because of an error.

Example:

Request
curl -X 'POST' \
  'https://rag-app-test.console.gcp.mia-platform.eu/api/embeddings/generateFromFile' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@<<YOUR_ARCHIVE_FILE>>;type=application/zip'
Response in case the runner is idle
200 OK
{
    "statusOk": "true"
}
Response in case the runner is running
409 Conflict
{
    "detail": "A process to generate embeddings is already in progress." 
}

Generation status (/embeddings/status)

This request returns the state of the embeddings generation runner, which is either idle (no process currently running) or running (a process generating embeddings is in progress).

Example:

Request
curl 'http://localhost:3000/embeddings/status' \
  -H 'content-type: application/json'
Response
200 OK
{
    "status": "idle"
}
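
Because the generation endpoints return nothing when a run ends, a client that needs to know when it has finished can poll this endpoint. The sketch below assumes the service runs at http://localhost:3000 and uses the path from the section heading; wait_until_idle and its parameters are hypothetical.

```python
import json
import time
import urllib.request

STATUS_URL = "http://localhost:3000/embeddings/status"  # assumed local address


def fetch_status(url=STATUS_URL):
    """Call the status endpoint and return its "status" field."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))["status"]


def wait_until_idle(get_status=fetch_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the runner reports "idle"; return False if it never does."""
    for _ in range(max_polls):
        if get_status() == "idle":
            return True
        sleep(poll_seconds)
    return False
```

Note that an idle status does not say whether the last run completed or failed, so the service logs remain the place to check for errors.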

Metrics Endpoint (/-/metrics)

The /-/metrics endpoint exposes the service metrics for collection by Prometheus.

High Level Architecture

The following is the high-level architecture of ai-rag-template.

flowchart LR
  fe[Frontend]
  be[Backend]
  vs[(Vector Store)]
  llm[LLM API]
  eg[Embeddings Generator API]

  fe --1. user question +\nchat history--> be
  be --2. user question--> eg
  eg --3. embedding-->be
  be --4. similarity search-->vs
  vs --5. similar docs-->be
  be --6. user question +\nchat history +\nsimilar docs-->llm
  llm --7. bot answer--> be
  be --8. bot answer--> fe
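
The numbered steps of the diagram can be read as a single backend function. The sketch below is a simplified illustration that does not use LangChain: embed, search, and complete are hypothetical callables standing in for the Embeddings Generator API, the Vector Store, and the LLM API.

```python
def answer_question(question, chat_history, embed, search, complete):
    """Follow steps 2-8 of the diagram with injected, hypothetical collaborators.

    embed(text) -> vector, search(vector) -> list of docs, complete(prompt) -> answer.
    """
    vector = embed(question)          # steps 2-3: question -> embedding
    similar_docs = search(vector)     # steps 4-5: similarity search
    prompt = {                        # step 6: question + history + similar docs
        "question": question,
        "chat_history": chat_history,
        "context": similar_docs,
    }
    answer = complete(prompt)         # step 7: bot answer from the LLM
    return {"message": answer, "references": similar_docs}  # step 8
```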

Vector Index

At startup, the application checks whether the collection includes a Vector Search Index. If it does not find one, it creates a new one. If one already exists, it tries to update it when it notices any difference between the existing index and the one derived from the values in the configuration file.

The Vector Search Index will have the following structure:

{
  "fields": [
    {
      "numDimensions": <numDimensions>,
      "path": "<embeddingKey>",
      "similarity": "<relevanceScoreFn>",
      "type": "vector"
    }
  ]
}

The values numDimensions, embeddingKey and relevanceScoreFn come from the configuration file. While embeddingKey and relevanceScoreFn are taken exactly as written in the file, numDimensions depends on the Embedding Model used (supported: text-embedding-3-small and text-embedding-3-large).

NOTE

In the event that an error occurs during the creation or update of the Vector Index, the exception will be logged, but the application will still start. However, the functioning of the service is not guaranteed.
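
The way the index definition follows from the configuration can be sketched as below. The function names are hypothetical and the comparison is a plain equality check, which may differ from what the service does; the dimensions are the default output sizes of the two supported OpenAI models.

```python
# Default output dimensions of the two supported OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def build_index_definition(config):
    """Derive the Vector Search Index definition from the service configuration."""
    vector_store = config["vectorStore"]
    return {
        "fields": [
            {
                "numDimensions": EMBEDDING_DIMENSIONS[config["embeddings"]["name"]],
                "path": vector_store["embeddingKey"],
                "similarity": vector_store["relevanceScoreFn"],
                "type": "vector",
            }
        ]
    }


def plan_index_action(existing_definition, config):
    """Return "create", "update", or "none" by comparing with the existing index."""
    expected = build_index_definition(config)
    if existing_definition is None:
        return "create"
    return "none" if existing_definition == expected else "update"
```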

Configuration

The service requires several configuration parameters for execution. Below is an example of configuration:

{
  "llm": {
    "type": "openai",
    "name": "gpt-3.5-turbo",
    "temperature": 0.7
  },
  "embeddings": {
    "type": "openai",
    "name": "text-embedding-3-small"
  },
  "vectorStore": {
    "dbName": "database-test",
    "collectionName": "assistant-documents",
    "indexName": "vector_index",
    "relevanceScoreFn": "euclidean",
    "embeddingKey": "embedding",
    "textKey": "text",
    "maxDocumentsToRetrieve": 4,
    "minScoreDistance": 0.5
  },
  "chain": {
    "aggregateMaxTokenNumber": 2000,
    "rag": {
      "promptsFilePath": {
        "system": "/path/to/system-prompt.txt",
        "user": "/path/to/user-prompt.txt"
      }
    }
  }
}

Description of configuration parameters:

| Param Name | Description |
| --- | --- |
| LLM Type | Identifier of the provider to use for the LLM. Default: openai. See more in Supported LLM providers. |
| LLM Name | Name of the chat model to use. Must be supported by LangChain. |
| LLM Temperature | Temperature parameter for the LLM, that is, the degree of variability and randomness of the generated response. Default: 0.7 (suggested value). |
| Embeddings Type | Identifier of the provider to use for the Embeddings. Default: openai. See more in Supported Embeddings providers. |
| Embeddings Name | Name of the encoder to use. Must be supported by LangChain. |
| Vector Store DB Name | Name of the MongoDB database to use as a knowledge base. |
| Vector Store Collection Name | Name of the MongoDB collection to use for storing documents and document embeddings. |
| Vector Store Index Name | Name of the vector index to use for retrieving documents related to the user's query. At startup, the application checks whether a vector index with this name exists and whether it needs to be updated or created. |
| Vector Store Relevance Score Function | Name of the similarity function used for extracting similar documents using the created vector index. If the existing vector index uses a different similarity function, the index will be updated to use this one. |
| Vector Store Embeddings Key | Name of the field used to save the semantic encoding of documents. If the existing vector index uses a different key to store the embedding in the collection, the index will be updated to use this key. Please mind that any change to this value might require recreating the embeddings. |
| Vector Store Text Key | Name of the field used to save the raw document (or chunk of document). |
| Vector Store Max. Documents To Retrieve | Maximum number of documents to retrieve from the Vector Store. |
| Vector Store Min. Score Distance | Minimum distance beyond which retrieved documents from the Vector Store are discarded. |
| Chain RAG System Prompts File Path | Path to the file containing system prompts for the RAG model. If omitted, the application will use a standard system prompt. |
| Chain RAG User Prompts File Path | Path to the file containing user prompts for the RAG model. If omitted, the application will use a standard user prompt. |
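
As an illustration of the defaults listed in the table, the sketch below parses a configuration and fills in the documented default values. load_configuration is a hypothetical helper, not the service's loader, and it covers only the defaults stated above.

```python
import json

# Defaults stated in the parameter descriptions above.
DEFAULTS = {
    "llm": {"type": "openai", "temperature": 0.7},
    "embeddings": {"type": "openai"},
}


def load_configuration(text):
    """Parse the configuration JSON and fill in the documented defaults."""
    # json.loads is strict: a trailing comma raises json.JSONDecodeError.
    config = json.loads(text)
    for section, defaults in DEFAULTS.items():
        # Values present in the file win over the defaults.
        config[section] = {**defaults, **config.get(section, {})}
    return config
```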

Supported LLM providers

The property type inside the llm object of the configuration should be one of the supported providers for the LLM. Currently, the supported LLM providers are:

  • OpenAI (openai), in which case the llm configuration could be the following:

    {
      "type": "openai",
      "name": "gpt-3.5-turbo",
      "temperature": 0.7
    }

    with the properties explained above.

  • Azure OpenAI (azure), in which case the llm configuration could be the following:

    {
      "type": "azure",
      "name": "gpt-3.5-turbo",
      "deploymentName": "dep-gpt-3.5-turbo",
      "url": "https://my-company.openai.azure.com/",
      "apiVersion": "my-azure-api-version",
      "temperature": 0.7
    }

    While type is always azure, and name and temperature have already been explained, the other properties are:

    | Name | Description |
    | --- | --- |
    | deploymentName | Name of the deployment to use. |
    | url | URL of the Azure OpenAI service to call. |
    | apiVersion | API version of the Azure OpenAI service. |

Supported Embeddings providers

The property type inside the embeddings object of the configuration should be one of the supported providers for the Embeddings. Currently, the supported Embeddings providers are:

  • OpenAI (openai), in which case the embeddings configuration could be the following:

    {
      "type": "openai",
      "name": "text-embedding-3-small"
    }

    with the properties explained above.

  • Azure OpenAI (azure), in which case the embeddings configuration could be the following:

    {
      "type": "azure",
      "name": "text-embedding-3-small",
      "deploymentName": "dep-text-embedding-3-small",
      "url": "https://my-company.openai.azure.com/",
      "apiVersion": "my-azure-api-version"
    }

    While type is always azure, and name has already been explained, the other properties are:

    | Name | Description |
    | --- | --- |
    | deploymentName | Name of the deployment to use. |
    | url | URL of the Azure OpenAI service to call. |
    | apiVersion | API version of the Azure OpenAI service. |
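
The provider rules for both the llm and embeddings objects can be checked with a small helper. This is a sketch under assumptions: validate_provider is hypothetical, and treating name and the three Azure properties as mandatory is an assumption, since this README does not state which properties are required.

```python
SUPPORTED_PROVIDERS = ("openai", "azure")
# Assumption: these properties are mandatory when type is "azure".
AZURE_ONLY_KEYS = ("deploymentName", "url", "apiVersion")


def validate_provider(section):
    """Return a list of problems found in an "llm" or "embeddings" object."""
    provider = section.get("type", "openai")  # documented default
    if provider not in SUPPORTED_PROVIDERS:
        return [f"unsupported provider: {provider}"]
    # Assumption: "name" is always required.
    problems = [] if "name" in section else ["missing property: name"]
    if provider == "azure":
        problems += [
            f"missing property: {key}" for key in AZURE_ONLY_KEYS if key not in section
        ]
    return problems
```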

Local Development

  • Before getting started, make sure you have the following information:

    • A valid connection string to connect to MongoDB Atlas
    • An OpenAI API Key to generate embeddings and contact the chat model (it's better to use two different keys)
  • Copy the sample environment variables into a file used for development and replace the placeholders with your own values. As an example, you can create a file called local.env from default.env with the following command:

cp default.env local.env
  • Modify the values of the environment variables in the newly created file (for more info, refer to the Overview and Usage page)
  • Create a configuration file at the path defined by the CONFIGURATION_PATH value in the environment variables file. As an example, you can copy the default.configuration.json file into a new file called local.configuration.json with the following command:
cp default.configuration.json local.configuration.json
  • Modify the values of the configuration in the newly created file, according to the definitions included in the Overview and Usage page.

Startup

  • Create a virtual environment to install project dependencies
python3 -m venv .venv
  • Activate the new virtual environment
source .venv/bin/activate
  • Install project dependencies
make install

You can run the web server with this command

# This uses the environment variables defined in `local.env`
make start
# Or you can run:
dotenv -f <<YOUR_ENV_FILE>> run -- python -m src.app

You can review the API using the Swagger UI exposed at http://localhost:3000/docs

Contributing

To contribute to the project, please always create a branch for your updates and submit a Merge Request requesting approval from one of the maintainers of the repository.

Before your commit is pushed, pre-commit operations are automatically executed to run unit tests and lint your code.

Unit tests

Ensure at all times that the unit tests pass. You can verify that via:

make test

Some of our tests include snapshots, which can be updated via

make snapshot

NOTE: you might need to run make test again after updating the snapshots

Please make sure you include new tests or update the existing ones, according to the feature you are working on.

Lint

We use pylint as a linter. Please try to follow the lint rules. You can run:

make lint

to make sure that code and tests follow our lint guidelines.

To fix any issue you can run

make lint-fix

or manually fix your code according to the errors and warnings received.

Add new dependencies

You can add new dependencies, according to your needs, with the following command:

python -m pip install <<module_name>>

However, the package manager pip does not automatically update the list of dependencies included in the requirements.txt file. You have to do it yourself with:

make freeze
# Or:
python -m pip freeze > requirements.txt

Startup with Docker

If you prefer Docker...

  • Build your image
docker build . -t ai-rag-template
  • Run the web server
docker run --env-file ./local.env -p 3000:3000 -d ai-rag-template