From fd859a68276ac0e129e3b3c17afefe3c9f77b2ee Mon Sep 17 00:00:00 2001
From: Jeffrey Ip
Date: Mon, 2 Dec 2024 17:49:03 +0800
Subject: [PATCH] Updated docs

---
 docs/docs/metrics-ragas.mdx | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/docs/docs/metrics-ragas.mdx b/docs/docs/metrics-ragas.mdx
index 93e13637..4b0bc41d 100644
--- a/docs/docs/metrics-ragas.mdx
+++ b/docs/docs/metrics-ragas.mdx
@@ -13,8 +13,18 @@ The RAGAS metric is the average of four distinct metrics:
 
 It provides a score to holistically evaluate of your RAG pipeline's generator and retriever.
 
-:::note
-The `RAGASMetric`, although similar to `deepeval`'s default RAG metrics, is not capable of generating a reason.
+:::warning WHAT'S THE DIFFERENCE?
+The `RAGASMetric` uses the `ragas` library under the hood, and is available in `deepeval` so that users of `deepeval` can access `ragas` from within `deepeval`'s ecosystem as well.
+
+It is implemented in an almost identical way to `deepeval`'s default RAG metrics. However, there are a few differences, including but not limited to:
+
+- `deepeval`'s RAG metrics generate a reason that corresponds to the score equation. Although both `ragas` and `deepeval` have equations attached to their default metrics, `deepeval` incorporates the LLM judge's reasoning along the way.
+- `deepeval`'s RAG metrics are debuggable - meaning you can inspect the LLM judge's judgements along the way to understand why the score is what it is.
+- `deepeval`'s RAG metrics are JSON confineable. You'll often run into `NaN` scores in `ragas` because of invalid JSON outputs - but `deepeval` lets you use literally any custom LLM for evaluation and [JSON confine it in a few lines of code](guides-using-custom-llms).
+- `deepeval`'s RAG metrics integrate **fully** with `deepeval`'s ecosystem. This means you'll get access to metrics caching, native `pytest` integration, first-class error handling, availability on Confident AI, and so much more.
+
+For these reasons, we highly recommend that you use `deepeval`'s RAG metrics instead. They're proven to work just as well, if not better, according to [examples shown in some studies](https://arxiv.org/pdf/2409.06595).
+
 :::
 
 ## Required Arguments
 
@@ -28,6 +38,14 @@ To use the `RagasMetric`, you'll have to provide the following arguments when cr
 
 ## Example
 
+First, install `ragas`:
+
+```console
+pip install ragas
+```
+
+Then, use it within `deepeval`:
+
 ```python
 from deepeval import evaluate
 from deepeval.metrics.ragas import RagasMetric