(rageval)=
# RAGEval

:::{toctree}
:hidden:

mteb.md
clip_benchmark.md
ragas.md
:::

This project supports independent evaluation and end-to-end evaluation for RAG and multimodal RAG:

- **Independent Evaluation**: Evaluates the retrieval module on its own. Metrics include **Hit Rate, Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain (NDCG), Precision**, etc., which measure how well the system ranks items for a given query or task.
- **End-to-End Evaluation**: Evaluates the final response generated by the RAG model for a given input, i.e., how relevant and well-aligned the generated answer is with the input query. Depending on whether a reference answer is required, these metrics fall into **no-reference** metrics (e.g., **Context Relevance, Faithfulness**) and **reference-based** metrics (e.g., **Accuracy, BLEU, ROUGE**).

```{seealso}
Related research on RAG evaluation can be found [here](../../../blog/RAG/RAG_Evaluation.md).
```

This framework supports the following:

- Independent evaluation of the text retrieval module using [MTEB/CMTEB](mteb.md).
- Independent evaluation of the multimodal image-text retrieval module using [CLIP Benchmark](clip_benchmark.md).
- End-to-end generation evaluation of RAG and multimodal RAG using [RAGAS](ragas.md).

::::{grid} 3

:::{grid-item-card} MTEB/CMTEB
:link: mteb
:link-type: ref

Independent evaluation of the retrieval module, supporting embedding models and reranker models.
:::

:::{grid-item-card} CLIP Benchmark
:link: clip_benchmark
:link-type: ref

Independent evaluation of the multimodal image-text retrieval module, supporting CLIP models.
:::

:::{grid-item-card} RAGAS
:link: ragas
:link-type: ref

End-to-end generation evaluation of RAG and multimodal RAG, with support for automatic generation of evaluation sets.
:::

::::
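
To make the retrieval-side metrics listed above concrete, below is a minimal, self-contained sketch (not part of this framework's API) that computes Hit Rate@k, MRR@k, and binary-relevance NDCG@k for a single query, assuming you have a ranked list of retrieved document IDs and a set of relevant IDs; the document IDs are hypothetical.

```python
import math

def hit_rate(ranked_ids, relevant_ids, k=10):
    """1.0 if any relevant document appears in the top-k results, else 0.0."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in ranked_ids[:k]) else 0.0

def mrr(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant document in the top-k results (0.0 if none)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: hypothetical query whose relevant documents are "d2" and "d7".
ranked = ["d5", "d2", "d9", "d7", "d1"]
relevant = {"d2", "d7"}
print(hit_rate(ranked, relevant, k=5))  # 1.0 (a relevant doc is in the top 5)
print(mrr(ranked, relevant, k=5))       # 0.5 (first relevant doc at rank 2)
print(ndcg(ranked, relevant, k=5))      # relevant docs at ranks 2 and 4 vs. ideal ranks 1 and 2
```

In practice, these per-query scores are averaged over an evaluation set; the MTEB/CMTEB and CLIP Benchmark pages describe how this framework runs such retrieval evaluations end to end.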