(rageval)=
# RAGEval

:::{toctree}
:hidden:

mteb.md
clip_benchmark.md
ragas.md
:::

This project supports independent evaluation and end-to-end evaluation for RAG and multimodal RAG:

- **Independent Evaluation**: Evaluates the retrieval module on its own. Metrics include **Hit Rate, Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain (NDCG), Precision**, etc., which measure how well the system ranks items for a given query or task.
- **End-to-End Evaluation**: Evaluates the final response generated by the RAG model for a given input, i.e., how relevant and well-aligned the generated answer is with the input query. Depending on whether a reference answer is required, these metrics fall into **no-reference** metrics (e.g., **Context Relevance, Faithfulness**) and **reference-based** metrics (e.g., **Accuracy, BLEU, ROUGE**).

```{seealso}
Related research on RAG evaluation can be found [here](../../../blog/RAG/RAG_Evaluation.md).
```

This framework supports the following:

- Independent evaluation of the text retrieval module using [MTEB/CMTEB](mteb.md).
- Independent evaluation of the multimodal image-text retrieval module using [CLIP Benchmark](clip_benchmark.md).
- End-to-end generation evaluation of RAG and multimodal RAG using [RAGAS](ragas.md).

::::{grid} 3

:::{grid-item-card} MTEB/CMTEB
:link: mteb
:link-type: ref

Independent evaluation of the retrieval module, supporting embedding models and reranker models.
:::

:::{grid-item-card} CLIP Benchmark
:link: clip_benchmark
:link-type: ref

Independent evaluation of the multimodal image-text retrieval module, supporting CLIP models.
:::

:::{grid-item-card} RAGAS
:link: ragas
:link-type: ref

End-to-end generation evaluation of RAG and multimodal RAG, with support for automatic generation of evaluation sets.
:::

::::
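
To make the retrieval-side metrics listed above concrete, below is a minimal, self-contained sketch (not part of this framework's API) that computes Hit Rate@k, MRR@k, and binary-relevance NDCG@k for a single query, assuming you have a ranked list of retrieved document IDs and a set of relevant IDs; the document IDs are hypothetical.

```python
import math

def hit_rate(ranked_ids, relevant_ids, k=10):
    """1.0 if any relevant document appears in the top-k results, else 0.0."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in ranked_ids[:k]) else 0.0

def mrr(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant document in the top-k results (0.0 if none)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: hypothetical query whose relevant documents are "d2" and "d7".
ranked = ["d5", "d2", "d9", "d7", "d1"]
relevant = {"d2", "d7"}
print(hit_rate(ranked, relevant, k=5))  # 1.0 (a relevant doc is in the top 5)
print(mrr(ranked, relevant, k=5))       # 0.5 (first relevant doc at rank 2)
print(ndcg(ranked, relevant, k=5))      # relevant docs at ranks 2 and 4 vs. ideal ranks 1 and 2
```

In practice, these per-query scores are averaged over an evaluation set; the MTEB/CMTEB and CLIP Benchmark pages describe how this framework runs such retrieval evaluations end to end.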