hailin cb54502fae first commit 2025-08-04 11:58:13 +00:00
Dataset

This page points to the training data we used to train various models.

| Dataset | Introduction |
|---|---|
| MLDR | Document retrieval dataset covering 13 languages |
| bge-m3-data | Fine-tuning data used by bge-m3 |
| public-data | Public data identical to that used by e5-mistral |
| full-data | The full dataset we used for training bge-en-icl |
| bge-multilingual-gemma2-data | The full multilingual dataset we used for training bge-multilingual-gemma2 |
| reranker-data | A mixture of multilingual datasets |