Dataset
This page points to the training data we used to train our various models.
| Dataset | Introduction |
|---|---|
| MLDR | Document Retrieval Dataset, covering 13 languages |
| bge-m3-data | Fine-tuning data used by bge-m3 |
| public-data | Public data identical to that used by e5-mistral |
| full-data | The full dataset we used for training bge-en-icl |
| bge-multilingual-gemma2-data | The full multilingual dataset we used for training bge-multilingual-gemma2 |
| reranker-data | A mixture of multilingual datasets |