
<div align="center">
<h1> Fitting Into Any Shape: A Flexible LLM-Based Re-Ranker With Configurable Depth and Width (Matroyshka Re-Ranker) [<a href="https://dl.acm.org/doi/abs/10.1145/3696410.3714620">paper</a>]</h1>
</div>
Different from an embedding model, a reranker takes the query and document together as input and directly outputs a similarity score rather than an embedding. You can get a relevance score by feeding a query and a passage to the reranker, and the score can be mapped to a float value in [0, 1] with the sigmoid function.
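If you need the normalized value, here is a minimal sketch of the sigmoid mapping mentioned above (the `raw_score` value is made up for illustration):

```python
import math

def sigmoid(x: float) -> float:
    # Map an unbounded relevance logit to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

raw_score = 2.3  # hypothetical raw logit returned by the reranker
print(sigmoid(raw_score))  # ~0.909
```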
**Matroyshka Re-Ranker** is designed to support **runtime customization** of the number of model layers and the sequence length at each layer according to the user's configuration, enabling flexible, lightweight deployment.
The training method has the following features:
- cascaded self-distillation
- factorized compensation
## Environment
You can set up the environment as follows:
```bash
conda create -n reranker python=3.10
conda activate reranker
pip install -r requirements.txt
```
## Model List
| Model | Introduction |
| ------------------------------------------------------------ | --------------------------------------------------------- |
| [BAAI/Matroyshka-ReRanker-passage](https://huggingface.co/BAAI/Matroyshka-ReRanker-passage) | The Matroyshka Re-Ranker fine-tuned on MS MARCO passage |
| [BAAI/Matroyshka-ReRanker-document](https://huggingface.co/BAAI/Matroyshka-ReRanker-document) | The Matroyshka Re-Ranker fine-tuned on MS MARCO document |
| [BAAI/Matroyshka-ReRanker-beir](https://huggingface.co/BAAI/Matroyshka-ReRanker-beir) | The Matroyshka Re-Ranker fine-tuned for general retrieval |
### Usage
You can use Matroyshka Re-Ranker with the following code:
```bash
cd ./inference
python
```
And then:
```python
from rank_model import MatroyshkaReranker

compress_ratio = 2  # configure your compression ratio
compress_layers = [8, 16]  # configure the layers to compress
cutoff_layers = [20, 24]  # configure the layers to output scores from

reranker = MatroyshkaReranker(
    model_name_or_path='BAAI/Matroyshka-ReRanker-passage',
    peft_path=[
        './models/Matroyshka-ReRanker-passage/compensate/layer/full'
    ],
    use_fp16=True,
    cache_dir='./model_cache',
    compress_ratio=compress_ratio,
    compress_layers=compress_layers,
    cutoff_layers=cutoff_layers
)

score = reranker.compute_score(['query', 'passage'])
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)
```
## Fine-tune
### Cascaded Self-distillation
For cascaded self-distillation, you can use the following script:
```bash
cd self_distillation
train_data_path="..."
your_huggingface_token="..."
torchrun --nproc_per_node 8 \
run.py \
--output_dir ./result_self_distillation \
--model_name_or_path mistralai/Mistral-7B-v0.1 \
--train_data ${train_data_path} \
--learning_rate 2e-4 \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--dataloader_drop_last True \
--query_max_len 32 \
--passage_max_len 192 \
--train_group_size 16 \
--logging_steps 1 \
--save_steps 100 \
--save_total_limit 50 \
--ddp_find_unused_parameters False \
--gradient_checkpointing \
--deepspeed stage1.json \
--warmup_ratio 0.1 \
--bf16 \
--use_lora True \
--lora_rank 32 \
--lora_alpha 64 \
--loss_type 'only logits' \
--use_flash_attn False \
--target_modules q_proj k_proj v_proj o_proj down_proj up_proj gate_proj linear_head \
--token ${your_huggingface_token} \
--cache_dir ../../model_cache \
--cache_path ../../data_cache \
--padding_side right \
--start_layer 4 \
--layer_sep 1 \
--layer_wise True \
--compress_ratios 1 2 4 8 \
--compress_layers 4 8 12 16 20 24 28 \
--train_method distill_fix_layer_teacher
```
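The `train_data` file above is left for you to provide (`train_data_path="..."`). As an illustration only, assuming the JSONL query/pos/neg layout used by other FlagEmbedding reranker fine-tuning scripts (verify against the data loading code in `run.py` for the authoritative schema), a record could be written like this:

```python
import json

# Hypothetical training record; the exact field names are an assumption
# based on the query/pos/neg convention used elsewhere in FlagEmbedding.
record = {
    "query": "what is panda?",
    "pos": ["The giant panda (Ailuropoda melanoleuca) is a bear species endemic to China."],
    "neg": ["hi", "pandas is a Python library for data analysis."],
}

with open("train_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```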
### Factorized Compensation
For layer compensation, you can use the following script:
```bash
cd finetune/compensation
train_data_path="..."
your_huggingface_token="..."
raw_peft_path="../../self_distillation/result_self_distillation"
torchrun --nproc_per_node 8 \
run.py \
--output_dir ./result_compensation_layer \
--model_name_or_path mistralai/Mistral-7B-v0.1 \
--raw_peft ${raw_peft_path} \
--train_data ${train_data_path} \
--learning_rate 2e-5 \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--dataloader_drop_last True \
--query_max_len 32 \
--passage_max_len 192 \
--train_group_size 16 \
--logging_steps 1 \
--save_steps 500 \
--save_total_limit 50 \
--ddp_find_unused_parameters False \
--gradient_checkpointing \
--deepspeed stage1.json \
--warmup_ratio 0.1 \
--bf16 \
--use_lora True \
--lora_rank 32 \
--lora_alpha 64 \
--loss_type 'only logits' \
--use_flash_attn False \
--target_modules q_proj k_proj v_proj o_proj down_proj up_proj gate_proj linear_head \
--token ${your_huggingface_token} \
--cache_dir ../../model_cache \
--cache_path ../../data_cache \
--padding_side right \
--start_layer 4 \
--layer_sep 1 \
--layer_wise True \
--compress_ratios 1 \
--compress_layers 4 8 12 16 20 24 28 \
--train_method normal \
--finetune_type layer
```
For token compression, you can use the following script:
```bash
cd finetune/compensation
train_data_path="..."
your_huggingface_token="..."
raw_peft_path="../../self_distillation/result_self_distillation"
compress_ratio=2
torchrun --nproc_per_node 8 \
run.py \
--output_dir ./result_compensation_token_compress_ratio_${compress_ratio} \
--model_name_or_path mistralai/Mistral-7B-v0.1 \
--raw_peft ${raw_peft_path} \
--train_data ${train_data_path} \
--learning_rate 2e-5 \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--dataloader_drop_last True \
--query_max_len 32 \
--passage_max_len 192 \
--train_group_size 16 \
--logging_steps 1 \
--save_steps 500 \
--save_total_limit 50 \
--ddp_find_unused_parameters False \
--gradient_checkpointing \
--deepspeed stage1.json \
--warmup_ratio 0.1 \
--bf16 \
--use_lora True \
--lora_rank 32 \
--lora_alpha 64 \
--loss_type 'only logits' \
--use_flash_attn False \
--target_modules q_proj k_proj v_proj o_proj down_proj up_proj gate_proj linear_head \
--token ${your_huggingface_token} \
--cache_dir ../../model_cache \
--cache_path ../../data_cache \
--padding_side right \
--start_layer 4 \
--layer_sep 1 \
--layer_wise True \
--compress_ratios ${compress_ratio} \
--compress_layers 4 8 12 16 20 24 28 \
--train_method normal \
--finetune_type token
```
### Inference
You can use your own fine-tuned Matroyshka Re-Ranker with the following code:
```bash
cd ./inference
python
```
And then:
```python
from rank_model import MatroyshkaReranker

compress_ratio = 2  # configure your compression ratio
compress_layers = [8, 16]  # configure the layers to compress
cutoff_layers = [20, 24]  # configure the layers to output scores from

reranker = MatroyshkaReranker(
    model_name_or_path='mistralai/Mistral-7B-v0.1',
    peft_path=[
        './finetune/self_distillation/result_self_distillation',
        './finetune/compensation/result_compensation_token_compress_ratio_2',
    ],
    use_fp16=True,
    cache_dir='./model_cache',
    compress_ratio=compress_ratio,
    compress_layers=compress_layers,
    cutoff_layers=cutoff_layers
)

score = reranker.compute_score(['query', 'passage'])
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)
```