## Evaluation for MLVU
We provide detailed evaluation methods for MLVU, covering both multiple-choice tasks and generation tasks.
### Benchmark MLVU on your Model
First, if you want to benchmark MLVU on your own model, you can refer to our template test code as follows:
#### Multiple-Choice testing
```
python multiple_choice_evaluation/choice_bench.py
```
You need to load your model into this template; it runs inference and evaluates the multiple-choice performance online.
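
If it helps to see the overall shape, here is a minimal sketch of such a template loop. It is not the released `choice_bench.py`: the annotation field names (`question`, `candidates`, `answer`, `video`) and the `model.generate(...)` interface are assumptions, so adapt them to the actual script and your model's API.

```
# Minimal sketch of a multiple-choice evaluation loop (not the released template).
# Field names and model.generate(...) are placeholder assumptions.
import json

def evaluate_multiple_choice(model, annotation_file):
    with open(annotation_file, "r") as f:
        samples = json.load(f)

    correct = 0
    for sample in samples:
        # Present the candidate answers as lettered options.
        options = "\n".join(
            f"({chr(ord('A') + i)}) {opt}"
            for i, opt in enumerate(sample["candidates"])
        )
        prompt = f"{sample['question']}\nOnly give the best option.\n{options}"

        # Replace with your model's video question-answering call.
        prediction = model.generate(video_path=sample["video"], prompt=prompt)

        # Score online: count a hit if the predicted option matches the ground truth.
        if prediction.strip().upper().startswith(sample["answer"].strip().upper()):
            correct += 1

    return correct / len(samples)
```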
#### Generation testing
- Step 1: Get the inference results for Sub-Scene Captioning and Video Summary (see the sketch after the command below).
```
python generation_evaluation/open_bench.py
```
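
The part you typically need to change in `open_bench.py` is the model call inside the inference loop. Below is a hedged sketch of such a loop; the annotation fields and the output keys are assumptions, so keep whatever schema the released `open_bench.py` and the Step 2 evaluation scripts actually expect.

```
# Hypothetical inference loop for the generation tasks. Field names and output
# keys are illustrative assumptions only.
import json

def run_generation_inference(model, annotation_file, output_file):
    with open(annotation_file, "r") as f:
        samples = json.load(f)

    results = []
    for sample in samples:
        # Replace with your model's open-ended captioning / summarization call.
        prediction = model.generate(video_path=sample["video"], prompt=sample["question"])
        results.append({
            "video_name": sample["video"],
            "Q": sample["question"],
            "A": sample["answer"],   # reference answer
            "pred": prediction,      # model output
        })

    # e.g. subplot_all.json for Sub-Scene Captioning, summary_all.json for Video Summary
    with open(output_file, "w") as f:
        json.dump(results, f, indent=2)
```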
- Step 2: Run the evaluation for the generation tasks.
For Sub-Scene Captioning, set `--pred_path` to the prediction file produced in Step 1 and `--output_dir` to your output directory, then run:
```
python evaluate_ssc.py --pred_path /your_path/subplot_all.json --output_dir /eval_subplot --output_json /eval_subplot.json
python calculate.py --path /eval_subplot
```
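
As a rough mental model of the second command: assuming `evaluate_ssc.py` writes one scored JSON per sample into `--output_dir`, `calculate.py` reduces those per-sample results to a final average. The sketch below only illustrates that aggregation; the file layout and the `score` field name are assumptions, not the script's documented format.

```
# Illustrative aggregation only; the per-sample file layout and the "score"
# field are assumptions about what evaluate_ssc.py writes into --output_dir.
import glob
import json
import os

def average_scores(result_dir):
    scores = []
    for path in glob.glob(os.path.join(result_dir, "*.json")):
        with open(path, "r") as f:
            result = json.load(f)
        scores.append(float(result["score"]))
    return sum(scores) / len(scores) if scores else 0.0

if __name__ == "__main__":
    print(average_scores("/eval_subplot"))
```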
For Video Summarization, set `--pred_path` to the prediction file produced in Step 1 and `--output_dir` to your output directory, then run:
```
python evaluate_summary.py --pred_path /your_path/summary_all.json --output_dir /eval_summary --output_json /eval_summary.json
```
Then run the following, setting `--path` to the `output_dir` you used in the previous step:
```
python calculate_sum.py --path /eval_summary
```
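
If you prefer a single entry point, the two summarization commands above can also be chained programmatically. The sketch below simply reuses the example paths from this section; substitute your own prediction file and output directory before running.

```
# Convenience wrapper around the two commands above; adjust the paths to your
# own pred_path and output_dir.
import subprocess

subprocess.run([
    "python", "evaluate_summary.py",
    "--pred_path", "/your_path/summary_all.json",
    "--output_dir", "/eval_summary",
    "--output_json", "/eval_summary.json",
], check=True)

subprocess.run([
    "python", "calculate_sum.py",
    "--path", "/eval_summary",
], check=True)
```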
### Benchmark MLVU on existing models
Taking VideoChat2 as an example:
- Step 1: Download the original model and weights from [VideoChat2](https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2).
- Step 2: Put `choice_bench.py` and `open_bench.py` into the same folder as `demo.py`.
- Step 3: Modify the MLVU data path in `choice_bench.py` and `open_bench.py`.
- Step 4: Run the inference and online evaluation for the multiple-choice tasks.
- Step 5: Run the inference and evaluation for the generation tasks (see the adapter sketch below).
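
To connect this to the sketches above, an adapter of roughly the following shape can expose VideoChat2 through a single `generate(video_path, prompt)` call. The VideoChat2-side method names are placeholders; mirror how `demo.py` in that repository actually loads frames and queries the model.

```
# Hypothetical adapter exposing VideoChat2 through the generate() interface
# assumed in the earlier sketches. load_video()/answer() are placeholder names;
# follow the actual calls made in VideoChat2's demo.py.
class VideoChat2Adapter:
    def __init__(self, videochat2_model, num_frames=16):
        self.model = videochat2_model
        self.num_frames = num_frames

    def generate(self, video_path, prompt):
        # 1) Sample frames from the video the same way demo.py does.
        frames = self.model.load_video(video_path, num_frames=self.num_frames)
        # 2) Ask the model the question / instruction and return the decoded text.
        return self.model.answer(frames=frames, question=prompt)
```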