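# Multimodal Large Models
This framework supports two predefined dataset formats for multimodal models: multiple-choice questions (MCQ) and question answering (VQA). The workflow is as follows: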
````{note}
Evaluating a custom dataset requires `VLMEvalKit`, which is installed as an extra dependency:
```shell
pip install evalscope[vlmeval]
```
Reference: [Evaluating with the VLMEvalKit backend](../../user_guides/backend/vlmevalkit_backend.md)
````
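## Multiple-Choice Question Format (MCQ)
### 1. Data Preparation
The evaluation metric is accuracy. Prepare a TSV file in the following format (columns separated by `\t`):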
```text
index	category	answer	question	A	B	C	D	image_path
1	Animals	A	What animal is this?	Dog	Cat	Tiger	Elephant	/root/LMUData/images/custom_mcq/dog.jpg
2	Buildings	D	What building is this?	School	Hospital	Park	Museum	/root/LMUData/images/custom_mcq/AMNH.jpg
3	Cities	B	Which city's skyline is this?	New York	Tokyo	Shanghai	Paris	/root/LMUData/images/custom_mcq/tokyo.jpg
4	Vehicles	C	What is the brand of this car?	BMW	Audi	Tesla	Mercedes	/root/LMUData/images/custom_mcq/tesla.jpg
5	Activities	A	What is the person in the picture doing?	Running	Swimming	Reading	Singing	/root/LMUData/images/custom_mcq/running.jpg
```
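Where:
- `index` is the question number
- `question` is the question text
- `answer` is the letter of the correct option
- `A`, `B`, `C`, `D` are the options; at least two options are required
- `image_path` is the path to the image (an absolute path is recommended); it can be replaced by an `image` field, which must contain the base64-encoded image
- `category` is the category (optional field)

Place the file in `~/LMUData`; the dataset can then be referenced by its file name without the extension. For example, a file named `custom_mcq.tsv` is evaluated as `custom_mcq`.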
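The table above can also be produced from Python. The following is a minimal sketch using only the standard library; the helper name `write_mcq_tsv` and the `MCQ_COLUMNS` constant are hypothetical and not part of evalscope or VLMEvalKit, and the column order is simply copied from the example.

```python
import csv
from pathlib import Path

# Column order copied from the example table; `category` is optional there.
MCQ_COLUMNS = ["index", "category", "answer", "question",
               "A", "B", "C", "D", "image_path"]


def write_mcq_tsv(rows, out_path):
    """Write a list of row dicts to a tab-separated file with a header row.

    Hypothetical helper, not an evalscope API.
    """
    out_path = Path(out_path).expanduser()
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=MCQ_COLUMNS,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return out_path


rows = [
    {"index": 1, "category": "Animals", "answer": "A",
     "question": "What animal is this?",
     "A": "Dog", "B": "Cat", "C": "Tiger", "D": "Elephant",
     "image_path": "/root/LMUData/images/custom_mcq/dog.jpg"},
    {"index": 2, "category": "Buildings", "answer": "D",
     "question": "What building is this?",
     "A": "School", "B": "Hospital", "C": "Park", "D": "Museum",
     "image_path": "/root/LMUData/images/custom_mcq/AMNH.jpg"},
]

# Example call (writes into the directory the text above refers to):
# write_mcq_tsv(rows, "~/LMUData/custom_mcq.tsv")
```

Writing through `csv.DictWriter` keeps multi-word values such as `New York` in a single column, which is easy to get wrong when a TSV is edited by hand.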
### 2. Configuration File
The configuration can be a Python `dict`, or a `yaml` or `json` file. For example, the following `config.yaml`:
```yaml
eval_backend: VLMEvalKit
eval_config:
  model:
    - type: qwen-vl-chat   # name of the deployed model
      name: CustomAPIModel # fixed value
      api_base: http://localhost:8000/v1/chat/completions
      key: EMPTY
      temperature: 0.0
      img_size: -1
  data:
    - custom_mcq # custom dataset name; the file is placed in `~/LMUData`
  mode: all
  limit: 10
  reuse: false
  work_dir: outputs
  nproc: 1
```
```{seealso}
VLMEvalKit [parameter description](../../user_guides/backend/vlmevalkit_backend.md#参数说明)
```
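Since the configuration may also be given as a Python `dict`, the same settings can be sketched as below. The keys and values mirror `config.yaml` one to one; passing the dict as `task_cfg` is shown commented out and is an assumption based on the statement above that a `dict` is accepted.

```python
# Python dict with the same keys and values as the config.yaml example.
task_cfg = {
    "eval_backend": "VLMEvalKit",
    "eval_config": {
        "model": [
            {
                "type": "qwen-vl-chat",    # name of the deployed model
                "name": "CustomAPIModel",  # fixed value
                "api_base": "http://localhost:8000/v1/chat/completions",
                "key": "EMPTY",
                "temperature": 0.0,
                "img_size": -1,
            }
        ],
        "data": ["custom_mcq"],  # dataset file name without the .tsv extension
        "mode": "all",
        "limit": 10,
        "reuse": False,
        "work_dir": "outputs",
        "nproc": 1,
    },
}

# Assumed usage; requires evalscope[vlmeval] and a running model service:
# from evalscope.run import run_task
# run_task(task_cfg=task_cfg)
```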
### 3. Running the Evaluation
Run the following code to start the evaluation:
```python
from evalscope.run import run_task
run_task(task_cfg='config.yaml')
```
The evaluation results are as follows:
```text
---------- ----
split none
Overall 1.0
Activities 1.0
Animals 1.0
Buildings 1.0
Cities 1.0
Vehicles 1.0
---------- ----
```
## Custom Question-Answering Format (VQA)
### 1. Data Preparation
Prepare a TSV file in question-answering format, as follows:
```text
index	answer	question	image_path
1	Dog	What animal is this?	/root/LMUData/images/custom_mcq/dog.jpg
2	Museum	What building is this?	/root/LMUData/images/custom_mcq/AMNH.jpg
3	Tokyo	Which city's skyline is this?	/root/LMUData/images/custom_mcq/tokyo.jpg
4	Tesla	What is the brand of this car?	/root/LMUData/images/custom_mcq/tesla.jpg
5	Running	What is the person in the picture doing?	/root/LMUData/images/custom_mcq/running.jpg
```
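The fields have the same meaning as in the multiple-choice format:
- `index` is the question number
- `question` is the question text
- `answer` is the reference answer
- `image_path` is the path to the image (an absolute path is recommended); it can be replaced by an `image` field, which must contain the base64-encoded image

Place the file in `~/LMUData`; the dataset can then be referenced by its file name without the extension. For example, a file named `custom_vqa.tsv` is evaluated as `custom_vqa`.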
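When the `image` field is used instead of `image_path`, each cell holds the image file encoded as base64. A minimal sketch of that encoding step follows; the helper name `encode_image` is hypothetical, and the assumption that the cell contains the bare base64 string (with no `data:` URI prefix) should be checked against the VLMEvalKit version in use.

```python
import base64
from pathlib import Path


def encode_image(path):
    """Return the file's bytes as a base64-encoded ASCII string.

    Hypothetical helper, not an evalscope or VLMEvalKit API.
    """
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Example: build a row that carries the image inline instead of a path.
# row = {"index": 1, "answer": "Dog", "question": "What animal is this?",
#        "image": encode_image("/root/LMUData/images/custom_mcq/dog.jpg")}
```

Embedding images makes the TSV self-contained but considerably larger, so `image_path` is usually more convenient while the images already live on the evaluation machine.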
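### 2. Custom Evaluation Script
The following example implements a custom evaluation script for the question-answering format. It loads the dataset, queries the model with the default prompt plus a short instruction, and reports exact-match accuracy as the evaluation metric.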
```python
import os
import numpy as np
from vlmeval.dataset.image_base import ImageBaseDataset
from vlmeval.dataset.image_vqa import CustomVQADataset
from vlmeval.smp import load, dump, d2df


class CustomDataset:

    def load_data(self, dataset):
        # Load the custom dataset from ~/LMUData/<dataset>.tsv
        data_path = os.path.join(os.path.expanduser("~/LMUData"), f'{dataset}.tsv')
        return load(data_path)

    def build_prompt(self, line):
        msgs = ImageBaseDataset.build_prompt(self, line)
        # Add prompts or custom instructions here
        msgs[-1]['value'] += '\nAnswer the question in one word or phrase.'
        return msgs

    def evaluate(self, eval_file, **judge_kwargs):
        data = load(eval_file)
        assert 'answer' in data and 'prediction' in data
        # Compare answers and predictions as strings
        data['prediction'] = [str(x) for x in data['prediction']]
        data['answer'] = [str(x) for x in data['answer']]
        print(data)

        # ======== Compute the evaluation metric as needed ========
        # Exact match
        result = np.mean(data['answer'] == data['prediction'])
        ret = {'Overall': result}
        ret = d2df(ret).round(2)
        # Save the result next to the prediction file
        suffix = eval_file.split('.')[-1]
        result_file = eval_file.replace(f'.{suffix}', '_acc.csv')
        dump(ret, result_file)
        return ret
        # ==========================================================


# Keep the following code: it overrides the methods of the default dataset class
CustomVQADataset.load_data = CustomDataset.load_data
CustomVQADataset.build_prompt = CustomDataset.build_prompt
CustomVQADataset.evaluate = CustomDataset.evaluate
```
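Exact string matching counts `dog.` or `dog` as wrong when the reference is `Dog`. As one possible alternative metric, the sketch below compares answers after lowercasing and stripping surrounding whitespace and punctuation. The names `normalize` and `relaxed_accuracy` are hypothetical and this is not a metric defined by VLMEvalKit.

```python
import string


def normalize(text):
    """Lowercase and strip surrounding whitespace and punctuation."""
    return str(text).strip().strip(string.punctuation).strip().lower()


def relaxed_accuracy(answers, predictions):
    """Fraction of answer/prediction pairs that match after normalization."""
    pairs = list(zip(answers, predictions))
    if not pairs:
        return 0.0
    hits = sum(normalize(a) == normalize(p) for a, p in pairs)
    return hits / len(pairs)


# One of the two predictions matches once case and the trailing period are ignored.
print(relaxed_accuracy(["Dog", "Tokyo"], ["dog.", "Paris"]))  # 0.5
```

Inside `evaluate`, the exact-match line could then be swapped for `result = relaxed_accuracy(data['answer'], data['prediction'])`, leaving the rest of the method unchanged.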
### 3. Configuration File
The configuration can be a Python `dict`, or a `yaml` or `json` file. For example, the following `config.yaml`:
```{code-block} yaml
:caption: config.yaml
eval_backend: VLMEvalKit
eval_config:
  model:
    - type: qwen-vl-chat
      name: CustomAPIModel
      api_base: http://localhost:8000/v1/chat/completions
      key: EMPTY
      temperature: 0.0
      img_size: -1
  data:
    - custom_vqa # custom dataset name; the file is placed in `~/LMUData`
  mode: all
  limit: 10
  reuse: false
  work_dir: outputs
  nproc: 1
```
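For the `json` variant mentioned above, the same settings can be written out from Python. This is a sketch under assumptions: the helper `save_config_json` is hypothetical, and it is assumed that a `.json` path can be passed as `task_cfg` in the same way as `config.yaml`.

```python
import json
from pathlib import Path


def save_config_json(task_cfg, path="config.json"):
    """Serialize a task config dict to a JSON file and return its path.

    Hypothetical helper, not an evalscope API.
    """
    path = Path(path)
    path.write_text(json.dumps(task_cfg, indent=2), encoding="utf-8")
    return path


# Same keys and values as the config.yaml example for the VQA dataset.
vqa_cfg = {
    "eval_backend": "VLMEvalKit",
    "eval_config": {
        "model": [{
            "type": "qwen-vl-chat",
            "name": "CustomAPIModel",
            "api_base": "http://localhost:8000/v1/chat/completions",
            "key": "EMPTY",
            "temperature": 0.0,
            "img_size": -1,
        }],
        "data": ["custom_vqa"],
        "mode": "all",
        "limit": 10,
        "reuse": False,
        "work_dir": "outputs",
        "nproc": 1,
    },
}

# Example: save_config_json(vqa_cfg, "config.json")
```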
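### 4. Running the Evaluation
The complete evaluation script is as follows. The import on the first line assumes that the custom evaluation script from step 2 is saved as `custom_dataset.py`: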
```{code-block} python
:emphasize-lines: 1
from custom_dataset import CustomDataset  # import the custom dataset
from evalscope.run import run_task
run_task(task_cfg='config.yaml')
```
The evaluation results are as follows:
```text
{'qwen-vl-chat_custom_vqa_acc': {'Overall': '1.0'}}
```