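# Multimodal Large Models

The framework supports two predefined dataset formats for multimodal evaluation: multiple-choice questions (MCQ) and question answering (VQA). The workflow for each is described below.
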
````{note}
Evaluating a custom dataset requires `VLMEvalKit`, which needs additional dependencies:
```shell
pip install evalscope[vlmeval]
```
See: [Using the VLMEvalKit evaluation backend](../../user_guides/backend/vlmevalkit_backend.md)
````

## Multiple-Choice Question Format (MCQ)

### 1. Data Preparation
The evaluation metric is accuracy. Prepare a TSV file in the following format (fields separated by `\t`):
```text
index	category	answer	question	A	B	C	D	image_path
1	Animals	A	What animal is this?	Dog	Cat	Tiger	Elephant	/root/LMUData/images/custom_mcq/dog.jpg
2	Buildings	D	What building is this?	School	Hospital	Park	Museum	/root/LMUData/images/custom_mcq/AMNH.jpg
3	Cities	B	Which city's skyline is this?	New York	Tokyo	Shanghai	Paris	/root/LMUData/images/custom_mcq/tokyo.jpg
4	Vehicles	C	What is the brand of this car?	BMW	Audi	Tesla	Mercedes	/root/LMUData/images/custom_mcq/tesla.jpg
5	Activities	A	What is the person in the picture doing?	Running	Swimming	Reading	Singing	/root/LMUData/images/custom_mcq/running.jpg
```
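Where:
- `index` is the question number
- `question` is the question text
- `answer` is the correct answer, given as an option letter (for example `A`)
- `A`, `B`, `C`, `D` are the options; at least two options are required
- `image_path` is the path to the image (an absolute path is recommended); it can be replaced by an `image` field containing the base64-encoded image
- `category` is the category (optional field)
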
Place the file in the `~/LMUData` directory; it can then be referenced by its file name without the extension. For example, a file named `custom_mcq.tsv` is evaluated as `custom_mcq`.

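Before running an evaluation, it can help to check the TSV file against the field rules listed above. The following is a minimal sketch using only the Python standard library. The function name `validate_mcq_tsv` and the specific checks are illustrative assumptions; they are not part of evalscope or VLMEvalKit.

```python
import csv

# Columns every MCQ file needs, per the field list above.
REQUIRED = {"index", "question", "answer"}
OPTION_COLUMNS = ["A", "B", "C", "D"]


def validate_mcq_tsv(path):
    """Return a list of problems found in an MCQ TSV file (empty if none)."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        columns = set(reader.fieldnames or [])

        missing = REQUIRED - columns
        if missing:
            problems.append(f"missing columns: {sorted(missing)}")
        if not ({"image_path", "image"} & columns):
            problems.append("need an image_path or image column")
        options = [c for c in OPTION_COLUMNS if c in columns]
        if len(options) < 2:
            problems.append("need at least two option columns")
        if problems:
            # Header is unusable, so skip the per-row checks.
            return problems

        for row in reader:
            if row["answer"] not in options:
                problems.append(
                    f"row {row['index']}: answer {row['answer']!r} is not an option column"
                )
    return problems
```

An empty list means the file passed these checks; it does not guarantee that the image paths exist or that the evaluation backend will accept the file.
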
### 2. Configuration File
The configuration can be a Python `dict`, a `yaml` file, or a `json` file. For example, the following `config.yaml`:
```yaml
eval_backend: VLMEvalKit
eval_config:
  model:
    - type: qwen-vl-chat   # name of the deployed model
      name: CustomAPIModel # fixed value
      api_base: http://localhost:8000/v1/chat/completions
      key: EMPTY
      temperature: 0.0
      img_size: -1
  data:
    - custom_mcq # custom dataset name; the file is placed in `~/LMUData`
  mode: all
  limit: 10
  reuse: false
  work_dir: outputs
  nproc: 1
```
```{seealso}
VLMEvalKit [parameter description](../../user_guides/backend/vlmevalkit_backend.md#参数说明)
```
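
Since the configuration may also be given as a Python `dict`, the `config.yaml` example can be written as the equivalent dictionary below. This is a sketch that mirrors the YAML keys one to one; the variable name `task_cfg` is arbitrary, and the dict is assumed to be passed to evalscope in place of the file name `config.yaml`.

```python
# Dict form of the config.yaml example; keys and values are copied from the YAML.
task_cfg = {
    "eval_backend": "VLMEvalKit",
    "eval_config": {
        "model": [
            {
                "type": "qwen-vl-chat",    # name of the deployed model
                "name": "CustomAPIModel",  # fixed value
                "api_base": "http://localhost:8000/v1/chat/completions",
                "key": "EMPTY",
                "temperature": 0.0,
                "img_size": -1,
            }
        ],
        "data": ["custom_mcq"],  # custom dataset name, file placed in ~/LMUData
        "mode": "all",
        "limit": 10,
        "reuse": False,
        "work_dir": "outputs",
        "nproc": 1,
    },
}
```
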
### 3. Running the Evaluation

Run the following code to start the evaluation:
```python
from evalscope.run import run_task

run_task(task_cfg='config.yaml')
```

The evaluation results are as follows:
```text
----------  ----
split       none
Overall     1.0
Activities  1.0
Animals     1.0
Buildings   1.0
Cities      1.0
Vehicles    1.0
----------  ----
```

## Custom Question-Answering Format (VQA)

### 1. Data Preparation

Prepare a TSV file in the question-answering format, as follows:
```text
index	answer	question	image_path
1	Dog	What animal is this?	/root/LMUData/images/custom_mcq/dog.jpg
2	Museum	What building is this?	/root/LMUData/images/custom_mcq/AMNH.jpg
3	Tokyo	Which city's skyline is this?	/root/LMUData/images/custom_mcq/tokyo.jpg
4	Tesla	What is the brand of this car?	/root/LMUData/images/custom_mcq/tesla.jpg
5	Running	What is the person in the picture doing?	/root/LMUData/images/custom_mcq/running.jpg
```
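The file follows the same conventions as the multiple-choice format, without the option and category columns:
- `index` is the question number
- `question` is the question text
- `answer` is the reference answer
- `image_path` is the path to the image (an absolute path is recommended); it can be replaced by an `image` field containing the base64-encoded image
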
Place the file in the `~/LMUData` directory; it can then be referenced by its file name without the extension. For example, a file named `custom_vqa.tsv` is evaluated as `custom_vqa`.

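If the `image` field is used instead of `image_path`, each image has to be stored as a base64 string. A minimal sketch with the Python standard library follows. The helper name `encode_image` is an illustrative assumption, and this document does not state which image formats are expected.

```python
import base64


def encode_image(path):
    """Read an image file and return its bytes as a base64-encoded ASCII string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```

The returned string would be written into the `image` column of the corresponding row.
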
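### 2. Custom Evaluation Script

The following example implements a custom evaluation script for the question-answering format. It loads the dataset from `~/LMUData`, queries the model with the default prompt plus a short answer-format instruction, and computes exact-match accuracy as the evaluation metric.
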
```python
import os
import numpy as np
from vlmeval.dataset.image_base import ImageBaseDataset
from vlmeval.dataset.image_vqa import CustomVQADataset
from vlmeval.smp import load, dump, d2df


class CustomDataset:
    def load_data(self, dataset):
        # Load the custom dataset from ~/LMUData/<dataset>.tsv
        data_path = os.path.join(os.path.expanduser("~/LMUData"), f'{dataset}.tsv')
        return load(data_path)

    def build_prompt(self, line):
        msgs = ImageBaseDataset.build_prompt(self, line)
        # Add extra prompts or custom instructions here
        msgs[-1]['value'] += '\nAnswer the question in one word or phrase.'
        return msgs

    def evaluate(self, eval_file, **judge_kwargs):
        data = load(eval_file)
        assert 'answer' in data and 'prediction' in data
        data['prediction'] = [str(x) for x in data['prediction']]
        data['answer'] = [str(x) for x in data['answer']]

        print(data)

        # ======== Compute evaluation metrics as needed ========
        # Exact match
        result = np.mean(data['answer'] == data['prediction'])
        ret = {'Overall': result}
        ret = d2df(ret).round(2)
        # Save the results next to the prediction file
        suffix = eval_file.split('.')[-1]
        result_file = eval_file.replace(f'.{suffix}', '_acc.csv')
        dump(ret, result_file)
        return ret
        # ======================================================


# Keep the following lines: they override the default dataset class
CustomVQADataset.load_data = CustomDataset.load_data
CustomVQADataset.build_prompt = CustomDataset.build_prompt
CustomVQADataset.evaluate = CustomDataset.evaluate
```

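Exact string matching is strict: a prediction such as `dog.` does not match the answer `Dog`. Where that is too harsh, the metric section of `evaluate` could use a normalized comparison instead. The sketch below assumes one reasonable normalization (lowercasing, collapsing whitespace, stripping surrounding punctuation); `normalize` and `normalized_accuracy` are hypothetical helpers, not part of VLMEvalKit.

```python
import string


def normalize(text):
    """Lowercase, collapse whitespace, and strip surrounding punctuation."""
    collapsed = " ".join(str(text).lower().split())
    return collapsed.strip(string.punctuation + " ")


def normalized_accuracy(answers, predictions):
    """Fraction of predictions equal to their answer after normalization."""
    pairs = list(zip(answers, predictions))
    if not pairs:
        return 0.0
    hits = sum(normalize(a) == normalize(p) for a, p in pairs)
    return hits / len(pairs)
```

Inside `evaluate`, the exact-match line could then become `result = normalized_accuracy(data['answer'], data['prediction'])`. Whether this relaxation is appropriate depends on the dataset, since it also accepts answers that differ only in case.
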
### 3. Configuration File
The configuration can be a Python `dict`, a `yaml` file, or a `json` file. For example, the following `config.yaml`:
```{code-block} yaml
:caption: config.yaml

eval_backend: VLMEvalKit
eval_config:
  model:
    - type: qwen-vl-chat
      name: CustomAPIModel
      api_base: http://localhost:8000/v1/chat/completions
      key: EMPTY
      temperature: 0.0
      img_size: -1
  data:
    - custom_vqa # custom dataset name; the file is placed in `~/LMUData`
  mode: all
  limit: 10
  reuse: false
  work_dir: outputs
  nproc: 1
```

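### 4. Running the Evaluation

The complete evaluation script is shown below. The import on the first line assumes that the custom evaluation script from step 2 is saved as `custom_dataset.py`; importing it applies the overrides to `CustomVQADataset`.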
```{code-block} python
:emphasize-lines: 1

from custom_dataset import CustomDataset  # import the custom dataset
from evalscope.run import run_task

run_task(task_cfg='config.yaml')
```

The evaluation results are as follows:
```text
{'qwen-vl-chat_custom_vqa_acc': {'Overall': '1.0'}}
```