9.4 KiB
9.4 KiB
(clip_benchmark)=
CLIP Benchmark
本框架支持CLIP Benchmark,其旨在为评测和分析CLIP(Contrastive Language-Image Pretraining)及其变体提供一个统一的框架和基准,目前框架支持43个评测数据集,包括zero-shot retireval任务,评价指标为recall@k;zero-shot classification任务,评价指标为acc@k。
支持的数据集
| 数据集名称 | 任务类型 | 备注 |
|---|---|---|
| muge | zeroshot_retrieval | 中文多模态图文数据集 |
| flickr30k | zeroshot_retrieval | |
| flickr8k | zeroshot_retrieval | |
| mscoco_captions | zeroshot_retrieval | |
| mscoco_captions2017 | zeroshot_retrieval | |
| imagenet1k | zeroshot_classification | |
| imagenetv2 | zeroshot_classification | |
| imagenet_sketch | zeroshot_classification | |
| imagenet-a | zeroshot_classification | |
| imagenet-r | zeroshot_classification | |
| imagenet-o | zeroshot_classification | |
| objectnet | zeroshot_classification | |
| fer2013 | zeroshot_classification | |
| voc2007 | zeroshot_classification | |
| voc2007_multilabel | zeroshot_classification | |
| sun397 | zeroshot_classification | |
| cars | zeroshot_classification | |
| fgvc_aircraft | zeroshot_classification | |
| mnist | zeroshot_classification | |
| stl10 | zeroshot_classification | |
| gtsrb | zeroshot_classification | |
| country211 | zeroshot_classification | |
| renderedsst2 | zeroshot_classification | |
| vtab_caltech101 | zeroshot_classification | |
| vtab_cifar10 | zeroshot_classification | |
| vtab_cifar100 | zeroshot_classification | |
| vtab_clevr_count_all | zeroshot_classification | |
| vtab_clevr_closest_object_distance | zeroshot_classification | |
| vtab_diabetic_retinopathy | zeroshot_classification | |
| vtab_dmlab | zeroshot_classification | |
| vtab_dsprites_label_orientation | zeroshot_classification | |
| vtab_dsprites_label_x_position | zeroshot_classification | |
| vtab_dsprites_label_y_position | zeroshot_classification | |
| vtab_dtd | zeroshot_classification | |
| vtab_eurosat | zeroshot_classification | |
| vtab_kitti_closest_vehicle_distance | zeroshot_classification | |
| vtab_flowers | zeroshot_classification | |
| vtab_pets | zeroshot_classification | |
| vtab_pcam | zeroshot_classification | |
| vtab_resisc45 | zeroshot_classification | |
| vtab_smallnorb_label_azimuth | zeroshot_classification | |
| vtab_smallnorb_label_elevation | zeroshot_classification | |
| vtab_svhn | zeroshot_classification |
环境准备
安装依赖包
pip install evalscope[rag] -U
配置评测参数
task_cfg = {
"work_dir": "outputs",
"eval_backend": "RAGEval",
"eval_config": {
"tool": "clip_benchmark",
"eval": {
"models": [
{
"model_name": "AI-ModelScope/chinese-clip-vit-large-patch14-336px",
}
],
"dataset_name": ["muge", "flickr8k"],
"split": "test",
"batch_size": 128,
"num_workers": 1,
"verbose": True,
"skip_existing": False,
"cache_dir": "cache",
"limit": 1000,
},
},
}
参数说明:
eval_backend:默认值为RAGEval,表示使用 RAGEval 评测后端。eval_config:字典,包含以下字段:tool:评测工具,使用clip_benchmark。eval:字典,包含以下字段:models:模型配置列表,包含以下字段:model_name:str模型名称或路径,例如AI-ModelScope/chinese-clip-vit-large-patch14-336px,支持从modelscope仓库自动下载模型。
dataset_name:List[str]数据集名称列表,例如["muge", "flickr8k", "mnist"],参见任务列表。split:str数据集的划分部分,默认为test。batch_size:int数据加载的批量大小,默认为128。num_workers:int数据加载的工作线程数,默认为1。verbose:bool是否启用详细日志记录,默认为True。skip_existing:bool如果输出已经存在,是否跳过处理,默认为False。cache_dir:str数据集缓存目录,默认为cache。limit:Optional[int]限制处理样本的数量,默认为None,例如1000。
运行评测任务
from evalscope.run import run_task
run_task(task_cfg=task_cfg)
输出评测结果如下:
:caption: outputs/chinese-clip-vit-large-patch14-336px/muge_zeroshot_retrieval.json
{"dataset": "muge", "model": "AI-ModelScope/chinese-clip-vit-large-patch14-336px", "task": "zeroshot_retrieval", "metrics": {"image_retrieval_recall@5": 0.8935546875, "text_retrieval_recall@5": 0.876953125}}
自定义评测数据集
[自定义图文评测数据集](../../../advanced_guides/custom_dataset/clip.md)