(clip_benchmark)=
# CLIP Benchmark
This framework supports the CLIP Benchmark, which provides a unified framework for evaluating and analyzing CLIP (Contrastive Language-Image Pretraining) models and their variants. It currently supports 43 evaluation datasets, covering zero-shot retrieval tasks (evaluated with recall@k) and zero-shot classification tasks (evaluated with acc@k).
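Both task types score a model by ranking similarities between image and text embeddings. For intuition, the sketch below computes image-to-text recall@k on toy data, assuming one matching caption per image; this is an illustration, not the benchmark's internal implementation:

```python
import numpy as np

def recall_at_k(image_emb: np.ndarray, text_emb: np.ndarray, k: int = 5) -> float:
    """Fraction of images whose matching caption ranks in the top-k by cosine similarity."""
    # L2-normalize so the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = image_emb @ text_emb.T              # (num_images, num_texts)
    topk = np.argsort(-sims, axis=1)[:, :k]    # indices of the k most similar captions
    # Ground truth in this toy setup: caption i belongs to image i.
    hits = (topk == np.arange(len(sims))[:, None]).any(axis=1)
    return float(hits.mean())

# Toy data: each "caption" embedding lies near its image embedding.
rng = np.random.default_rng(0)
images = rng.normal(size=(8, 512))
texts = images + 0.1 * rng.normal(size=(8, 512))
print(f"image->text recall@5: {recall_at_k(images, texts):.2f}")
```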
## Supported Datasets
| Dataset Name | Task Type | Notes |
|---|---|---|
| muge | zeroshot_retrieval | Chinese Multimodal Dataset |
| flickr30k | zeroshot_retrieval | |
| flickr8k | zeroshot_retrieval | |
| mscoco_captions | zeroshot_retrieval | |
| mscoco_captions2017 | zeroshot_retrieval | |
| imagenet1k | zeroshot_classification | |
| imagenetv2 | zeroshot_classification | |
| imagenet_sketch | zeroshot_classification | |
| imagenet-a | zeroshot_classification | |
| imagenet-r | zeroshot_classification | |
| imagenet-o | zeroshot_classification | |
| objectnet | zeroshot_classification | |
| fer2013 | zeroshot_classification | |
| voc2007 | zeroshot_classification | |
| voc2007_multilabel | zeroshot_classification | |
| sun397 | zeroshot_classification | |
| cars | zeroshot_classification | |
| fgvc_aircraft | zeroshot_classification | |
| mnist | zeroshot_classification | |
| stl10 | zeroshot_classification | |
| gtsrb | zeroshot_classification | |
| country211 | zeroshot_classification | |
| renderedsst2 | zeroshot_classification | |
| vtab_caltech101 | zeroshot_classification | |
| vtab_cifar10 | zeroshot_classification | |
| vtab_cifar100 | zeroshot_classification | |
| vtab_clevr_count_all | zeroshot_classification | |
| vtab_clevr_closest_object_distance | zeroshot_classification | |
| vtab_diabetic_retinopathy | zeroshot_classification | |
| vtab_dmlab | zeroshot_classification | |
| vtab_dsprites_label_orientation | zeroshot_classification | |
| vtab_dsprites_label_x_position | zeroshot_classification | |
| vtab_dsprites_label_y_position | zeroshot_classification | |
| vtab_dtd | zeroshot_classification | |
| vtab_eurosat | zeroshot_classification | |
| vtab_kitti_closest_vehicle_distance | zeroshot_classification | |
| vtab_flowers | zeroshot_classification | |
| vtab_pets | zeroshot_classification | |
| vtab_pcam | zeroshot_classification | |
| vtab_resisc45 | zeroshot_classification | |
| vtab_smallnorb_label_azimuth | zeroshot_classification | |
| vtab_smallnorb_label_elevation | zeroshot_classification | |
| vtab_svhn | zeroshot_classification | |
## Environment Preparation

Install the required packages:

```shell
pip install evalscope[rag] -U
```
## Configure Evaluation Parameters
```python
task_cfg = {
    "work_dir": "outputs",
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "clip_benchmark",
        "eval": {
            "models": [
                {
                    "model_name": "AI-ModelScope/chinese-clip-vit-large-patch14-336px",
                }
            ],
            "dataset_name": ["muge", "flickr8k"],
            "split": "test",
            "batch_size": 128,
            "num_workers": 1,
            "verbose": True,
            "skip_existing": False,
            "cache_dir": "cache",
            "limit": 1000,
        },
    },
}
```
### Parameter Description
- `eval_backend`: Default value is `RAGEval`, indicating the use of the RAGEval evaluation backend.
- `eval_config`: A dictionary containing the following fields:
  - `tool`: The evaluation tool, using `clip_benchmark`.
  - `eval`: A dictionary containing the following fields:
    - `models`: A list of model configurations, each with the following fields:
      - `model_name`: `str` The model name or path, e.g., `AI-ModelScope/chinese-clip-vit-large-patch14-336px`; supports automatic downloading from the ModelScope repository.
    - `dataset_name`: `List[str]` A list of dataset names, e.g., `["muge", "flickr8k", "mnist"]`; see the Supported Datasets table above.
    - `split`: `str` The dataset split to use; default is `test`.
    - `batch_size`: `int` Batch size for data loading; default is `128`.
    - `num_workers`: `int` Number of workers for data loading; default is `1`.
    - `verbose`: `bool` Whether to enable detailed logging; default is `True`.
    - `skip_existing`: `bool` Whether to skip processing when output already exists; default is `False`.
    - `cache_dir`: `str` The dataset cache directory; default is `cache`.
    - `limit`: `Optional[int]` Limit the number of samples to process; default is `None`; e.g., `1000` evaluates only the first 1000 samples.
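As a further usage sketch, these parameters can be combined to evaluate one model across both task types in a single run. The configuration below reuses the documented fields; the dataset mix, `batch_size`, and `limit` are arbitrary illustrations, not recommended settings:

```python
# Illustrative variant: one model, mixed task types, 500 samples per dataset.
task_cfg = {
    "work_dir": "outputs",
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "clip_benchmark",
        "eval": {
            "models": [
                {"model_name": "AI-ModelScope/chinese-clip-vit-large-patch14-336px"}
            ],
            "dataset_name": ["muge", "mnist", "imagenet1k"],  # retrieval + classification
            "split": "test",
            "batch_size": 64,
            "limit": 500,  # evaluate only the first 500 samples of each dataset
        },
    },
}
```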
## Run Evaluation Task
```python
from evalscope.run import run_task
from evalscope.utils.logger import get_logger

logger = get_logger()

# Run task
run_task(task_cfg=task_cfg)
```
## Output Evaluation Results

```{code-block} json
:caption: outputs/chinese-clip-vit-large-patch14-336px/muge_zeroshot_retrieval.json

{
    "dataset": "muge",
    "model": "AI-ModelScope/chinese-clip-vit-large-patch14-336px",
    "task": "zeroshot_retrieval",
    "metrics": {
        "image_retrieval_recall@5": 0.8935546875,
        "text_retrieval_recall@5": 0.876953125
    }
}
```
## Custom Evaluation Dataset

To evaluate on your own image-text data, see [Custom Image-Text Dataset](../../../advanced_guides/custom_dataset/clip.md).