# Multimodal Large Model

This framework supports two predefined dataset formats: multiple-choice questions (MCQ) and question answering (VQA). The usage process is as follows:

````{note}
Custom dataset evaluation requires using `VLMEvalKit`, which requires additional dependencies:
```shell
pip install evalscope[vlmeval]
```
Reference: [Evaluation Backend with VLMEvalKit](../../user_guides/backend/vlmevalkit_backend.md)
````

## Multiple-Choice Question Format (MCQ)

### 1. Data Preparation

The evaluation metric is accuracy. You need to define a tsv file in the following format (using `\t` as the separator):
```text
index	category	answer	question	A	B	C	D	image_path
1	Animals	A	What animal is this?	Dog	Cat	Tiger	Elephant	/root/LMUData/images/custom_mcq/dog.jpg
2	Buildings	D	What building is this?	School	Hospital	Park	Museum	/root/LMUData/images/custom_mcq/AMNH.jpg
3	Cities	B	Which city's skyline is this?	New York	Tokyo	Shanghai	Paris	/root/LMUData/images/custom_mcq/tokyo.jpg
4	Vehicles	C	What is the brand of this car?	BMW	Audi	Tesla	Mercedes	/root/LMUData/images/custom_mcq/tesla.jpg
5	Activities	A	What is the person in the picture doing?	Running	Swimming	Reading	Singing	/root/LMUData/images/custom_mcq/running.jpg
```
Where:
- `index` is the question number
- `question` is the question text
- `A`, `B`, `C`, `D` are the options (at least two options are required)
- `answer` is the correct option
- `image_path` is the image path (absolute paths are recommended); it can also be replaced with an `image` field containing the base64-encoded image
- `category` is the category (optional field)

Place this file in the `~/LMUData` directory, and you can then use the filename for evaluation. For example, if the filename is `custom_mcq.tsv`, you can use `custom_mcq` for evaluation.
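Such a tsv file can also be generated programmatically. Below is a minimal sketch using pandas; the rows and image paths are placeholders, and the commented-out `image` column shows one possible way to embed base64-encoded images instead of paths:

```python
import base64
import os

import pandas as pd

# Placeholder rows; replace the paths with your own image files.
rows = [
    {'index': 1, 'category': 'Animals', 'answer': 'A',
     'question': 'What animal is this?',
     'A': 'Dog', 'B': 'Cat', 'C': 'Tiger', 'D': 'Elephant',
     'image_path': '/root/LMUData/images/custom_mcq/dog.jpg'},
    {'index': 2, 'category': 'Buildings', 'answer': 'D',
     'question': 'What building is this?',
     'A': 'School', 'B': 'Hospital', 'C': 'Park', 'D': 'Museum',
     'image_path': '/root/LMUData/images/custom_mcq/AMNH.jpg'},
]
df = pd.DataFrame(rows)

# Optional: embed the images as base64 in an `image` column instead of
# referencing them through `image_path`:
# df['image'] = [
#     base64.b64encode(open(p, 'rb').read()).decode('utf-8')
#     for p in df.pop('image_path')
# ]

# Write the file with a tab separator, as required by the format above.
df.to_csv(os.path.expanduser('~/LMUData/custom_mcq.tsv'), sep='\t', index=False)
```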
### 2. Configure the Task

The configuration file can be in `python dict`, `yaml`, or `json` format, for example the following `config.yaml` file:
```yaml
eval_backend: VLMEvalKit
eval_config:
  model:
    - type: qwen-vl-chat   # Name of the deployed model
      name: CustomAPIModel # Fixed value
      api_base: http://localhost:8000/v1/chat/completions
      key: EMPTY
      temperature: 0.0
      img_size: -1
  data:
    - custom_mcq # Name of the custom dataset, placed in `~/LMUData`
  mode: all
  limit: 10
  reuse: false
  work_dir: outputs
  nproc: 1
```

```{seealso}
VLMEvalKit [Parameter Description](../../user_guides/backend/vlmevalkit_backend.md#parameter-explanation)
```

### 3. Running the Evaluation

Run the following code to start the evaluation:
```python
from evalscope.run import run_task

run_task(task_cfg='config.yaml')
```

The evaluation results are as follows:
```text
----------  ----
split       none
Overall     1.0
Activities  1.0
Animals     1.0
Buildings   1.0
Cities      1.0
Vehicles    1.0
----------  ----
```

## Custom QA Question Format (VQA)

### 1. Data Preparation

Prepare a tsv file in the QA format as follows:
```text
index	answer	question	image_path
1	Dog	What animal is this?	/root/LMUData/images/custom_mcq/dog.jpg
2	Museum	What building is this?	/root/LMUData/images/custom_mcq/AMNH.jpg
3	Tokyo	Which city's skyline is this?	/root/LMUData/images/custom_mcq/tokyo.jpg
4	Tesla	What is the brand of this car?	/root/LMUData/images/custom_mcq/tesla.jpg
5	Running	What is the person in the picture doing?	/root/LMUData/images/custom_mcq/running.jpg
```
This file is similar to the MCQ format, where:
- `index` is the question number
- `question` is the question text
- `answer` is the answer
- `image_path` is the image path (absolute paths are recommended); it can also be replaced with an `image` field containing the base64-encoded image

Place this file in the `~/LMUData` directory, and you can then use the filename for evaluation. For example, if the filename is `custom_vqa.tsv`, you can use `custom_vqa` for evaluation.

### 2. Custom Evaluation Script

Below is an example of implementing a custom evaluation script for a QA-format dataset. The script automatically loads the dataset, uses default prompts for question answering, and computes accuracy as the evaluation metric.

```python
import os
import numpy as np
from vlmeval.dataset.image_base import ImageBaseDataset
from vlmeval.dataset.image_vqa import CustomVQADataset
from vlmeval.smp import load, dump, d2df

class CustomDataset:
    def load_data(self, dataset):
        # Load the custom dataset from ~/LMUData/<dataset>.tsv
        data_path = os.path.join(os.path.expanduser("~/LMUData"), f'{dataset}.tsv')
        return load(data_path)

    def build_prompt(self, line):
        msgs = ImageBaseDataset.build_prompt(self, line)
        # Add prompts or custom instructions here
        msgs[-1]['value'] += '\nAnswer the question in one word or phrase.'
        return msgs

    def evaluate(self, eval_file, **judge_kwargs):
        data = load(eval_file)
        assert 'answer' in data and 'prediction' in data
        data['prediction'] = [str(x) for x in data['prediction']]
        data['answer'] = [str(x) for x in data['answer']]
        print(data)

        # ========Compute the evaluation metric as needed=========
        # Exact match
        result = np.mean(data['answer'] == data['prediction'])
        ret = {'Overall': result}
        ret = d2df(ret).round(2)
        # Save the result
        suffix = eval_file.split('.')[-1]
        result_file = eval_file.replace(f'.{suffix}', '_acc.csv')
        dump(ret, result_file)
        return ret
        # ========================================================

# Keep the following code to override the default dataset class
CustomVQADataset.load_data = CustomDataset.load_data
CustomVQADataset.build_prompt = CustomDataset.build_prompt
CustomVQADataset.evaluate = CustomDataset.evaluate
```

### 3. Configuration File

The configuration file can be in `python dict`, `yaml`, or `json` format. For example, the following `config.yaml` file:

```{code-block} yaml
:caption: config.yaml

eval_backend: VLMEvalKit
eval_config:
  model:
    - type: qwen-vl-chat
      name: CustomAPIModel
      api_base: http://localhost:8000/v1/chat/completions
      key: EMPTY
      temperature: 0.0
      img_size: -1
  data:
    - custom_vqa # Name of the custom dataset, placed in `~/LMUData`
  mode: all
  limit: 10
  reuse: false
  work_dir: outputs
  nproc: 1
```

### 4. Running the Evaluation

The complete evaluation script is as follows:

```{code-block} python
:emphasize-lines: 1

from custom_dataset import CustomDataset  # Import the custom dataset
from evalscope.run import run_task

run_task(task_cfg='config.yaml')
```

The evaluation results are as follows:
```text
{'qwen-vl-chat_custom_vqa_acc': {'Overall': '1.0'}}
```
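Note that exact match is strict: a prediction like "a dog" scores zero against the reference answer "Dog". If that is too harsh for your task, you can substitute a more lenient matcher inside `evaluate`. The sketch below illustrates the idea on toy data; `relaxed_match` is a hypothetical helper, not part of VLMEvalKit:

```python
import numpy as np
import pandas as pd

def relaxed_match(answer: str, prediction: str) -> bool:
    # Case-insensitive comparison that also accepts the reference answer
    # appearing as a substring of the prediction (e.g. 'Dog' vs. 'a dog').
    answer, prediction = answer.strip().lower(), prediction.strip().lower()
    return answer == prediction or answer in prediction

# Toy data mimicking the `answer`/`prediction` columns of `eval_file`
data = pd.DataFrame({
    'answer': ['Dog', 'Museum', 'Tokyo'],
    'prediction': ['a dog', 'Hospital', 'Tokyo'],
})
result = np.mean([relaxed_match(a, p)
                  for a, p in zip(data['answer'], data['prediction'])])
print({'Overall': round(float(result), 2)})  # {'Overall': 0.67}
```

To use it, replace the exact-match line in `evaluate` with the list-comprehension version of `result` shown above.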