# Usage Examples

## Inference with a Local Model

This project supports local inference with transformers as well as vLLM inference (vLLM must be installed first). `--model` accepts a ModelScope model name, e.g. `Qwen/Qwen2.5-0.5B-Instruct`, or a direct path to model weights, e.g. `/path/to/model_weights`. The `--url` parameter is not required.

**1. Inference with transformers**

Specify `--api local`:

```bash
# --attn-implementation is optional; one of [flash_attention_2|eager|sdpa]
evalscope perf \
  --model 'Qwen/Qwen2.5-0.5B-Instruct' \
  --attn-implementation flash_attention_2 \
  --number 20 \
  --parallel 2 \
  --api local \
  --dataset openqa
```

**2. Inference with vLLM**

Specify `--api local_vllm`:

```bash
evalscope perf \
  --model 'Qwen/Qwen2.5-0.5B-Instruct' \
  --number 20 \
  --parallel 2 \
  --api local_vllm \
  --dataset openqa
```

## Using `prompt`

```bash
evalscope perf \
  --url 'http://127.0.0.1:8000/v1/chat/completions' \
  --parallel 2 \
  --model 'qwen2.5' \
  --log-every-n-query 10 \
  --number 20 \
  --api openai \
  --temperature 0.9 \
  --max-tokens 1024 \
  --prompt 'Write a science fiction story; begin your performance'
```

You can also use a local file as the prompt:

```bash
evalscope perf \
  --url 'http://127.0.0.1:8000/v1/chat/completions' \
  --parallel 2 \
  --model 'qwen2.5' \
  --log-every-n-query 10 \
  --number 20 \
  --api openai \
  --temperature 0.9 \
  --max-tokens 1024 \
  --prompt @prompt.txt
```

## Complex Requests

Using `stop`, `stream`, `temperature`, and similar parameters:

```bash
evalscope perf \
  --url 'http://127.0.0.1:8000/v1/chat/completions' \
  --parallel 2 \
  --model 'qwen2.5' \
  --log-every-n-query 10 \
  --read-timeout 120 \
  --connect-timeout 120 \
  --number 20 \
  --max-prompt-length 128000 \
  --min-prompt-length 128 \
  --api openai \
  --temperature 0.7 \
  --max-tokens 1024 \
  --stop '<|im_end|>' \
  --dataset openqa \
  --stream
```

## Using `query-template`

You can set the request parameters in a `query-template`:

```bash
evalscope perf \
  --url 'http://127.0.0.1:8000/v1/chat/completions' \
  --parallel 2 \
  --model 'qwen2.5' \
  --log-every-n-query 10 \
  --read-timeout 120 \
  --connect-timeout 120 \
  --number 20 \
  --max-prompt-length 128000 \
  --min-prompt-length 128 \
  --api openai \
  --query-template '{"model": "%m", "messages": [{"role": "user","content": "%p"}], "stream": true, "skip_special_tokens": false, "stop": ["<|im_end|>"], "temperature": 0.7, "max_tokens": 1024}' \
  --dataset openqa
```

Here `%m` and `%p` are replaced with the model name and the prompt, respectively.

You can also use a local `query-template.json` file:

```{code-block} json
:caption: template.json

{
  "model": "%m",
  "messages": [
    {
      "role": "user",
      "content": "%p"
    }
  ],
  "stream": true,
  "skip_special_tokens": false,
  "stop": ["<|im_end|>"],
  "temperature": 0.7,
  "max_tokens": 1024
}
```

```bash
evalscope perf \
  --url 'http://127.0.0.1:8000/v1/chat/completions' \
  --parallel 2 \
  --model 'qwen2.5' \
  --log-every-n-query 10 \
  --read-timeout 120 \
  --connect-timeout 120 \
  --number 20 \
  --max-prompt-length 128000 \
  --min-prompt-length 128 \
  --api openai \
  --query-template @template.json \
  --dataset openqa
```

## Using the `random` Dataset

Prompts are generated at random according to `prefix-length`, `max-prompt-length`, and `min-prompt-length`; `tokenizer-path` must be specified. The token count of each generated prompt is uniformly distributed between `prefix-length + min-prompt-length` and `prefix-length + max-prompt-length`, and all requests within a single test share the same prefix.

```{note}
Due to the effects of the chat_template and the tokenization algorithm, the token counts of the generated prompts may deviate slightly from the specified values; they are not exact.
```

Run the following command:

```bash
evalscope perf \
  --parallel 20 \
  --model Qwen2.5-0.5B-Instruct \
  --url http://127.0.0.1:8801/v1/chat/completions \
  --api openai \
  --dataset random \
  --min-tokens 128 \
  --max-tokens 128 \
  --prefix-length 64 \
  --min-prompt-length 1024 \
  --max-prompt-length 2048 \
  --number 100 \
  --tokenizer-path Qwen/Qwen2.5-0.5B-Instruct \
  --debug
```

## Recording Test Results with SwanLab

Install SwanLab with the following command:

```bash
pip install swanlab
```

Add the following parameters before starting the test:

```bash
--swanlab-api-key 'swanlab_api_key'
--name 'name_of_swanlab_log'
```

![swanlab sample](https://sail-moe.oss-cn-hangzhou.aliyuncs.com/yunlin/images/evalscope/swanlab.png)
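For example, a minimal sketch of a full run with SwanLab logging enabled, assuming the same local OpenAI-compatible endpoint used in the earlier examples; the API key value and run name below are placeholders you must replace with your own:

```bash
# Sketch: the basic openqa test from above with SwanLab logging added.
# 'your_swanlab_api_key' and 'qwen2.5-openqa-baseline' are placeholder values.
evalscope perf \
  --url 'http://127.0.0.1:8000/v1/chat/completions' \
  --parallel 2 \
  --model 'qwen2.5' \
  --number 20 \
  --api openai \
  --dataset openqa \
  --swanlab-api-key 'your_swanlab_api_key' \
  --name 'qwen2.5-openqa-baseline'
```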
## Debugging Requests

With the `--debug` option, requests and responses are logged. Example output:

**Example output in non-`stream` mode**

```text
2024-11-27 11:25:34,161 - evalscope - http_client.py - on_request_start - 116 - DEBUG - Starting request:
2024-11-27 11:25:34,163 - evalscope - http_client.py - on_request_chunk_sent - 128 - DEBUG - Request sent:
2024-11-27 11:25:38,172 - evalscope - http_client.py - on_response_chunk_received - 140 - DEBUG - Request received:
```

**Example output in `stream` mode**

```text
2024-11-27 20:02:24,760 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"重要的"},"finish_reason":null}],"usage":null}
2024-11-27 20:02:24,803 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":""},"finish_reason":null}],"usage":null}
2024-11-27 20:02:24,847 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":",以便"},"finish_reason":null}],"usage":null}
2024-11-27 20:02:24,890 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"及时"},"finish_reason":null}],"usage":null}
2024-11-27 20:02:24,933 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"得到"},"finish_reason":null}],"usage":null}
2024-11-27 20:02:24,976 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"帮助"},"finish_reason":null}],"usage":null}
2024-11-27 20:02:25,023 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"和支持"},"finish_reason":null}],"usage":null}
2024-11-27 20:02:25,066 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":""},"finish_reason":null}],"usage":null}
2024-11-27 20:02:25,109 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":""},"finish_reason":null}],"usage":null}
2024-11-27 20:02:25,111 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"。<|im_end|>"},"finish_reason":null}],"usage":null}
2024-11-27 20:02:25,113 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: {"model":"Qwen2.5-0.5B-Instruct","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":50,"completion_tokens":260,"total_tokens":310}}
2024-11-27 20:02:25,113 - evalscope - http_client.py - _handle_stream - 57 - DEBUG - Response recevied: data: [DONE]
```
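To produce logs like those above, append `--debug` to any of the earlier commands. For instance, a sketch that reruns the basic openqa test against the same assumed local endpoint, combined with `--stream` so that the chunk-by-chunk responses shown above appear:

```bash
# Sketch: the basic openqa test with request/response logging enabled.
# --stream produces the per-chunk "Response recevied" lines shown above.
evalscope perf \
  --url 'http://127.0.0.1:8000/v1/chat/completions' \
  --parallel 2 \
  --model 'qwen2.5' \
  --number 20 \
  --api openai \
  --dataset openqa \
  --stream \
  --debug
```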