Parameter Description
Execute `evalscope perf --help` to get a full parameter description:
Basic Settings
- `--model`: Name of the test model.
- `--url`: The API address, supporting two types of endpoints: `/chat/completions` and `/completions`.
- `--name`: Name used for the wandb/swanlab record and the result database; default is `{model_name}_{current_time}`. Optional.
- `--api`: Specifies the service API; currently supports [openai|dashscope|local|local_vllm].
  - Select `openai` to use an OpenAI-compatible API; requires the `--url` parameter.
  - Select `dashscope` to use the DashScope API; requires the `--url` parameter.
  - Select `local` to use a local file as the model and run inference with transformers. `--model` should be the model file path or a model_id, which will be downloaded automatically from ModelScope, e.g., `Qwen/Qwen2.5-0.5B-Instruct`.
  - Select `local_vllm` to use a local file as the model and start a vllm inference service. `--model` should be the model file path or a model_id, which will be downloaded automatically from ModelScope, e.g., `Qwen/Qwen2.5-0.5B-Instruct`.
  - You can also use a custom API; refer to the Custom API Guide.
- `--port`: Port for the local inference service; default is 8877. Only applicable to `local` and `local_vllm`.
- `--attn-implementation`: Attention implementation; default is None, options [flash_attention_2|eager|sdpa]. Only effective when `api` is `local`.
- `--api-key`: API key, optional.
- `--debug`: Output debug information.
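For illustration, a minimal run against an OpenAI-compatible endpoint might look like the sketch below; the URL and model name are placeholders for your own deployment.

```bash
# Stress-test an OpenAI-compatible endpoint (URL and model are placeholders).
evalscope perf \
  --api openai \
  --url http://127.0.0.1:8000/v1/chat/completions \
  --model Qwen/Qwen2.5-0.5B-Instruct

# Alternatively, let evalscope start a local vllm service itself,
# listening on the default port 8877.
evalscope perf \
  --api local_vllm \
  --model Qwen/Qwen2.5-0.5B-Instruct
```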
Network Configuration
- `--connect-timeout`: Network connection timeout; default is 600 seconds.
- `--read-timeout`: Network read timeout; default is 600 seconds.
- `--headers`: Additional HTTP headers, formatted as `key1=value1 key2=value2`. These headers are sent with every query.
- `--no-test-connection`: Skip the connection test and start the stress test directly; default is False.
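For example, a run that raises both timeouts for a slow backend and attaches a custom header might look like this (the header key and value are placeholders):

```bash
# Raise both timeouts and attach a custom header
# (x-api-key and its value are placeholders).
evalscope perf \
  --api openai \
  --url http://127.0.0.1:8000/v1/chat/completions \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --connect-timeout 1200 \
  --read-timeout 1200 \
  --headers x-api-key=sk-xxx
```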
Request Control
- `--parallel`: Number of concurrent requests; multiple values separated by spaces may be given. Default is 1.
- `--number`: Total number of requests to send; multiple values separated by spaces may be given (they must correspond one-to-one with `--parallel`). Default is 1000.
- `--rate`: Number of requests generated per second (generated, not sent). Default is -1, meaning all requests are generated at time 0 with no interval; otherwise, a Poisson process is used to generate request intervals.

  Note: in this tool, request generation and sending are separate. The `--rate` parameter controls how many requests per second are generated and placed in a request queue, while the `--parallel` parameter controls the number of workers sending requests; each worker takes a request from the queue and only proceeds to the next after receiving a response to the previous one (see the sketch after this list).
- `--log-every-n-query`: Log every n queries; default is 10.
- `--stream`: Use SSE (Server-Sent Events) streaming output; default is True. Note: `--stream` must be set to measure the Time to First Token (TTFT) metric; set `--no-stream` to disable streaming output.
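As a hypothetical sketch, the run below sweeps two concurrency levels in one invocation while pacing request generation at 5 requests per second (combining `--rate` with a multi-value sweep is an assumption here):

```bash
# Sweep two concurrency levels; each --number value pairs with the
# --parallel value in the same position. Streaming stays on so TTFT
# can be measured.
evalscope perf \
  --api openai \
  --url http://127.0.0.1:8000/v1/chat/completions \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --parallel 1 8 \
  --number 100 800 \
  --rate 5 \
  --stream
```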
Prompt Settings
- `--max-prompt-length`: Maximum input prompt length; default is 131072. Prompts longer than this are discarded.
- `--min-prompt-length`: Minimum input prompt length; default is 0. Prompts shorter than this are discarded.
- `--prefix-length`: Length of the prompt prefix; default is 0. Only effective for the `random` dataset.
- `--prompt`: The request prompt, either a string or a local file; takes priority over `dataset`. For a local file, specify the path as `@/path/to/file`, e.g., `@./prompt.txt`.
- `--query-template`: The query template, either a `JSON` string or a local file. For a local file, specify the path as `@/path/to/file`, e.g., `@./query_template.json`.
- `--apply-chat-template`: Whether to apply the chat template; default is None, in which case it is chosen automatically based on whether the URL ends with `chat/completions`.
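For instance, a run that reads a fixed prompt from a local file (the file path is a placeholder) could look like this:

```bash
# Send the same prompt, loaded from ./prompt.txt, for every request.
evalscope perf \
  --api openai \
  --url http://127.0.0.1:8000/v1/chat/completions \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --prompt @./prompt.txt
```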
Dataset Configuration
- `--dataset`: Specifies one of the dataset modes below. You can also use a custom Python dataset parser; refer to the Custom Dataset Guide.
  - `openqa`: Uses the `question` field of a jsonl file as the prompt. If `dataset_path` is not specified, the dataset is downloaded automatically from ModelScope. Prompts are relatively short, generally under 100 tokens.
  - `longalpaca`: Uses the `instruction` field of a jsonl file as the prompt. If `dataset_path` is not specified, the dataset is downloaded automatically from ModelScope. Prompts are relatively long, generally over 6000 tokens.
  - `flickr8k`: Constructs image-text inputs, suitable for evaluating multimodal models; the dataset is downloaded automatically from ModelScope, and specifying `dataset_path` is not supported.
  - `line_by_line`: Requires `dataset_path`; uses each line of the txt file as a prompt.
  - `random`: Generates prompts randomly based on `prefix-length`, `max-prompt-length`, and `min-prompt-length`; requires `tokenizer-path`. See the usage example after this list.
- `--dataset-path`: Path to the dataset file, used in conjunction with `--dataset`.
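As an illustrative sketch, the `random` mode can be driven as follows; the local tokenizer path is a placeholder for wherever your model weights live.

```bash
# Generate random prompts of 100-200 tokens, each sharing a 32-token prefix.
# The tokenizer path is a placeholder pointing at local model weights.
evalscope perf \
  --api openai \
  --url http://127.0.0.1:8000/v1/chat/completions \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --dataset random \
  --tokenizer-path ./Qwen2.5-0.5B-Instruct \
  --prefix-length 32 \
  --min-prompt-length 100 \
  --max-prompt-length 200
```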
Model Settings
- `--tokenizer-path`: Optional. Path to the tokenizer weights, used to count the tokens in the input and output; usually located in the same directory as the model weights.
- `--frequency-penalty`: The frequency_penalty value.
- `--logprobs`: Log probabilities.
- `--max-tokens`: Maximum number of tokens that can be generated.
- `--min-tokens`: Minimum number of tokens to generate. Not all model services support this parameter; check the corresponding API documentation. For vLLM >= 0.8.1, you additionally need to set `--extra-args '{"ignore_eos": true}'`.
- `--n-choices`: Number of completion choices to generate.
- `--seed`: Random seed; default is 0.
- `--stop`: Tokens that stop generation.
- `--stop-token-ids`: IDs of tokens that stop generation.
- `--temperature`: Sampling temperature; default is 0.0.
- `--top-p`: Top-p sampling.
- `--top-k`: Top-k sampling.
- `--extra-args`: Additional parameters passed in the request body, formatted as a JSON string, e.g., `'{"ignore_eos": true}'`.
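A common benchmarking setup is to pin the output length so throughput numbers are comparable across runs; a sketch, assuming a vLLM >= 0.8.1 backend:

```bash
# Force exactly 256 generated tokens per request; ignore_eos keeps the
# backend from stopping early at an end-of-sequence token.
evalscope perf \
  --api openai \
  --url http://127.0.0.1:8000/v1/chat/completions \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --max-tokens 256 \
  --min-tokens 256 \
  --temperature 0.0 \
  --extra-args '{"ignore_eos": true}'
```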
Data Storage
- `--wandb-api-key`: wandb API key; if set, metrics are saved to wandb.
- `--swanlab-api-key`: swanlab API key; if set, metrics are saved to swanlab.
- `--outputs-dir`: Output file path; default is `./outputs`.
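For example, to upload metrics to swanlab and keep local results in a custom directory (the API key is a placeholder):

```bash
# Metrics go to swanlab; local result files land in ./perf_results
# instead of the default ./outputs.
evalscope perf \
  --api openai \
  --url http://127.0.0.1:8000/v1/chat/completions \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --swanlab-api-key YOUR_SWANLAB_KEY \
  --outputs-dir ./perf_results
```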