# QwQ-32B-Preview
> QwQ-32B-Preview is an experimental research model developed by the Qwen team, aimed at enhancing the reasoning capabilities of artificial intelligence. [Model Link](https://modelscope.cn/models/Qwen/QwQ-32B-Preview/summary)

The Speed Benchmark tool was used to measure the GPU memory usage and inference speed of the QwQ-32B-Preview model under different configurations. Each test generates 2048 tokens from a prompt of 1, 6144, 14336, or 30720 tokens.
## Local Transformers Inference Speed
### Test Environment
- NVIDIA A100 80GB * 1
- CUDA 12.1
- PyTorch 2.3.1
- Flash Attention 2.5.8
- Transformers 4.46.0
- EvalScope 0.7.0
### Test Command
```shell
pip install 'evalscope[perf]' -U
```
```shell
CUDA_VISIBLE_DEVICES=0 evalscope perf \
--parallel 1 \
--model Qwen/QwQ-32B-Preview \
--attn-implementation flash_attention_2 \
--log-every-n-query 1 \
--connect-timeout 60000 \
--read-timeout 60000 \
--max-tokens 2048 \
--min-tokens 2048 \
--api local \
--dataset speed_benchmark
```
### Test Results
```text
+---------------+-----------------+----------------+
| Prompt Tokens | Speed(tokens/s) | GPU Memory(GB) |
+---------------+-----------------+----------------+
|       1       |      17.92      |     61.58      |
|     6144      |      12.61      |     63.72      |
|     14336     |      9.01       |     67.31      |
|     30720     |      5.61       |     74.47      |
+---------------+-----------------+----------------+
```
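To put these figures in more practical terms, the short sketch below converts each row into an approximate wall-clock time for one 2048-token generation and into the extra GPU memory used relative to the 1-token prompt. It assumes the reported speed is the average generation throughput over the 2048 output tokens. The names `results` and `generation_seconds` are illustrative and not part of EvalScope.

```python
# Figures copied from the Transformers results table above.
# Assumption: "Speed" is the average generation throughput over the
# 2048 generated tokens, so time = generated tokens / speed.
GENERATED_TOKENS = 2048

# prompt tokens -> (speed in tokens/s, GPU memory in GB)
results = {
    1: (17.92, 61.58),
    6144: (12.61, 63.72),
    14336: (9.01, 67.31),
    30720: (5.61, 74.47),
}


def generation_seconds(speed_tokens_per_s, generated_tokens=GENERATED_TOKENS):
    """Approximate time to generate `generated_tokens` at the given speed."""
    return generated_tokens / speed_tokens_per_s


baseline_memory_gb = results[1][1]

for prompt_tokens, (speed, memory_gb) in results.items():
    seconds = generation_seconds(speed)
    extra_gb = memory_gb - baseline_memory_gb
    print(
        f"{prompt_tokens:>6} prompt tokens: "
        f"about {seconds:.1f} s per request, "
        f"{extra_gb:.2f} GB above the 1-token case"
    )
```

The memory difference is only a rough indicator of how much the longer context costs, since the table does not separate model weights from cache and activation memory.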
## vLLM Inference Speed
### Test Environment
- NVIDIA A100 80GB * 2
- CUDA 12.1
- vLLM 0.6.3
- PyTorch 2.4.0
- Flash Attention 2.6.3
- Transformers 4.46.0
### Test Command
```shell
CUDA_VISIBLE_DEVICES=0,1 evalscope perf \
--parallel 1 \
--model Qwen/QwQ-32B-Preview \
--log-every-n-query 1 \
--connect-timeout 60000 \
--read-timeout 60000 \
--max-tokens 2048 \
--min-tokens 2048 \
--api local_vllm \
--dataset speed_benchmark
```
### Test Results
```text
+---------------+-----------------+
| Prompt Tokens | Speed(tokens/s) |
+---------------+-----------------+
|       1       |      38.17      |
|     6144      |      36.63      |
|     14336     |      35.01      |
|     30720     |      31.68      |
+---------------+-----------------+
```
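As a rough comparison of the two backends, the sketch below divides the vLLM speed by the Transformers speed at each prompt length, using the figures from the two results tables. The comparison is not like-for-like: the vLLM run used two A100 GPUs and the Transformers run used one, and the library versions differ. The dictionary and function names are illustrative only.

```python
# Speeds (tokens/s) copied from the two results tables in this document.
# Caveat: the vLLM run used 2 x A100 80GB, the Transformers run used 1,
# so the ratio reflects both the backend and the hardware difference.
transformers_speed = {1: 17.92, 6144: 12.61, 14336: 9.01, 30720: 5.61}
vllm_speed = {1: 38.17, 6144: 36.63, 14336: 35.01, 30720: 31.68}


def speedup(prompt_tokens):
    """Ratio of vLLM speed to Transformers speed for one prompt length."""
    return vllm_speed[prompt_tokens] / transformers_speed[prompt_tokens]


for prompt_tokens in sorted(vllm_speed):
    print(f"{prompt_tokens:>6} prompt tokens: {speedup(prompt_tokens):.2f}x")
```

The ratio grows with prompt length because the Transformers speed falls much more steeply than the vLLM speed as the input gets longer.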