DeepSeek kernels benchmark
Prerequisites
- Install DeepGEMM from source before running benchmark_deepgemm_fp8_gemm.py and benchmark_deepgemm_fp8_group_gemm.py.
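A quick way to confirm the prerequisite is in place is to check that the relevant modules are importable before launching either script. This is a minimal sketch. It assumes DeepGEMM installs a top-level module named deep_gemm. The triton and vllm entries are also assumptions: they reflect a guess that the Triton baselines need those packages. check_modules is a hypothetical helper and is not part of the benchmark scripts.

```python
import importlib.util
from typing import Dict, Iterable

# Assumption: DeepGEMM is importable as "deep_gemm"; "triton" and "vllm" are
# listed on the guess that the Triton baselines being compared need them.
REQUIRED_MODULES = ("deep_gemm", "triton", "vllm")


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found on sys.path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Report status only; nothing is imported or executed from these packages.
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

find_spec only locates the module without importing it, so the check is cheap. It cannot tell whether a source build compiled correctly, so a successful benchmark run with --run_correctness remains the real test.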
Benchmark
- benchmark_deepgemm_fp8_gemm.py
    python benchmark_deepgemm_fp8_gemm.py --run_correctness --tp_size 1
- benchmark_deepgemm_fp8_group_gemm.py
    python benchmark_deepgemm_fp8_group_gemm.py --run_correctness --tp_size 1
- Use the --run_correctness parameter to verify the correctness of all kernels' results.
- Use the --tp_size parameter to benchmark all FP8 W8A8 block-wise matrix multiplications involved in DeepSeek V3/R1 under the given tensor parallelism (TP) setting. This benchmark compares DeepSeek's open-source DeepGEMM implementation with the Triton implementations from SGLang and vLLM.
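To make "block-wise" concrete, this sketch models the scaling scheme DeepSeek V3 is commonly described as using. Weights get one scale per 128x128 tile, and activations get one scale per token per 128-element K group. The code is a pure-Python reference and does not reproduce the API of DeepGEMM, SGLang, or vLLM. All function names are hypothetical, and the block sizes are parameters so a small example stays readable. Real FP8 E4M3 rounding is not modeled: quantized values stay as Python floats, with only the per-block scaling applied. As a result, the output matches an unquantized GEMM up to floating-point error.

```python
from typing import List, Tuple

Matrix = List[List[float]]
FP8_E4M3_MAX = 448.0  # largest finite float8 e4m3fn value; scales map each block's amax here


def quantize_act(a: Matrix, block_k: int) -> Tuple[Matrix, Matrix]:
    """Activation A (M x K): one scale per row per K group (1 x block_k tiles)."""
    m, k = len(a), len(a[0])
    assert k % block_k == 0
    q = [[0.0] * k for _ in range(m)]
    scales = [[1.0] * (k // block_k) for _ in range(m)]
    for i in range(m):
        for g in range(k // block_k):
            cols = range(g * block_k, (g + 1) * block_k)
            amax = max(abs(a[i][c]) for c in cols)
            s = amax / FP8_E4M3_MAX if amax > 0 else 1.0
            scales[i][g] = s
            for c in cols:
                q[i][c] = a[i][c] / s  # FP8 rounding would happen here (not modeled)
    return q, scales


def quantize_weight(w: Matrix, block_n: int, block_k: int) -> Tuple[Matrix, Matrix]:
    """Weight W (N x K): one scale per block_n x block_k tile."""
    n, k = len(w), len(w[0])
    assert n % block_n == 0 and k % block_k == 0
    q = [[0.0] * k for _ in range(n)]
    scales = [[1.0] * (k // block_k) for _ in range(n // block_n)]
    for bn in range(n // block_n):
        for bk in range(k // block_k):
            rows = range(bn * block_n, (bn + 1) * block_n)
            cols = range(bk * block_k, (bk + 1) * block_k)
            amax = max(abs(w[r][c]) for r in rows for c in cols)
            s = amax / FP8_E4M3_MAX if amax > 0 else 1.0
            scales[bn][bk] = s
            for r in rows:
                for c in cols:
                    q[r][c] = w[r][c] / s
    return q, scales


def blockwise_gemm(qa: Matrix, sa: Matrix, qw: Matrix, sw: Matrix,
                   block_n: int, block_k: int) -> Matrix:
    """C = A @ W^T: each K group's partial sum is rescaled by both block scales."""
    m, k, n = len(qa), len(qa[0]), len(qw)
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for g in range(k // block_k):
                cols = range(g * block_k, (g + 1) * block_k)
                partial = sum(qa[i][c] * qw[j][c] for c in cols)
                acc += partial * sa[i][g] * sw[j // block_n][g]
            out[i][j] = acc
    return out
```

For example, with block_n = block_k = 4 and an 8x8 weight, quantize_weight produces a 2x2 grid of scales. The rescaling happens per K group inside the accumulation loop rather than in a separate dequantization pass. That is why the scale layout is part of each kernel's interface.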
- You can use the