sglang.0.4.8.post1/sglang/benchmark/kernels/deepseek
hailin d4330e0746 first commit @ sglang v0.4.8.post1 2025-06-29 14:14:11 +00:00
..
README.md first commit @ sglang v0.4.8.post1 2025-06-29 14:14:11 +00:00
benchmark_deepgemm_fp8_gemm.py first commit @ sglang v0.4.8.post1 2025-06-29 14:14:11 +00:00
benchmark_deepgemm_fp8_group_gemm.py first commit @ sglang v0.4.8.post1 2025-06-29 14:14:11 +00:00

README.md

DeepSeek kernels benchmark

Prerequisites

  • You should install DeepGemm from source before run benchmark_deepgemm_fp8_gemm.py and benchmark_deepgemm_fp8_group_gemm.py.

Benchmark

  • benchmark_deepgemm_fp8_gemm.py

    python benchmark_deepgemm_fp8_gemm.py --run_correctness --tp_size 1
    
  • benchmark_deepgemm_fp8_group_gemm.py

    python benchmark_deepgemm_fp8_group_gemm.py --run_correctness --tp_size 1
    
  • You can use the --run_correctness parameter to verify all kernels results's correctness.

    • You can use the --tp_size parameter to benchmark all FP8 w8a8 block-wise matrix multiplications involved in DeepSeek V3/R1 under the current tensor parallelism (TP) setting. This benchmark compares DeepSeek's open-source DeepGemm implementation with SGLang's and VLLM Triton implementation.