sglang0.4.5.post1/python/sglang/srt/layers/moe/fused_moe_triton/configs
hailin 0558580343 first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=1792,device_name=AMD_Instinct_MI300X.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=1792,device_name=AMD_Instinct_MI325X.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=1792,device_name=AMD_Radeon_Graphics.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=1792,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=1792,device_name=NVIDIA_H200.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=2048,device_name=NVIDIA_H200.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=AMD_Instinct_MI300X.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=AMD_Instinct_MI325X.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=AMD_Radeon_Graphics.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=NVIDIA_H200.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=3584,device_name=NVIDIA_L40S.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=4096,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=4096,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=4096,device_name=AMD_Radeon_Graphics,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=4096,device_name=NVIDIA_H200.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=7168,device_name=AMD_Instinct_MI300X.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=7168,device_name=AMD_Instinct_MI325X.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=7168,device_name=AMD_Radeon_Graphics.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=7168,device_name=NVIDIA_H200.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=8192,device_name=AMD_Radeon_Graphics,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=8192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=14336,device_name=AMD_Instinct_MI300X.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=14336,device_name=AMD_Instinct_MI325X.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=14336,device_name=AMD_Radeon_Graphics.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=14336,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=8,N=14336,device_name=NVIDIA_H200.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=320,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=320,device_name=NVIDIA_H200.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=640,device_name=NVIDIA_A800-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=640,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=640,device_name=NVIDIA_H200.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=1024,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=1280,device_name=NVIDIA_A800-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=1280,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=1280,device_name=NVIDIA_H200.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=2560,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=2560,device_name=NVIDIA_H200,dtype=fp8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=64,N=2560,device_name=NVIDIA_H200.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=64,device_name=NVIDIA_L20,dtype=int8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=64,device_name=NVIDIA_L40S,dtype=int8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=256,device_name=AMD_Radeon_Graphics,dtype=fp8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128, 128].json first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
README first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00

README

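This directory contains tuned configurations for the fused_moe kernel.
Each JSON file is named for one combination of
- E (number of experts)
- N (intermediate size)
- device_name (torch.cuda.get_device_name(), with spaces replaced by underscores)
- dtype and block_shape (optional; present only on some files, e.g. dtype=fp8_w8a8 or block_shape=[128, 128])
and contains a mapping from M (batch size) to the configuration chosen for that M.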
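
The naming scheme and the M lookup can be sketched as follows. This is a minimal illustration, not the loader sglang actually uses: the helper names `config_file_name` and `pick_config` are hypothetical, the device name is passed in as a string so the sketch runs without a GPU, the nearest-M selection rule is an assumption, and the JSON values are placeholders rather than real tuned parameters.

```python
import json


def config_file_name(E, N, device_name, dtype=None, block_shape=None):
    """Build a file name in the pattern used by the files in this directory.

    Hypothetical helper: device_name is expected to be the string that
    torch.cuda.get_device_name() would return, e.g. "NVIDIA H100 80GB HBM3".
    """
    device = device_name.replace(" ", "_")
    dtype_part = f",dtype={dtype}" if dtype else ""
    # A Python list formats as "[128, 128]", matching the names listed above.
    block_part = f",block_shape={block_shape}" if block_shape else ""
    return f"E={E},N={N},device_name={device}{dtype_part}{block_part}.json"


def pick_config(configs, M):
    """Return the entry whose M key is closest to the requested batch size.

    Assumption: nearest-key selection; the real selection rule may differ.
    """
    by_m = {int(k): v for k, v in configs.items()}  # JSON object keys are strings
    nearest = min(by_m, key=lambda m: abs(m - M))
    return by_m[nearest]


# Placeholder content standing in for a real config file.
example = json.loads('{"1": {"label": "tuned for M=1"}, "64": {"label": "tuned for M=64"}}')

print(config_file_name(8, 7168, "NVIDIA H100 80GB HBM3", dtype="fp8_w8a8"))
print(pick_config(example, 48)["label"])
```

Taking the device name as a parameter keeps the name construction testable on machines without CUDA.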

The example configurations provided are for the Mixtral model for TP2 on H100
and TP4 on A100. Mixtral has an intermediate size of 14336, which is divided
across the tensor-parallel ranks: TP2 gives N = 14336 / 2 = 7168 and TP4 gives
N = 14336 / 4 = 3584.
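
The arithmetic above can be checked with a short sketch. The function name `per_rank_intermediate_size` is hypothetical, and the even split across ranks is the assumption stated in the text.

```python
def per_rank_intermediate_size(intermediate_size, tp_size):
    """Return N for one tensor-parallel rank, assuming an even split."""
    if intermediate_size % tp_size != 0:
        raise ValueError("intermediate size must be divisible by the TP size")
    return intermediate_size // tp_size


MIXTRAL_INTERMEDIATE_SIZE = 14336  # value given in this README

for tp in (1, 2, 4):
    print(f"TP{tp}: N = {per_rank_intermediate_size(MIXTRAL_INTERMEDIATE_SIZE, tp)}")
```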

See `benchmark/kernels/fused_moe_triton/README.md` for how to generate these config files.