sglang_v0.5.2/flashinfer_0.3.1/3rdparty/cutlass/examples
hailin 06e45b5ff9 local source code for flashinfer_0.3.1 && torch vision_0.22.1 2025-09-20 12:33:58 +08:00
00_basic_gemm
01_cutlass_utilities
02_dump_reg_shmem
03_visualize_layout
04_tile_iterator
05_batched_gemm
06_splitK_gemm
07_volta_tensorop_gemm
08_turing_tensorop_gemm
09_turing_tensorop_conv2dfprop
10_planar_complex
11_planar_complex_array
12_gemm_bias_relu
13_two_tensor_op_fusion
14_ampere_tf32_tensorop_gemm
15_ampere_sparse_tensorop_gemm
16_ampere_tensorop_conv2dfprop
17_fprop_per_channel_bias
18_ampere_fp64_tensorop_affine2_gemm
19_tensorop_canonical
20_simt_canonical
21_quaternion_gemm
22_quaternion_conv
23_ampere_gemm_operand_reduction_fusion
24_gemm_grouped
25_ampere_fprop_mainloop_fusion
26_ampere_wgrad_mainloop_fusion
27_ampere_3xtf32_fast_accurate_tensorop_gemm
28_ampere_3xtf32_fast_accurate_tensorop_fprop
29_ampere_3xtf32_fast_accurate_tensorop_complex_gemm
30_wgrad_split_k
31_basic_syrk
32_basic_trmm
33_ampere_3xtf32_tensorop_symm
34_transposed_conv2d
35_gemm_softmax
36_gather_scatter_fusion
37_gemm_layernorm_gemm_fusion
38_syr2k_grouped
39_gemm_permute
40_cutlass_py
41_fused_multi_head_attention
42_ampere_tensorop_group_conv
43_ell_block_sparse_gemm
44_multi_gemm_ir_and_codegen
45_dual_gemm
46_depthwise_simt_conv2dfprop
47_ampere_gemm_universal_streamk
48_hopper_warp_specialized_gemm
49_hopper_gemm_with_collective_builder
50_hopper_gemm_with_epilogue_swizzle
51_hopper_gett
52_hopper_gather_scatter_fusion
53_hopper_gemm_permute
54_hopper_fp8_warp_specialized_gemm
55_hopper_mixed_dtype_gemm
56_hopper_ptr_array_batched_gemm
57_hopper_grouped_gemm
58_ada_fp8_gemm
59_ampere_gather_scatter_conv
60_cutlass_import
61_hopper_gemm_with_topk_and_softmax
62_hopper_sparse_gemm
63_hopper_gemm_with_weight_prefetch
64_ada_fp8_gemm_grouped
65_distributed_gemm
67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling
68_hopper_fp8_warp_specialized_grouped_gemm_with_blockwise_scaling
69_hopper_mixed_dtype_grouped_gemm
70_blackwell_gemm
71_blackwell_gemm_with_collective_builder
72_blackwell_narrow_precision_gemm
73_blackwell_gemm_preferred_cluster
74_blackwell_gemm_streamk
75_blackwell_grouped_gemm
76_blackwell_conv
77_blackwell_fmha
78_blackwell_emulated_bf16x9_gemm
79_blackwell_geforce_gemm
80_blackwell_geforce_sparse_gemm
81_blackwell_gemm_blockwise
82_blackwell_distributed_gemm
83_blackwell_sparse_gemm
84_blackwell_narrow_precision_sparse_gemm
86_blackwell_mixed_dtype_gemm
87_blackwell_geforce_gemm_blockwise
88_hopper_fmha
89_sm103_fp4_ultra_gemm
90_sm103_fp4_ultra_grouped_gemm
91_fp4_gemv
common
cute
python
CMakeLists.txt
README.md

README.md

CUTLASS - Programming Examples

IMPORTANT

⚠️ Not for Benchmarking! ⚠️

These examples are designed solely to demonstrate CUTLASS functionality and may NOT be optimized for performance benchmarking.

For accurate performance measurements, use the CUTLASS Profiler (recommended). If a kernel is not available through the profiler, auto-tune the example manually.

CuTe - Programming Examples

Examples that do not rely on CUTLASS and directly showcase the features of CuTe are located in cutlass/examples/cute.

Additionally, CuTe's core layout and layout algebra have their own test cases within cutlass/test/unit/cute/core/ that users might find useful as examples of CuTe.
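
As a rough illustration of what a layout means, the sketch below treats a layout as a (shape, stride) pair that maps a coordinate to a linear offset by taking the inner product of the coordinate with the stride. It is plain Python and does not use CuTe or the CUTLASS Python interface. The function names layout_offset and layout_table are hypothetical, chosen only for this example, and nested (hierarchical) shapes are not handled.

```python
def layout_offset(coord, stride):
    """Map a flat coordinate to a linear offset (inner product of coord and stride)."""
    if len(coord) != len(stride):
        raise ValueError("coord and stride must have the same rank")
    return sum(c * s for c, s in zip(coord, stride))


def layout_table(shape, stride):
    """Return the offsets of a rank-2 layout as a list of rows.

    Hypothetical helper for illustration only; it is not a CuTe API.
    """
    rows, cols = shape
    return [[layout_offset((i, j), stride) for j in range(cols)]
            for i in range(rows)]


# Shape (4, 2) with stride (1, 4): column-major, offsets run down each column.
col_major = layout_table((4, 2), (1, 4))

# Shape (4, 2) with stride (2, 1): row-major, offsets run along each row.
row_major = layout_table((4, 2), (2, 1))

print(col_major)
print(row_major)
```

The same shape with two different strides visits the same eight offsets in a different order, which is the basic idea the layout test cases exercise in much greater depth.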

Python Interface Examples

Examples leveraging CUTLASS's Python interface are located in cutlass/examples/python.

Copyright

Copyright (c) 2017 - 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: BSD-3-Clause

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.