.. _apifused_moe:

flashinfer.fused_moe
====================

.. currentmodule:: flashinfer.fused_moe

This module provides fused Mixture-of-Experts (MoE) operations optimized for different backends and data types.

Types and Enums
---------------

.. autosummary::
  :toctree: ../generated

  RoutingMethodType
  WeightLayout

Utility Functions
-----------------

.. autosummary::
  :toctree: ../generated

  convert_to_block_layout
  reorder_rows_for_gated_act_gemm

CUTLASS Fused MoE
-----------------

.. autosummary::
  :toctree: ../generated

  cutlass_fused_moe
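
The sketch below shows one way the CUTLASS backend might be invoked for a BF16 model. The keyword names and weight shapes are assumptions modeled on the TensorRT-LLM-style fused MoE convention, not the authoritative interface; consult the generated ``cutlass_fused_moe`` page for the exact signature.

.. code-block:: python

  import torch

  from flashinfer.fused_moe import cutlass_fused_moe

  num_tokens, hidden_size, intermediate_size = 16, 1024, 4096
  num_experts, top_k = 8, 2
  device, dtype = "cuda", torch.bfloat16

  # Token activations and router logits over the experts.
  hidden_states = torch.randn(num_tokens, hidden_size, dtype=dtype, device=device)
  router_logits = torch.randn(num_tokens, num_experts, dtype=torch.float32, device=device)

  # Top-k routing: which experts each token is sent to, and with what weight.
  routing_weights, selected_experts = torch.topk(
      torch.softmax(router_logits, dim=-1), top_k, dim=-1
  )

  # Expert weights; the shapes assume a gated (SwiGLU-style) FFN whose first
  # GEMM packs the gate and up projections together -- an assumption, not a spec.
  w13 = torch.randn(num_experts, 2 * intermediate_size, hidden_size, dtype=dtype, device=device)
  w2 = torch.randn(num_experts, hidden_size, intermediate_size, dtype=dtype, device=device)

  # Assumed call signature -- verify against the generated API page before use.
  output = cutlass_fused_moe(
      input=hidden_states,
      token_selected_experts=selected_experts.to(torch.int32),
      token_final_scales=routing_weights.to(torch.float32),
      fc1_expert_weights=w13,
      fc2_expert_weights=w2,
      output_dtype=dtype,
  )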

TensorRT-LLM Fused MoE
----------------------

.. autosummary::
  :toctree: ../generated

  trtllm_fp4_block_scale_moe
  trtllm_fp8_block_scale_moe
  trtllm_fp8_per_tensor_scale_moe