.. _apifp4_quantization:

flashinfer.fp4_quantization
===========================

.. currentmodule:: flashinfer.fp4_quantization

This module provides FP4 quantization operations for LLM inference, supporting various scale factor layouts and quantization formats.
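
As a rough illustration of what block-scaled FP4 quantization involves, the following pure-Python sketch rounds values to the E2M1 grid (1 sign bit, 2 exponent bits, 1 mantissa bit) with one scale factor per block. It does not call FlashInfer and is not its implementation: the helper names ``quantize_block``, ``dequantize_block``, and ``quantize`` are hypothetical, the block size of 16 is an assumption, and the scale is kept as a plain Python float rather than being stored in a narrow format or laid out in any particular scale factor layout.

```python
# Magnitudes representable by a 4-bit E2M1 value, indexed by the low 3 bits
# of the code (2 exponent bits followed by 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0


def quantize_block(block):
    """Return (codes, scale) for one block of floats (illustrative only)."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest E2M1 value.
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Pick the nearest representable magnitude; ties go to the smaller one.
        idx = min(range(len(E2M1_VALUES)), key=lambda i: abs(E2M1_VALUES[i] - mag))
        sign_bit = 8 if x < 0 else 0  # bit 3 holds the sign
        codes.append(sign_bit | idx)
    return codes, scale


def dequantize_block(codes, scale):
    """Invert quantize_block up to rounding error."""
    out = []
    for c in codes:
        mag = E2M1_VALUES[c & 7]
        out.append((-mag if c & 8 else mag) * scale)
    return out


def quantize(values, block_size=16):
    """Split a flat list into blocks and quantize each one independently."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [quantize_block(b) for b in blocks]
```

Each 4-bit code pairs with its block's scale factor, so dequantization multiplies the decoded E2M1 value by that scale. Production kernels additionally pack two codes per byte and arrange the scale factors in a hardware-friendly layout, which this sketch leaves out.
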

Core Quantization Functions
---------------------------

.. autosummary::
    :toctree: ../generated

    fp4_quantize
    nvfp4_quantize
    nvfp4_block_scale_interleave
    e2m1_and_ufp8sf_scale_to_float

Matrix Shuffling Utilities
--------------------------

.. autosummary::
    :toctree: ../generated

    shuffle_matrix_a
    shuffle_matrix_sf_a

Types and Enums
---------------

.. autosummary::
    :toctree: ../generated

    SfLayout