.. _apifp4_quantization:

flashinfer.fp4_quantization
===========================

.. currentmodule:: flashinfer.fp4_quantization

This module provides FP4 quantization operations for LLM inference, supporting various scale factor layouts and quantization formats.

Core Quantization Functions
---------------------------

.. autosummary::
    :toctree: ../generated

    fp4_quantize
    nvfp4_quantize
    nvfp4_block_scale_interleave
    e2m1_and_ufp8sf_scale_to_float

Matrix Shuffling Utilities
--------------------------

.. autosummary::
    :toctree: ../generated

    shuffle_matrix_a
    shuffle_matrix_sf_a

Types and Enums
---------------

.. autosummary::
    :toctree: ../generated

    SfLayout
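
Conceptual Example
------------------

The block-scaled quantization that the functions on this page perform can be illustrated with a small pure-Python sketch. It does not call FlashInfer and is not the library's implementation. The helper names (``quantize_block``, ``dequantize_block``, ``quantize``) are hypothetical, the per-block scale is kept as a plain Python float rather than an FP8 value, no global scale or scale factor layout is applied, and ties round toward the smaller magnitude. The block size of 16 is an assumption chosen for illustration.

.. code-block:: python

    # Magnitudes representable by E2M1 (1 sign, 2 exponent, 1 mantissa bit).
    # The list index is the 3-bit magnitude code.
    E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
    E2M1_MAX = 6.0


    def quantize_block(block):
        """Quantize one block to 4-bit E2M1 codes with one shared scale.

        Hypothetical helper: the scale stays a Python float here, whereas a
        real kernel would store it in a compact scale factor format.
        """
        amax = max(abs(x) for x in block)
        # Map the largest magnitude in the block onto the largest E2M1 value.
        scale = amax / E2M1_MAX if amax > 0 else 1.0
        codes = []
        for x in block:
            mag = abs(x) / scale
            # Nearest representable magnitude; ties go to the smaller one.
            idx = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - mag))
            sign = 1 if x < 0 else 0
            codes.append((sign << 3) | idx)
        return codes, scale


    def dequantize_block(codes, scale):
        """Recover approximate floats from E2M1 codes and the block scale."""
        out = []
        for c in codes:
            mag = E2M1_VALUES[c & 0x7] * scale
            out.append(-mag if c & 0x8 else mag)
        return out


    def quantize(values, block_size=16):
        """Split a flat list into blocks and quantize each independently."""
        return [
            quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)
        ]


    if __name__ == "__main__":
        data = [0.1 * i - 1.0 for i in range(32)]
        for codes, scale in quantize(data):
            print(scale, codes, dequantize_block(codes, scale))

Because each block carries its own scale, an outlier only reduces the precision of the values that share its block, which is the motivation for storing scale factors alongside the packed 4-bit data.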