

# Quantization

Quantization trades off model precision for a smaller memory footprint, allowing large models to run on a wider range of devices.
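As a quick taste of what the pages below cover, a pre-quantized checkpoint can be loaded directly through vLLM's `LLM` entry point. This is a minimal sketch assuming an AWQ-quantized model; the repository name is only an example, and the per-method pages listed under Contents give the authoritative usage for each format.

```python
from vllm import LLM, SamplingParams

# Minimal sketch: load an AWQ-quantized checkpoint. The model name is
# illustrative; any AWQ checkpoint works. `quantization="awq"` selects
# the AWQ kernels explicitly (vLLM can usually also infer the method
# from the checkpoint's config).
llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What does quantization trade away, and for what?"], params)
print(outputs[0].outputs[0].text)
```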

Contents:

- [Supported Hardware](supported_hardware.md)
- [AutoAWQ](auto_awq.md)
- [BitBLAS](bitblas.md)
- [BitsAndBytes](bnb.md)
- [FP8 W8A8](fp8.md)
- [GGUF](gguf.md)
- [GPTQModel](gptqmodel.md)
- [FP8 INC](inc.md)
- [INT4 W4A16](int4.md)
- [INT8 W8A8](int8.md)
- [NVIDIA TensorRT Model Optimizer](modelopt.md)
- [Quantized KV Cache](quantized_kvcache.md)
- [AMD Quark](quark.md)
- [TorchAO](torchao.md)