sglang0.4.5.post1/python/sglang/srt/layers/quantization/compressed_tensors
hailin 0558580343 first commit @ sglang v0.4.5.post1 2025-06-29 18:55:37 +08:00
schemes
README.md
__init__.py
compressed_tensors.py
compressed_tensors_moe.py
utils.py

README.md

quantization compressed_tensors module

To support models quantized in the compressed_tensors format, we adapted the vLLM implementation at https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/layers/quantization/compressed_tensors for SGLang.

For practical purposes, only the w8a8_fp8 scheme of the compressed_tensors format is currently supported. If you need other formats, you can submit an issue through this link.
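
As a rough illustration of what "w8a8_fp8" means in practice, the sketch below checks whether a checkpoint's quantization config describes 8-bit float weights and 8-bit float activations. The helper name `is_w8a8_fp8` is hypothetical and not part of SGLang, and the field names (`quant_method`, `config_groups`, `weights`, `input_activations`, `num_bits`, `type`) are assumptions based on the layout commonly seen in the `quantization_config` section of compressed_tensors checkpoints; verify them against your model's `config.json`.

```python
def is_w8a8_fp8(quant_config: dict) -> bool:
    """Hypothetical helper: True if every config group uses 8-bit float
    weights and 8-bit float input activations (assumed field layout)."""
    if quant_config.get("quant_method") != "compressed-tensors":
        return False

    groups = quant_config.get("config_groups") or {}
    if not groups:
        return False

    def is_fp8(args) -> bool:
        # A tensor is treated as FP8 when it is declared as an 8-bit float type.
        return (
            isinstance(args, dict)
            and args.get("num_bits") == 8
            and args.get("type") == "float"
        )

    return all(
        is_fp8(group.get("weights")) and is_fp8(group.get("input_activations"))
        for group in groups.values()
    )


# Illustrative config only; real checkpoints may carry additional fields.
example_config = {
    "quant_method": "compressed-tensors",
    "config_groups": {
        "group_0": {
            "targets": ["Linear"],
            "weights": {"num_bits": 8, "type": "float"},
            "input_activations": {"num_bits": 8, "type": "float"},
        }
    },
}

print(is_w8a8_fp8(example_config))
```

A check like this can be run on the `quantization_config` dictionary loaded from a model's `config.json` before serving, to see whether the model falls within the scheme this module covers.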