# compressed_tensors quantization module
To support models quantized in the compressed_tensors format, we adapted vLLM's implementation at https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/layers/quantization/compressed_tensors for SGLang.
Currently, only the w8a8_fp8 compressed_tensors scheme is supported. If you need support for other formats, please open an issue.
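As an illustration of what "w8a8_fp8" means in practice, the sketch below checks whether a compressed_tensors quantization config describes 8-bit float weights and activations. The config layout (`config_groups`, `weights`, `input_activations`, `num_bits`, `type`) is modeled on typical compressed-tensors checkpoint configs and is an assumption here, not SGLang's actual parsing logic.

```python
# Hedged sketch: classify a compressed-tensors quantization config as
# w8a8_fp8 (8-bit float weights AND 8-bit float activations).
# The dict schema below is an assumption based on common checkpoint
# configs, not the module's real detection code.

def is_w8a8_fp8(quant_config: dict) -> bool:
    """Return True if every config group quantizes both weights and
    input activations to 8-bit float."""
    groups = quant_config.get("config_groups", {})
    if not groups:
        return False
    for group in groups.values():
        weights = group.get("weights") or {}
        activations = group.get("input_activations") or {}
        for args in (weights, activations):
            if args.get("num_bits") != 8 or args.get("type") != "float":
                return False
    return True


# Example fragment as it might appear in a checkpoint's config.json.
example = {
    "config_groups": {
        "group_0": {
            "weights": {"num_bits": 8, "type": "float"},
            "input_activations": {"num_bits": 8, "type": "float"},
        }
    }
}

print(is_w8a8_fp8(example))  # True: both weights and activations are fp8
```

A weight-only config (no `input_activations` entry, or integer types) would return `False`, which matches the module's current scope of supporting only the w8a8_fp8 scheme.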