Environment Variables

SGLang supports various environment variables that can be used to configure its runtime behavior. This document provides a comprehensive list and aims to stay updated over time.

Note: SGLang uses two prefixes for environment variables, `SGL_` and `SGLANG_`, likely for historical reasons. Each setting currently uses one prefix or the other, as listed below; future versions might consolidate them.
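Because settings are split across two prefixes, it can help to check which of them are actually set in a shell or container before launching. The following is a minimal sketch using only the Python standard library; the helper name `collect_sglang_env` is hypothetical and not part of SGLang.

```python
import os

# Both prefixes are needed: "SGLANG_..." does not start with "SGL_".
PREFIXES = ("SGLANG_", "SGL_")


def collect_sglang_env(environ=None):
    """Return the SGL_/SGLANG_ variables found in `environ`, sorted by name.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith(PREFIXES)
    }
```

Calling `collect_sglang_env()` with no arguments inspects the current process environment; passing a dict makes the helper easy to test.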

General Configuration

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_USE_MODELSCOPE` | Enable using models from ModelScope | `false` |
| `SGLANG_HOST_IP` | Host IP address for the server | `0.0.0.0` |
| `SGLANG_PORT` | Port for the server | auto-detected |
| `SGLANG_LOGGING_CONFIG_PATH` | Custom logging configuration path | Not set |
| `SGLANG_DISABLE_REQUEST_LOGGING` | Disable request logging | `false` |
| `SGLANG_HEALTH_CHECK_TIMEOUT` | Timeout for health check in seconds | `20` |
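These variables are read from the environment of the server process, so they have to be set before it starts. One way to do that from Python is to pass a modified environment to a child process. The sketch below assumes the server is started with `python -m sglang.launch_server --model-path ...`; the helper names `build_env` and `launch_server` are hypothetical, and the spelling `"true"` for the boolean flag is an assumption based on the defaults listed above.

```python
import os
import subprocess
import sys


def build_env(overrides, base=None):
    """Return a copy of `base` (default: os.environ) with `overrides` applied.

    Values are converted to strings because environment values must be text.
    """
    env = dict(os.environ if base is None else base)
    env.update({name: str(value) for name, value in overrides.items()})
    return env


def launch_server(model_path, overrides):
    """Start the server as a child process with the given variables set."""
    cmd = [sys.executable, "-m", "sglang.launch_server", "--model-path", model_path]
    return subprocess.Popen(cmd, env=build_env(overrides))


# Example overrides (illustrative values, not recommendations):
EXAMPLE_OVERRIDES = {
    "SGLANG_HEALTH_CHECK_TIMEOUT": 60,          # seconds
    "SGLANG_DISABLE_REQUEST_LOGGING": "true",   # assumed spelling
}
```

Copying the parent environment rather than mutating `os.environ` keeps the overrides scoped to the child process.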

Performance Tuning

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_ENABLE_TORCH_INFERENCE_MODE` | Control whether to use `torch.inference_mode` | `false` |
| `SGLANG_ENABLE_TORCH_COMPILE` | Enable `torch.compile` | `true` |
| `SGLANG_SET_CPU_AFFINITY` | Enable CPU affinity setting (often set to `1` in Docker builds) | `0` |
| `SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN` | Allows the scheduler to overwrite longer context length requests (often set to `1` in Docker builds) | `0` |
| `SGLANG_IS_FLASHINFER_AVAILABLE` | Control FlashInfer availability check | `true` |
| `SGLANG_SKIP_P2P_CHECK` | Skip P2P (peer-to-peer) access check | `false` |
| `SGL_CHUNKED_PREFIX_CACHE_THRESHOLD` | Sets the threshold for enabling chunked prefix caching | `8192` |
| `SGLANG_FUSED_MLA_ENABLE_ROPE_FUSION` | Enable RoPE fusion in fused Multi-head Latent Attention (MLA) | `1` |
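Flag defaults in this document appear in several spellings (`false`, `"true"`, `0`, `1`). When scripting around these variables, a tolerant reader that rejects unknown spellings avoids silent misconfiguration. The sketch below is an illustration only: `read_flag` is a hypothetical helper, and the set of accepted spellings is an assumption, not a description of how SGLang itself parses these values.

```python
import os

# Assumed spellings, matching the forms used in the tables.
_TRUE = {"1", "true"}
_FALSE = {"0", "false"}


def read_flag(name, default, environ=None):
    """Read a boolean flag from `environ`, falling back to `default` if unset.

    Raises ValueError on an unrecognized spelling instead of guessing.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean value")
```

For example, `read_flag("SGLANG_SKIP_P2P_CHECK", default=False)` returns `False` when the variable is unset, matching the default listed in the table.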

DeepGEMM Configuration (Advanced Optimization)

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGL_ENABLE_JIT_DEEPGEMM` | Enable Just-In-Time compilation of DeepGEMM kernels | `"true"` |
| `SGL_JIT_DEEPGEMM_PRECOMPILE` | Enable precompilation of DeepGEMM kernels | `"true"` |
| `SGL_JIT_DEEPGEMM_COMPILE_WORKERS` | Number of workers for parallel DeepGEMM kernel compilation | `4` |
| `SGL_IN_DEEPGEMM_PRECOMPILE_STAGE` | Indicator flag used during the DeepGEMM precompile script | `"false"` |
| `SGL_DG_CACHE_DIR` | Directory for caching compiled DeepGEMM kernels | `~/.cache/deep_gemm` |
| `SGL_DG_USE_NVRTC` | Use NVRTC (instead of Triton) for JIT compilation (Experimental) | `"0"` |
| `SGL_USE_DEEPGEMM_BMM` | Use DeepGEMM for Batched Matrix Multiplication (BMM) operations | `"false"` |
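Compiled kernels accumulate in the cache directory, so it is sometimes useful to know where that directory resolves to, for example to mount it as a volume or to clear it. The sketch below resolves `SGL_DG_CACHE_DIR` with the default listed above. The helper `resolve_dg_cache_dir` is hypothetical, and expanding `~` to the home directory is an assumption about how the path is interpreted.

```python
import os
from pathlib import Path

# Default value as listed in the table above.
DEFAULT_DG_CACHE_DIR = "~/.cache/deep_gemm"


def resolve_dg_cache_dir(environ=None):
    """Return the DeepGEMM cache directory as an absolute-style Path.

    Falls back to the documented default when SGL_DG_CACHE_DIR is unset.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("SGL_DG_CACHE_DIR", DEFAULT_DG_CACHE_DIR)
    return Path(raw).expanduser()
```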

Memory Management

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_DEBUG_MEMORY_POOL` | Enable memory pool debugging | `false` |
| `SGLANG_CLIP_MAX_NEW_TOKENS_ESTIMATION` | Clip max new tokens estimation for memory planning | Not set |
| `SGLANG_DETOKENIZER_MAX_STATES` | Maximum states for detokenizer | Default value based on system |
| `SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK` | Disable checks for memory imbalance across Tensor Parallel ranks | Not set (defaults to enabled check) |

Model-Specific Options

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_USE_AITER` | Use the AITER optimized implementation | `false` |
| `SGLANG_INT4_WEIGHT` | Enable INT4 weight quantization | `false` |
| `SGLANG_MOE_PADDING` | Enable MoE padding (sets padding size to 128 if value is `1`, often set to `1` in Docker builds) | `0` |
| `SGLANG_FORCE_FP8_MARLIN` | Force using FP8 MARLIN kernels even if other FP8 kernels are available | `false` |
| `SGLANG_ENABLE_FLASHINFER_GEMM` | Use FlashInfer kernels when running blockwise FP8 GEMM on Blackwell GPUs | `false` |
| `SGLANG_SUPPORT_CUTLASS_BLOCK_FP8` | Use Cutlass kernels when running blockwise FP8 GEMM on Hopper or Blackwell GPUs | `false` |
| `SGLANG_CUTLASS_MOE` | Use Cutlass FP8 MoE kernel on Blackwell GPUs | `false` |

Distributed Computing

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_BLOCK_NONZERO_RANK_CHILDREN` | Control blocking of non-zero rank child processes | `1` |
| `SGL_IS_FIRST_RANK_ON_NODE` | Indicates if the current process is the first rank on its node | `"true"` |
| `SGLANG_PP_LAYER_PARTITION` | Pipeline parallel layer partition specification | Not set |

Testing & Debugging (Internal/CI)

These variables are primarily used for internal testing, continuous integration, or debugging.

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_IS_IN_CI` | Indicates if running in CI environment | `false` |
| `SGLANG_AMD_CI` | Indicates running in AMD CI environment | `0` |
| `SGLANG_TEST_RETRACT` | Enable retract decode testing | `false` |
| `SGLANG_RECORD_STEP_TIME` | Record step time for profiling | `false` |
| `SGLANG_TEST_REQUEST_TIME_STATS` | Test request time statistics | `false` |
| `SGLANG_CI_SMALL_KV_SIZE` | Use small KV cache size in CI | Not set |

Profiling & Benchmarking

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_TORCH_PROFILER_DIR` | Directory for PyTorch profiler output | `/tmp` |
| `SGLANG_PROFILE_WITH_STACK` | Set `with_stack` option (bool) for PyTorch profiler (capture stack trace) | `true` |
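Since the default output directory is `/tmp`, traces from different runs can end up mixed together. A dedicated directory per run keeps them separate. The sketch below creates such a directory and records it in an environment mapping that can then be passed to the server process. The helper `prepare_profiler_dir` and the directory prefix are hypothetical; it is also an assumption that the variable must be set before the server process starts.

```python
import tempfile
from pathlib import Path


def prepare_profiler_dir(environ, directory=None):
    """Create a profiler output directory and record it in `environ`.

    If `directory` is omitted, a fresh temporary directory is created.
    Returns the directory as a Path.
    """
    if directory is None:
        path = Path(tempfile.mkdtemp(prefix="sglang-profile-"))
    else:
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
    environ["SGLANG_TORCH_PROFILER_DIR"] = str(path)
    return path
```

Passing `os.environ` as `environ` affects the current process and any children it spawns afterwards; passing a copy limits the change to a single child process.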

Storage & Caching

| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGLANG_DISABLE_OUTLINES_DISK_CACHE` | Disable Outlines disk cache | `true` |