Observability

Production Metrics

SGLang exposes metrics via Prometheus. Enable them by adding --enable-metrics when launching the server, then query them with:

curl http://localhost:30000/metrics

See Production Metrics for more details.
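The same endpoint can also be read from a script. The following is a minimal sketch, not part of SGLang: it assumes the server was started with --enable-metrics on localhost:30000 and that the response uses the standard Prometheus text exposition format. The function names are hypothetical helpers, and timestamps after the value are ignored.

```python
import urllib.request


def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into {sample_key: float}.

    The key is the metric name including its label set, if any.
    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        brace = line.rfind("}")
        if brace != -1:
            # Labeled sample: everything up to the closing brace is the key.
            key, rest = line[: brace + 1], line[brace + 1:].split()
        else:
            # Unlabeled sample: "name value [timestamp]".
            parts = line.split()
            key, rest = parts[0], parts[1:]
        if rest:
            samples[key] = float(rest[0])
    return samples


def fetch_metrics(url="http://localhost:30000/metrics", timeout=5):
    """Fetch the metrics endpoint and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Calling fetch_metrics() against a running server returns a dictionary that can be filtered by key prefix. For anything beyond ad hoc inspection, a Prometheus server scraping the endpoint is the more usual setup.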

Logging

By default, SGLang does not log any request contents. To log them, use --log-requests; to control the verbosity, use --log-request-level. See Logging for more details.

Request Dump and Replay

You can dump all requests and replay them later for benchmarking or other purposes.

To start dumping, use the following command to send a request to a running server:

python3 -m sglang.srt.managers.configure_logging --url http://localhost:30000 --dump-requests-folder /tmp/sglang_request_dump --dump-requests-threshold 100

The server will write the requests to a pickle file once every 100 requests (the value passed to --dump-requests-threshold).

To replay the request dump, use scripts/playground/replay_request_dump.py.
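Before replaying, it can help to check what was actually dumped. The sketch below is an illustration only and does not describe SGLang's dump layout: the *.pkl file pattern and the helper names are assumptions, and unpickling SGLang objects will likely require sglang to be importable in the same environment. Because pickle.load can execute arbitrary code, only load dump files you produced yourself.

```python
import pickle
from pathlib import Path


def load_dumps(folder, pattern="*.pkl"):
    """Yield (path, object) for each pickle file in the folder.

    The file pattern is an assumption; adjust it to the names
    that actually appear in your dump folder.
    """
    for path in sorted(Path(folder).glob(pattern)):
        with path.open("rb") as f:
            yield path, pickle.load(f)


def summarize_dumps(folder, pattern="*.pkl"):
    """Return {file_name: (type_name, length_or_None)} for each dump file."""
    summary = {}
    for path, obj in load_dumps(folder, pattern):
        size = len(obj) if hasattr(obj, "__len__") else None
        summary[path.name] = (type(obj).__name__, size)
    return summary
```

For example, summarize_dumps("/tmp/sglang_request_dump") gives a quick view of how many files exist and how large each unpickled object is, without assuming anything about the fields inside.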

Crash Dump and Replay

If the server crashes, you may want to debug the cause. SGLang supports crash dumping: it dumps all requests from the 5 minutes before the crash, so you can replay them later and investigate.

To enable crash dumping, use --crash-dump-folder /tmp/crash_dump. To replay the crash dump, use scripts/playground/replay_request_dump.py.
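The flags described on this page can be combined in a single launch command. The helper below is a hypothetical sketch that only assembles an argument list. It assumes the server is started through the sglang.launch_server module with --model-path and --port, none of which are stated on this page, and the model path is a placeholder.

```python
def build_launch_args(
    model_path,
    port=30000,
    enable_metrics=True,
    log_requests=False,
    log_request_level=None,
    crash_dump_folder=None,
):
    """Assemble an argv list for launching the server with observability flags.

    The entry point, --model-path, and --port are assumptions;
    the remaining flags are the ones described on this page.
    """
    args = ["python3", "-m", "sglang.launch_server",
            "--model-path", model_path, "--port", str(port)]
    if enable_metrics:
        args.append("--enable-metrics")
    if log_requests:
        args.append("--log-requests")
        if log_request_level is not None:
            # The accepted values are not listed here; pass through as given.
            args += ["--log-request-level", str(log_request_level)]
    if crash_dump_folder is not None:
        args += ["--crash-dump-folder", crash_dump_folder]
    return args
```

The returned list can be passed to subprocess.Popen, or joined with spaces to print the equivalent shell command.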