# SGLang Documentation

We recommend that new contributors start by writing documentation, which is a quick way to get familiar with the SGLang codebase. Most documentation files are located under the `docs/` folder. We prefer **Jupyter Notebooks** over Markdown so that all examples can be executed and validated by our docs CI pipeline.

## Docs Workflow

### Install Dependency

```bash
pip install -r requirements.txt
```

### Update Documentation

Update your Jupyter notebooks in the appropriate subdirectories under `docs/`. If you add new files, remember to update `index.rst` (or relevant `.rst` files) accordingly.

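As a rough aid for the `index.rst` reminder above, the following sketch lists notebooks that no `.rst` file mentions. The helper name `unreferenced_notebooks` is hypothetical and not part of the SGLang repository, and the check is a crude substring match rather than a real toctree parser. The temporary directory only stands in for `docs/` so the example runs on its own.

```python
import tempfile
from pathlib import Path


def unreferenced_notebooks(docs_dir: Path) -> list[str]:
    """Return notebooks under docs_dir that no .rst file mentions (hypothetical helper)."""
    # Concatenate every .rst file so one substring search covers all of them.
    rst_text = "\n".join(
        p.read_text(encoding="utf-8") for p in docs_dir.rglob("*.rst")
    )
    missing = []
    for nb in sorted(docs_dir.rglob("*.ipynb")):
        rel = nb.relative_to(docs_dir)
        # Entries may be listed with or without the .ipynb suffix,
        # so search for the suffix-free path.
        if rel.with_suffix("").as_posix() not in rst_text:
            missing.append(rel.as_posix())
    return missing


# Demo on a throwaway directory that stands in for docs/.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "guide").mkdir()
    (docs / "guide" / "listed.ipynb").write_text("{}", encoding="utf-8")
    (docs / "guide" / "new_page.ipynb").write_text("{}", encoding="utf-8")
    (docs / "index.rst").write_text(
        ".. toctree::\n\n   guide/listed.ipynb\n", encoding="utf-8"
    )
    missing = unreferenced_notebooks(docs)
    print(missing)
```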
- **`pre-commit run --all-files`** manually runs all configured checks, applying fixes where possible. If it fails the first time, re-run it to confirm that all lint errors are resolved. Make sure your code passes all checks **before** creating a Pull Request.
- **Do not commit** directly to the `main` branch. Always create a new branch (e.g., `feature/my-new-feature`), push your changes, and open a PR from that branch.

```bash
# 1) Compile all Jupyter notebooks
make compile

# 2) Compile and preview documentation locally with auto-build
# This will automatically rebuild docs when files change
# Open your browser at the displayed port to view the docs
bash serve.sh

# 2a) Alternative ways to serve documentation
# Directly use make serve
make serve
# With custom port
PORT=8080 make serve

# 3) Clean notebook outputs
# nbstripout removes notebook outputs so your PR stays clean
pip install nbstripout
find . -name '*.ipynb' -exec nbstripout {} \;

# 4) Pre-commit checks and create a PR
# After these checks pass, push your changes and open a PR on your branch
pre-commit run --all-files
```
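To make the cleanup step more concrete, here is a simplified illustration of what stripping notebook outputs means. It is not nbstripout's actual implementation, and `strip_outputs` is a hypothetical helper. It assumes the nbformat 4 layout, where code cells carry `outputs` and `execution_count` fields.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs cleared (illustrative only)."""
    # A JSON round-trip gives a deep copy, so the input is left untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny in-memory notebook with one executed code cell.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["print('hi')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

stripped = strip_outputs(example)
print(stripped["cells"][1]["outputs"])
```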
---

### **Port Allocation and CI Efficiency**

**To launch and kill the server:**

```python
from sglang.test.test_utils import is_in_ci
from sglang.utils import wait_for_server, print_highlight, terminate_process

if is_in_ci():
    from patch import launch_server_cmd
else:
    from sglang.utils import launch_server_cmd

server_process, port = launch_server_cmd(
    """
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
 --host 0.0.0.0
"""
)

wait_for_server(f"http://localhost:{port}")

# Terminate Server
terminate_process(server_process)
```

**To launch and kill the engine:**

```python
# Launch Engine
import sglang as sgl
import asyncio
from sglang.test.test_utils import is_in_ci

if is_in_ci():
    import patch

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

# Terminate Engine
llm.shutdown()
```

### **Why this approach?**

- **Dynamic Port Allocation**: Avoids port conflicts by selecting an available port at runtime, enabling multiple server instances to run in parallel.
- **Optimized for CI**: In CI environments, the `patch` versions of `launch_server_cmd` and `sgl.Engine()` help manage GPU memory dynamically, preventing conflicts and improving test parallelism.
- **Better Parallel Execution**: Ensures smooth concurrent tests by avoiding fixed-port collisions and optimizing memory usage.

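The idea behind dynamic port allocation can be sketched with the standard library alone. This is a minimal illustration of the general technique, not necessarily how `launch_server_cmd` picks its port. The helper name `find_free_port` is hypothetical.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the operating system choose a free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
print(f"Selected port: {port}")
```

Note that the port is released when the socket closes, so another process could claim it before a server binds to it. A sketch like this reduces collisions between parallel runs but does not rule them out.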
### **Model Selection**

For demonstrations in the docs, **prefer smaller models** to reduce memory consumption and speed up inference. Running larger models in CI can lead to instability due to memory constraints.

### **Prompt Alignment Example**

When designing prompts, ensure they align with SGLang's structured formatting. For example:

```python
prompt = """You are an AI assistant. Answer concisely and accurately.

User: What is the capital of France?
Assistant: The capital of France is Paris."""
```

This keeps responses aligned with expected behavior and improves reliability across different files.
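
One way to keep this layout consistent across notebooks is to build prompts with a small helper instead of writing each string by hand. The sketch below is only an illustration. `build_prompt` is a hypothetical function, not an SGLang API, and it simply reproduces the system line, blank line, and `Role: text` turns from the example above.

```python
def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Join a system line and (role, text) turns in the layout shown above."""
    lines = [system, ""]  # the empty string yields the blank separator line
    lines.extend(f"{role}: {text}" for role, text in turns)
    return "\n".join(lines)


prompt = build_prompt(
    "You are an AI assistant. Answer concisely and accurately.",
    [
        ("User", "What is the capital of France?"),
        ("Assistant", "The capital of France is Paris."),
    ],
)
print(prompt)
```

Generating every prompt through one function means a change to the layout only has to be made in a single place.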