vllm/vllm_v0.10.0/examples/offline_inference/disaggregated-prefill-v1
hailin 38d813617c first commit 2025-08-03 20:28:19 +08:00

Disaggregated Prefill V1

This example contains scripts that demonstrate disaggregated prefill in the offline setting of vLLM.

Files

  • run.sh - A helper script that runs prefill_example.py and then decode_example.py, in that order.
    • Make sure you are in the examples/offline_inference/disaggregated-prefill-v1 directory before running run.sh.
  • prefill_example.py - A script that performs prefill only. It saves the KV state to the local_storage directory and the prompts to output.txt.
  • decode_example.py - A script that performs decode only. It loads the KV state from the local_storage directory and the prompts from output.txt.
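
The two scripts communicate only through files on disk, so the decode step depends on the prefill step having already run. The sketch below illustrates the output.txt hand-off in plain Python. It is not the code from prefill_example.py or decode_example.py: the helper names save_prompts and load_prompts are hypothetical, and the one-prompt-per-line layout is an assumption that would not survive prompts containing newlines.

```python
from pathlib import Path


def save_prompts(prompts, path="output.txt"):
    """Prefill side: write the prompts to disk, one per line (assumed layout)."""
    Path(path).write_text("".join(p + "\n" for p in prompts), encoding="utf-8")


def load_prompts(path="output.txt"):
    """Decode side: read the prompts back, failing clearly if prefill has not run."""
    file = Path(path)
    if not file.exists():
        raise FileNotFoundError(f"{path} not found; run the prefill step first")
    # Skip blank lines so a trailing newline does not produce an empty prompt.
    return [line for line in file.read_text(encoding="utf-8").splitlines() if line]
```

Calling load_prompts before save_prompts raises FileNotFoundError, which mirrors why run.sh runs the prefill script first and the decode script second.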