{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SGLang Frontend Language"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The SGLang frontend language lets you define prompts in a convenient, structured way."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Launch A Server\n",
    "\n",
    "Launch the server in your terminal and wait for it to initialize."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import os\n",
    "\n",
    "from sglang import assistant_begin, assistant_end\n",
    "from sglang import assistant, function, gen, system, user\n",
    "from sglang import image\n",
    "from sglang import RuntimeEndpoint, set_default_backend\n",
    "from sglang.srt.utils import load_image\n",
    "from sglang.test.test_utils import is_in_ci\n",
    "from sglang.utils import print_highlight, terminate_process, wait_for_server\n",
    "\n",
    "if is_in_ci():\n",
    "    from patch import launch_server_cmd\n",
    "else:\n",
    "    from sglang.utils import launch_server_cmd\n",
    "\n",
    "\n",
    "server_process, port = launch_server_cmd(\n",
    "    \"python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --host 0.0.0.0\"\n",
    ")\n",
    "\n",
    "wait_for_server(f\"http://localhost:{port}\")\n",
    "print(f\"Server started on http://localhost:{port}\")"
   ]
  },
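  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the server is up, it also serves plain HTTP endpoints. As a quick sanity check, you can query the health route directly (a minimal sketch, assuming the standard `/health` endpoint exposed by `sglang.launch_server`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A quick sanity check against the server's HTTP health endpoint.\n",
    "# Assumes the standard /health route exposed by sglang.launch_server.\n",
    "response = requests.get(f\"http://localhost:{port}/health\")\n",
    "print_highlight(f\"Health check status code: {response.status_code}\")"
   ]
  },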
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set the default backend. Note: Besides the local server, you may also use `OpenAI` or other API endpoints."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "set_default_backend(RuntimeEndpoint(f\"http://localhost:{port}\"))"
   ]
  },
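  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the cell below sketches how the default backend could point at OpenAI instead. It is left commented out: it assumes the `openai` package is installed and `OPENAI_API_KEY` is set, the model name is illustrative, and the constrained decoding examples later in this notebook require the local backend."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A sketch of using an OpenAI endpoint instead of the local server.\n",
    "# Assumes the `openai` package is installed and OPENAI_API_KEY is set;\n",
    "# the model name is illustrative. Kept commented out so the rest of the\n",
    "# notebook keeps using the local RuntimeEndpoint.\n",
    "#\n",
    "# from sglang import OpenAI\n",
    "#\n",
    "# set_default_backend(OpenAI(\"gpt-4o-mini\"))"
   ]
  },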
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Usage\n",
    "\n",
    "The simplest use of the SGLang frontend language is a single question-and-answer exchange between a user and an assistant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@function\n",
    "def basic_qa(s, question):\n",
    "    s += system(\"You are a helpful assistant that can answer questions.\")\n",
    "    s += user(question)\n",
    "    s += assistant(gen(\"answer\", max_tokens=512))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "state = basic_qa(\"List 3 countries and their capitals.\")\n",
    "print_highlight(state[\"answer\"])"
   ]
  },
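  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides named variables like `state[\"answer\"]`, the returned state can render the whole conversation. The cell below is a small sketch using `state.text()` to inspect the full prompt plus generation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect the full text the interpreter assembled: the chat template,\n",
    "# the user question, and the generated answer.\n",
    "print_highlight(state.text())"
   ]
  },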
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multi-turn Dialog\n",
    "\n",
    "The SGLang frontend language can also be used to define multi-turn dialogs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@function\n",
    "def multi_turn_qa(s):\n",
    "    s += system(\"You are a helpful assistant that can answer questions.\")\n",
    "    s += user(\"Please give me a list of 3 countries and their capitals.\")\n",
    "    s += assistant(gen(\"first_answer\", max_tokens=512))\n",
    "    s += user(\"Please give me another list of 3 countries and their capitals.\")\n",
    "    s += assistant(gen(\"second_answer\", max_tokens=512))\n",
    "    return s\n",
    "\n",
    "\n",
    "state = multi_turn_qa()\n",
    "print_highlight(state[\"first_answer\"])\n",
    "print_highlight(state[\"second_answer\"])"
   ]
  },
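  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A state built from chat roles can also be read back as structured messages. The cell below is a sketch that assumes `state.messages()` is available in your SGLang version and returns a list of role/content dicts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the dialog back as structured chat messages.\n",
    "# Assumes state.messages() returns a list of {\"role\": ..., \"content\": ...}.\n",
    "for message in state.messages():\n",
    "    print_highlight(f\"{message['role']}: {message['content'][:60]}\")"
   ]
  },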
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Control Flow\n",
    "\n",
    "You may use any Python code within the function to define more complex control flows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@function\n",
    "def tool_use(s, question):\n",
    "    s += assistant(\n",
    "        \"To answer this question: \"\n",
    "        + question\n",
    "        + \". I need to use a \"\n",
    "        + gen(\"tool\", choices=[\"calculator\", \"search engine\"])\n",
    "        + \". \"\n",
    "    )\n",
    "\n",
    "    if s[\"tool\"] == \"calculator\":\n",
    "        s += assistant(\"The math expression is: \" + gen(\"expression\"))\n",
    "    elif s[\"tool\"] == \"search engine\":\n",
    "        s += assistant(\"The key word to search is: \" + gen(\"word\"))\n",
    "\n",
    "\n",
    "state = tool_use(\"What is 2 * 2?\")\n",
    "print_highlight(state[\"tool\"])\n",
    "# Only the branch that ran generated a variable, so print accordingly.\n",
    "if state[\"tool\"] == \"calculator\":\n",
    "    print_highlight(state[\"expression\"])\n",
    "else:\n",
    "    print_highlight(state[\"word\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parallelism\n",
    "\n",
    "Use `fork` to launch parallel prompts. Because `gen` is non-blocking, the for loop below issues two generation calls in parallel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@function\n",
    "def tip_suggestion(s):\n",
    "    s += assistant(\n",
    "        \"Here are two tips for staying healthy: \"\n",
    "        \"1. Balanced Diet. 2. Regular Exercise.\\n\\n\"\n",
    "    )\n",
    "\n",
    "    forks = s.fork(2)\n",
    "    for i, f in enumerate(forks):\n",
    "        f += assistant(\n",
    "            f\"Now, expand tip {i+1} into a paragraph:\\n\"\n",
    "            + gen(\"detailed_tip\", max_tokens=256, stop=\"\\n\\n\")\n",
    "        )\n",
    "\n",
    "    s += assistant(\"Tip 1:\" + forks[0][\"detailed_tip\"] + \"\\n\")\n",
    "    s += assistant(\"Tip 2:\" + forks[1][\"detailed_tip\"] + \"\\n\")\n",
    "    s += assistant(\n",
    "        \"To summarize the above two tips, I can say:\\n\" + gen(\"summary\", max_tokens=512)\n",
    "    )\n",
    "\n",
    "\n",
    "state = tip_suggestion()\n",
    "print_highlight(state[\"summary\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Constrained Decoding\n",
    "\n",
    "Use `regex` to specify a regular expression as a decoding constraint. This is only supported for local models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@function\n",
    "def regular_expression_gen(s):\n",
    "    s += user(\"What is the IP address of the Google DNS servers?\")\n",
    "    s += assistant(\n",
    "        gen(\n",
    "            \"answer\",\n",
    "            temperature=0,\n",
    "            regex=r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\",\n",
    "        )\n",
    "    )\n",
    "\n",
    "\n",
    "state = regular_expression_gen()\n",
    "print_highlight(state[\"answer\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `regex` to define a `JSON` decoding schema."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "character_regex = (\n",
    "    r\"\"\"\\{\\n\"\"\"\n",
    "    + r\"\"\"    \"name\": \"[\\w\\d\\s]{1,16}\",\\n\"\"\"\n",
    "    + r\"\"\"    \"house\": \"(Gryffindor|Slytherin|Ravenclaw|Hufflepuff)\",\\n\"\"\"\n",
    "    + r\"\"\"    \"blood status\": \"(Pure-blood|Half-blood|Muggle-born)\",\\n\"\"\"\n",
    "    + r\"\"\"    \"occupation\": \"(student|teacher|auror|ministry of magic|death eater|order of the phoenix)\",\\n\"\"\"\n",
    "    + r\"\"\"    \"wand\": \\{\\n\"\"\"\n",
    "    + r\"\"\"        \"wood\": \"[\\w\\d\\s]{1,16}\",\\n\"\"\"\n",
    "    + r\"\"\"        \"core\": \"[\\w\\d\\s]{1,16}\",\\n\"\"\"\n",
    "    + r\"\"\"        \"length\": [0-9]{1,2}\\.[0-9]{1,2}\\n\"\"\"\n",
    "    + r\"\"\"    \\},\\n\"\"\"\n",
    "    + r\"\"\"    \"alive\": \"(Alive|Deceased)\",\\n\"\"\"\n",
    "    + r\"\"\"    \"patronus\": \"[\\w\\d\\s]{1,16}\",\\n\"\"\"\n",
    "    + r\"\"\"    \"boggart\": \"[\\w\\d\\s]{1,16}\"\\n\"\"\"\n",
    "    + r\"\"\"\\}\"\"\"\n",
    ")\n",
    "\n",
    "\n",
    "@function\n",
    "def character_gen(s, name):\n",
    "    s += user(\n",
    "        f\"{name} is a character in Harry Potter. Please fill in the following information about this character.\"\n",
    "    )\n",
    "    s += assistant(gen(\"json_output\", max_tokens=256, regex=character_regex))\n",
    "\n",
    "\n",
    "state = character_gen(\"Harry Potter\")\n",
    "print_highlight(state[\"json_output\"])"
   ]
  },
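  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the regex constrains decoding to this exact JSON shape, the output should parse directly with the standard library. A quick check:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "# The regex constrains the output to well-formed JSON, so this parse\n",
    "# should succeed and give us a plain dict to work with.\n",
    "character = json.loads(state[\"json_output\"])\n",
    "print_highlight(f\"House: {character['house']}\")"
   ]
  },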
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Batching\n",
    "\n",
    "Use `run_batch` to run a batch of prompts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@function\n",
    "def text_qa(s, question):\n",
    "    s += user(question)\n",
    "    s += assistant(gen(\"answer\", stop=\"\\n\"))\n",
    "\n",
    "\n",
    "states = text_qa.run_batch(\n",
    "    [\n",
    "        {\"question\": \"What is the capital of the United Kingdom?\"},\n",
    "        {\"question\": \"What is the capital of France?\"},\n",
    "        {\"question\": \"What is the capital of Japan?\"},\n",
    "    ],\n",
    "    progress_bar=True,\n",
    ")\n",
    "\n",
    "for i, state in enumerate(states):\n",
    "    print_highlight(f\"Answer {i+1}: {state['answer']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Streaming\n",
    "\n",
    "Use `stream` to stream the output to the user."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@function\n",
    "def text_qa(s, question):\n",
    "    s += user(question)\n",
    "    s += assistant(gen(\"answer\", stop=\"\\n\"))\n",
    "\n",
    "\n",
    "state = text_qa.run(\n",
    "    question=\"What is the capital of France?\", temperature=0.1, stream=True\n",
    ")\n",
    "\n",
    "for out in state.text_iter():\n",
    "    print(out, end=\"\", flush=True)"
   ]
  },
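  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When a program generates several variables, you can also stream a single one by name. The cell below is a sketch that assumes `text_iter` accepts a variable name, as in recent SGLang versions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Stream only the \"answer\" variable instead of the full prompt text.\n",
    "# Assumes text_iter(var_name) is supported by your SGLang version.\n",
    "state = text_qa.run(\n",
    "    question=\"What is the capital of Japan?\", temperature=0.1, stream=True\n",
    ")\n",
    "\n",
    "for out in state.text_iter(\"answer\"):\n",
    "    print(out, end=\"\", flush=True)"
   ]
  },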
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Complex Prompts\n",
    "\n",
    "You may use `{system|user|assistant}_{begin|end}` to define complex prompts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@function\n",
    "def chat_example(s):\n",
    "    s += system(\"You are a helpful assistant.\")\n",
    "    # Same as: s += s.system(\"You are a helpful assistant.\")\n",
    "\n",
    "    with s.user():\n",
    "        s += \"Question: What is the capital of France?\"\n",
    "\n",
    "    s += assistant_begin()\n",
    "    s += \"Answer: \" + gen(\"answer\", max_tokens=100, stop=\"\\n\")\n",
    "    s += assistant_end()\n",
    "\n",
    "\n",
    "state = chat_example()\n",
    "print_highlight(state[\"answer\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "terminate_process(server_process)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multi-modal Generation\n",
    "\n",
    "You may use the SGLang frontend language to define multi-modal prompts.\n",
    "See [here](https://docs.sglang.ai/references/supported_models.html) for supported models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "server_process, port = launch_server_cmd(\n",
    "    \"python -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct --host 0.0.0.0\"\n",
    ")\n",
    "\n",
    "wait_for_server(f\"http://localhost:{port}\")\n",
    "print(f\"Server started on http://localhost:{port}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "set_default_backend(RuntimeEndpoint(f\"http://localhost:{port}\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ask a question about an image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@function\n",
    "def image_qa(s, image_file, question):\n",
    "    s += user(image(image_file) + question)\n",
    "    s += assistant(gen(\"answer\", max_tokens=256))\n",
    "\n",
    "\n",
    "image_url = \"https://github.com/sgl-project/sglang/blob/main/test/lang/example_image.png?raw=true\"\n",
    "image_bytes, _ = load_image(image_url)\n",
    "state = image_qa(image_bytes, \"What is in the image?\")\n",
    "print_highlight(state[\"answer\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "terminate_process(server_process)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}