{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SGLang Frontend Language" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The SGLang frontend language lets you define prompts in a convenient, structured way." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch a Server\n", "\n", "Launch the server in your terminal and wait for it to initialize." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import os\n", "\n", "from sglang import assistant_begin, assistant_end\n", "from sglang import assistant, function, gen, system, user\n", "from sglang import image\n", "from sglang import RuntimeEndpoint, set_default_backend\n", "from sglang.srt.utils import load_image\n", "from sglang.test.test_utils import is_in_ci\n", "from sglang.utils import print_highlight, terminate_process, wait_for_server\n", "\n", "if is_in_ci():\n", " from patch import launch_server_cmd\n", "else:\n", " from sglang.utils import launch_server_cmd\n", "\n", "\n", "server_process, port = launch_server_cmd(\n", " \"python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --host 0.0.0.0\"\n", ")\n", "\n", "wait_for_server(f\"http://localhost:{port}\")\n", "print(f\"Server started on http://localhost:{port}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the default backend. Note: besides the local server, you may also use `OpenAI` or other API endpoints." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "set_default_backend(RuntimeEndpoint(f\"http://localhost:{port}\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Usage\n", "\n", "The simplest way to use the SGLang frontend language is a question-and-answer dialog between a user and an assistant."
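] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Conceptually, each `+=` inside a decorated function appends a role-tagged segment to the state `s`. The cell below is only a toy sketch of that accumulation pattern, not the SGLang implementation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Toy sketch of the message-accumulation pattern, NOT the SGLang\n", "# implementation: each `+=` appends a role-tagged segment.\n", "class ToyState:\n", "    def __init__(self):\n", "        self.messages = []\n", "\n", "    def __iadd__(self, message):\n", "        self.messages.append(message)\n", "        return self\n", "\n", "\n", "toy = ToyState()\n", "toy += {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"}\n", "toy += {\"role\": \"user\", \"content\": \"List 3 countries and their capitals.\"}\n", "print(toy.messages[-1][\"role\"])  # user" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With that picture in mind, the same dialog written with the real SGLang primitives:"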
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@function\n", "def basic_qa(s, question):\n", " s += system(\"You are a helpful assistant that can answer questions.\")\n", " s += user(question)\n", " s += assistant(gen(\"answer\", max_tokens=512))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "state = basic_qa(\"List 3 countries and their capitals.\")\n", "print_highlight(state[\"answer\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multi-turn Dialog\n", "\n", "The SGLang frontend language can also be used to define multi-turn dialogs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@function\n", "def multi_turn_qa(s):\n", " s += system(\"You are a helpful assistant that can answer questions.\")\n", " s += user(\"Please give me a list of 3 countries and their capitals.\")\n", " s += assistant(gen(\"first_answer\", max_tokens=512))\n", " s += user(\"Please give me another list of 3 countries and their capitals.\")\n", " s += assistant(gen(\"second_answer\", max_tokens=512))\n", " return s\n", "\n", "\n", "state = multi_turn_qa()\n", "print_highlight(state[\"first_answer\"])\n", "print_highlight(state[\"second_answer\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Control Flow\n", "\n", "You may use any Python code within the function to define more complex control flows." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@function\n", "def tool_use(s, question):\n", " s += assistant(\n", " \"To answer this question: \"\n", " + question\n", " + \". I need to use a \"\n", " + gen(\"tool\", choices=[\"calculator\", \"search engine\"])\n", " + \". 
\"\n", " )\n", "\n", " if s[\"tool\"] == \"calculator\":\n", " s += assistant(\"The math expression is: \" + gen(\"expression\"))\n", " elif s[\"tool\"] == \"search engine\":\n", " s += assistant(\"The key word to search is: \" + gen(\"word\"))\n", "\n", "\n", "state = tool_use(\"What is 2 * 2?\")\n", "print_highlight(state[\"tool\"])\n", "print_highlight(state[\"expression\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parallelism\n", "\n", "Use `fork` to launch parallel prompts. Because `gen` is non-blocking, the for loop below issues two generation calls in parallel." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@function\n", "def tip_suggestion(s):\n", " s += assistant(\n", " \"Here are two tips for staying healthy: \"\n", " \"1. Balanced Diet. 2. Regular Exercise.\\n\\n\"\n", " )\n", "\n", " forks = s.fork(2)\n", " for i, f in enumerate(forks):\n", " f += assistant(\n", " f\"Now, expand tip {i+1} into a paragraph:\\n\"\n", " + gen(\"detailed_tip\", max_tokens=256, stop=\"\\n\\n\")\n", " )\n", "\n", " s += assistant(\"Tip 1:\" + forks[0][\"detailed_tip\"] + \"\\n\")\n", " s += assistant(\"Tip 2:\" + forks[1][\"detailed_tip\"] + \"\\n\")\n", " s += assistant(\n", " \"To summarize the above two tips, I can say:\\n\" + gen(\"summary\", max_tokens=512)\n", " )\n", "\n", "\n", "state = tip_suggestion()\n", "print_highlight(state[\"summary\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Constrained Decoding\n", "\n", "Use `regex` to specify a regular expression as a decoding constraint. This is only supported for local models."
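] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A constraint is an ordinary regular expression, so you can sanity-check it offline with Python's `re` module before passing it to `gen`. A constrained decode has to cover the entire output, which `re.fullmatch` mirrors (note the escaped dot, so it matches a literal `.` only):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import re\n", "\n", "# Offline sanity check of an IPv4 constraint pattern; no server needed.\n", "ipv4_pattern = r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\"\n", "\n", "print(bool(re.fullmatch(ipv4_pattern, \"8.8.8.8\")))    # True\n", "print(bool(re.fullmatch(ipv4_pattern, \"999.1.1.1\")))  # False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the pattern verified, pass it as the `regex` argument of `gen`:"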
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@function\n", "def regular_expression_gen(s):\n", " s += user(\"What is the IP address of the Google DNS servers?\")\n", " s += assistant(\n", " gen(\n", " \"answer\",\n", " temperature=0,\n", " regex=r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\",\n", " )\n", " )\n", "\n", "\n", "state = regular_expression_gen()\n", "print_highlight(state[\"answer\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `regex` to define a `JSON` decoding schema." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "character_regex = (\n", " r\"\"\"\\{\\n\"\"\"\n", " + r\"\"\" \"name\": \"[\\w\\d\\s]{1,16}\",\\n\"\"\"\n", " + r\"\"\" \"house\": \"(Gryffindor|Slytherin|Ravenclaw|Hufflepuff)\",\\n\"\"\"\n", " + r\"\"\" \"blood status\": \"(Pure-blood|Half-blood|Muggle-born)\",\\n\"\"\"\n", " + r\"\"\" \"occupation\": \"(student|teacher|auror|ministry of magic|death eater|order of the phoenix)\",\\n\"\"\"\n", " + r\"\"\" \"wand\": \\{\\n\"\"\"\n", " + r\"\"\" \"wood\": \"[\\w\\d\\s]{1,16}\",\\n\"\"\"\n", " + r\"\"\" \"core\": \"[\\w\\d\\s]{1,16}\",\\n\"\"\"\n", " + r\"\"\" \"length\": [0-9]{1,2}\\.[0-9]{0,2}\\n\"\"\"\n", " + r\"\"\" \\},\\n\"\"\"\n", " + r\"\"\" \"alive\": \"(Alive|Deceased)\",\\n\"\"\"\n", " + r\"\"\" \"patronus\": \"[\\w\\d\\s]{1,16}\",\\n\"\"\"\n", " + r\"\"\" \"boggart\": \"[\\w\\d\\s]{1,16}\"\\n\"\"\"\n", " + r\"\"\"\\}\"\"\"\n", ")\n", "\n", "\n", "@function\n", "def character_gen(s, name):\n", " s += user(\n", " f\"{name} is a character in Harry Potter. 
Please fill in the following information about this character.\"\n", " )\n", " s += assistant(gen(\"json_output\", max_tokens=256, regex=character_regex))\n", "\n", "\n", "state = character_gen(\"Harry Potter\")\n", "print_highlight(state[\"json_output\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Batching\n", "\n", "Use `run_batch` to run a batch of prompts." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@function\n", "def text_qa(s, question):\n", " s += user(question)\n", " s += assistant(gen(\"answer\", stop=\"\\n\"))\n", "\n", "\n", "states = text_qa.run_batch(\n", " [\n", " {\"question\": \"What is the capital of the United Kingdom?\"},\n", " {\"question\": \"What is the capital of France?\"},\n", " {\"question\": \"What is the capital of Japan?\"},\n", " ],\n", " progress_bar=True,\n", ")\n", "\n", "for i, state in enumerate(states):\n", " print_highlight(f\"Answer {i+1}: {state['answer']}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Streaming\n", "\n", "Use `stream` to stream the output to the user." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@function\n", "def text_qa(s, question):\n", " s += user(question)\n", " s += assistant(gen(\"answer\", stop=\"\\n\"))\n", "\n", "\n", "state = text_qa.run(\n", " question=\"What is the capital of France?\", temperature=0.1, stream=True\n", ")\n", "\n", "for out in state.text_iter():\n", " print(out, end=\"\", flush=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Complex Prompts\n", "\n", "You may use `{system|user|assistant}_{begin|end}` to define complex prompts."
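] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The begin/end helpers mark where each role's text starts and ends. For a chat model these boundaries ultimately correspond to the chat template's special delimiters. The cell below shows ChatML-style markers (the style used by Qwen-family templates) purely as an illustration; SGLang applies the model's real chat template for you:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustration only: ChatML-style role delimiters. SGLang applies the\n", "# model's actual chat template internally; this is not its implementation.\n", "def wrap(role, content):\n", "    return f\"<|im_start|>{role}\\n{content}<|im_end|>\\n\"\n", "\n", "\n", "prompt = (\n", "    wrap(\"system\", \"You are a helpful assistant.\")\n", "    + wrap(\"user\", \"Question: What is the capital of France?\")\n", "    + \"<|im_start|>assistant\\nAnswer: \"  # left open for the model to continue\n", ")\n", "print(prompt.count(\"<|im_end|>\"))  # 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In SGLang you never build these strings by hand; the role helpers do the equivalent work:"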
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@function\n", "def chat_example(s):\n", " s += system(\"You are a helpful assistant.\")\n", " # Same as: s += s.system(\"You are a helpful assistant.\")\n", "\n", " with s.user():\n", " s += \"Question: What is the capital of France?\"\n", "\n", " s += assistant_begin()\n", " s += \"Answer: \" + gen(\"answer\", max_tokens=100, stop=\"\\n\")\n", " s += assistant_end()\n", "\n", "\n", "state = chat_example()\n", "print_highlight(state[\"answer\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "terminate_process(server_process)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multi-modal Generation\n", "\n", "You may use the SGLang frontend language to define multi-modal prompts.\n", "See [here](https://docs.sglang.ai/supported_models/generative_models.html) for supported models." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "server_process, port = launch_server_cmd(\n", " \"python -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct --host 0.0.0.0\"\n", ")\n", "\n", "wait_for_server(f\"http://localhost:{port}\")\n", "print(f\"Server started on http://localhost:{port}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "set_default_backend(RuntimeEndpoint(f\"http://localhost:{port}\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ask a question about an image."
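] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell fetches the image with `load_image`. If you ever handle downloaded bytes yourself, a lightweight signature check can catch a broken download early; the helper below is hypothetical (not part of SGLang) and checks the standard 8-byte PNG file signature:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hypothetical helper, not part of SGLang: verify that raw bytes start\n", "# with the 8-byte PNG file signature.\n", "PNG_SIGNATURE = b\"\\x89PNG\\r\\n\\x1a\\n\"\n", "\n", "\n", "def looks_like_png(data: bytes) -> bool:\n", "    return data[:8] == PNG_SIGNATURE\n", "\n", "\n", "print(looks_like_png(PNG_SIGNATURE + b\"rest-of-file\"))  # True\n", "print(looks_like_png(b\"GIF89a\"))  # False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now ask the model about the image itself:"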
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@function\n", "def image_qa(s, image_file, question):\n", " s += user(image(image_file) + question)\n", " s += assistant(gen(\"answer\", max_tokens=256))\n", "\n", "\n", "image_url = \"https://github.com/sgl-project/sglang/blob/main/test/lang/example_image.png?raw=true\"\n", "image_bytes, _ = load_image(image_url)\n", "state = image_qa(image_bytes, \"What is in the image?\")\n", "print_highlight(state[\"answer\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "terminate_process(server_process)" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }