{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Structured Outputs For Reasoning Models\n", "\n", "When working with reasoning models that use special tokens such as `<think>...</think>` to delimit reasoning sections, you may want to allow free-form text inside these sections while still enforcing grammar constraints on the rest of the output.\n", "\n", "SGLang can disable grammar restrictions within reasoning sections. This is particularly useful for models that need to work through complex reasoning steps before producing a structured output.\n", "\n", "To enable this feature, pass the `--reasoning-parser` flag when launching the server. The reasoning parser determines the think-end token (for example, `</think>`), and grammar constraints are applied only to the output generated after that token.\n", "\n", "## Supported Models\n", "\n", "Currently, SGLang supports the following reasoning models:\n", "- [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d): The reasoning content is wrapped with `<think>` and `</think>` tags.\n", "- [QwQ](https://huggingface.co/Qwen/QwQ-32B): The reasoning content is wrapped with `<think>` and `</think>` tags.\n", "\n", "## Usage\n", "\n", "## OpenAI Compatible API" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Specify the `--reasoning-parser` option (and, optionally, `--grammar-backend`) when launching the server." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import openai\n", "import os\n", "from sglang.test.test_utils import is_in_ci\n", "\n", "if is_in_ci():\n", "    from patch import launch_server_cmd\n", "else:\n", "    from sglang.utils import launch_server_cmd\n", "\n", "from sglang.utils import wait_for_server, print_highlight, terminate_process\n", "\n", "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n", "\n", "\n", "server_process, port = launch_server_cmd(\n", "    \"python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --host 0.0.0.0 --reasoning-parser deepseek-r1\"\n", ")\n", "\n", "wait_for_server(f\"http://localhost:{port}\")\n", "client = openai.Client(base_url=f\"http://127.0.0.1:{port}/v1\", api_key=\"None\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### JSON\n", "\n", "You can either define a JSON schema directly or use [Pydantic](https://docs.pydantic.dev/latest/) to define and validate the response." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Using Pydantic**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pydantic import BaseModel, Field\n", "\n", "\n", "# Define the schema using Pydantic\n", "class CapitalInfo(BaseModel):\n", "    name: str = Field(..., pattern=r\"^\\w+$\", description=\"Name of the capital city\")\n", "    population: int = Field(..., description=\"Population of the capital city\")\n", "\n", "\n", "response = client.chat.completions.create(\n", "    model=\"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B\",\n", "    messages=[\n", "        {\n", "            \"role\": \"user\",\n", "            \"content\": \"Give me the name and population of the capital of France in JSON format.\",\n", "        },\n", "    ],\n", "    temperature=0,\n", "    max_tokens=2048,\n", "    response_format={\n", "        \"type\": \"json_schema\",\n", "        \"json_schema\": {\n", "            \"name\": \"foo\",\n", "            # Convert the Pydantic model to a JSON schema\n", "            \"schema\": CapitalInfo.model_json_schema(),\n", "        },\n", "    },\n", ")\n", "\n", "print_highlight(\n", "    f\"reasoning_content: {response.choices[0].message.reasoning_content}\\n\\ncontent: {response.choices[0].message.content}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**JSON Schema Directly**\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "json_schema = json.dumps(\n", "    {\n", "        \"type\": \"object\",\n", "        \"properties\": {\n", "            \"name\": {\"type\": \"string\", \"pattern\": \"^[\\\\w]+$\"},\n", "            \"population\": {\"type\": \"integer\"},\n", "        },\n", "        \"required\": [\"name\", \"population\"],\n", "    }\n", ")\n", "\n", "response = client.chat.completions.create(\n", "    model=\"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B\",\n", "    messages=[\n", "        {\n", "            \"role\": \"user\",\n", "            \"content\": \"Give me the name and population of the capital of France in JSON format.\",\n", "        },\n", "    ],\n", 
" temperature=0,\n", " max_tokens=2048,\n", " response_format={\n", " \"type\": \"json_schema\",\n", " \"json_schema\": {\"name\": \"foo\", \"schema\": json.loads(json_schema)},\n", " },\n", ")\n", "\n", "print_highlight(\n", " f\"reasoing_content: {response.choices[0].message.reasoning_content}\\n\\ncontent: {response.choices[0].message.content}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### EBNF" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ebnf_grammar = \"\"\"\n", "root ::= city | description\n", "city ::= \"London\" | \"Paris\" | \"Berlin\" | \"Rome\"\n", "description ::= city \" is \" status\n", "status ::= \"the capital of \" country\n", "country ::= \"England\" | \"France\" | \"Germany\" | \"Italy\"\n", "\"\"\"\n", "\n", "response = client.chat.completions.create(\n", " model=\"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B\",\n", " messages=[\n", " {\"role\": \"system\", \"content\": \"You are a helpful geography bot.\"},\n", " {\n", " \"role\": \"assistant\",\n", " \"content\": \"Give me the information and population of the capital of France in the JSON format.\",\n", " },\n", " ],\n", " temperature=0,\n", " max_tokens=2048,\n", " extra_body={\"ebnf\": ebnf_grammar},\n", ")\n", "\n", "print_highlight(\n", " f\"reasoing_content: {response.choices[0].message.reasoning_content}\\n\\ncontent: {response.choices[0].message.content}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regular expression" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = client.chat.completions.create(\n", " model=\"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B\",\n", " messages=[\n", " {\"role\": \"assistant\", \"content\": \"What is the capital of France?\"},\n", " ],\n", " temperature=0,\n", " max_tokens=2048,\n", " extra_body={\"regex\": \"(Paris|London)\"},\n", ")\n", "\n", "print_highlight(\n", " f\"reasoing_content: 
{response.choices[0].message.reasoning_content}\\n\\ncontent: {response.choices[0].message.content}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Structural Tag" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tool_get_current_weather = {\n", "    \"type\": \"function\",\n", "    \"function\": {\n", "        \"name\": \"get_current_weather\",\n", "        \"description\": \"Get the current weather in a given location\",\n", "        \"parameters\": {\n", "            \"type\": \"object\",\n", "            \"properties\": {\n", "                \"city\": {\n", "                    \"type\": \"string\",\n", "                    \"description\": \"The city to find the weather for, e.g. 'San Francisco'\",\n", "                },\n", "                \"state\": {\n", "                    \"type\": \"string\",\n", "                    \"description\": \"the two-letter abbreviation for the state that the city is\"\n", "                    \" in, e.g. 'CA' which would mean 'California'\",\n", "                },\n", "                \"unit\": {\n", "                    \"type\": \"string\",\n", "                    \"description\": \"The unit to fetch the temperature in\",\n", "                    \"enum\": [\"celsius\", \"fahrenheit\"],\n", "                },\n", "            },\n", "            \"required\": [\"city\", \"state\", \"unit\"],\n", "        },\n", "    },\n", "}\n", "\n", "tool_get_current_date = {\n", "    \"type\": \"function\",\n", "    \"function\": {\n", "        \"name\": \"get_current_date\",\n", "        \"description\": \"Get the current date and time for a given timezone\",\n", "        \"parameters\": {\n", "            \"type\": \"object\",\n", "            \"properties\": {\n", "                \"timezone\": {\n", "                    \"type\": \"string\",\n", "                    \"description\": \"The timezone to fetch the current date and time for, e.g. 
'America/New_York'\",\n", "                }\n", "            },\n", "            \"required\": [\"timezone\"],\n", "        },\n", "    },\n", "}\n", "\n", "schema_get_current_weather = tool_get_current_weather[\"function\"][\"parameters\"]\n", "schema_get_current_date = tool_get_current_date[\"function\"][\"parameters\"]\n", "\n", "\n", "def get_messages():\n", "    return [\n", "        {\n", "            \"role\": \"system\",\n", "            \"content\": f\"\"\"\n", "# Tool Instructions\n", "- Always execute python code in messages that you share.\n", "- When looking for real time information use relevant functions if available else fallback to brave_search\n", "You have access to the following functions:\n", "Use the function 'get_current_weather' to: Get the current weather in a given location\n", "{tool_get_current_weather[\"function\"]}\n", "Use the function 'get_current_date' to: Get the current date and time for a given timezone\n", "{tool_get_current_date[\"function\"]}\n", "If you choose to call a function ONLY reply in the following format:\n", "<{{start_tag}}={{function_name}}>{{parameters}}{{end_tag}}\n", "where\n", "start_tag => ` a JSON dict with the function argument name as key and function argument value as value.\n", "end_tag => ``\n", "Here is an example,\n", "{{\"example_name\": \"example_value\"}}\n", "Reminder:\n", "- Function calls MUST follow the specified format\n", "- Required parameters MUST be specified\n", "- Only call one function at a time\n", "- Put the entire function call reply on one line\n", "- Always add your sources when using search results to answer the user query\n", "You are a helpful assistant.\"\"\",\n", "        },\n", "        {\n", "            \"role\": \"user\",\n", "            \"content\": \"You are in New York. 
Please get the current date and time, and the weather.\",\n", "        },\n", "    ]\n", "\n", "\n", "messages = get_messages()\n", "\n", "response = client.chat.completions.create(\n", "    model=\"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B\",\n", "    messages=messages,\n", "    max_tokens=2048,\n", "    response_format={\n", "        \"type\": \"structural_tag\",\n", "        \"structures\": [\n", "            {\n", "                \"begin\": \"\",\n", "                \"schema\": schema_get_current_weather,\n", "                \"end\": \"\",\n", "            },\n", "            {\n", "                \"begin\": \"\",\n", "                \"schema\": schema_get_current_date,\n", "                \"end\": \"\",\n", "            },\n", "        ],\n", "        \"triggers\": [\"\")[0]\n", "content = response.json()[\"text\"].split(\"</think>\")[1]\n", "print_highlight(f\"reasoning_content: {reasoing_content}\\n\\ncontent: {content}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**JSON Schema Directly**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "json_schema = json.dumps(\n", "    {\n", "        \"type\": \"object\",\n", "        \"properties\": {\n", "            \"name\": {\"type\": \"string\", \"pattern\": \"^[\\\\w]+$\"},\n", "            \"population\": {\"type\": \"integer\"},\n", "        },\n", "        \"required\": [\"name\", \"population\"],\n", "    }\n", ")\n", "\n", "# JSON\n", "text = tokenizer.apply_chat_template(\n", "    messages, tokenize=False, add_generation_prompt=True\n", ")\n", "response = requests.post(\n", "    f\"http://localhost:{port}/generate\",\n", "    json={\n", "        \"text\": text,\n", "        \"sampling_params\": {\n", "            \"temperature\": 0,\n", "            \"max_new_tokens\": 2048,\n", "            \"json_schema\": json_schema,\n", "        },\n", "    },\n", ")\n", "\n", "print_highlight(response.json())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### EBNF" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = requests.post(\n", "    f\"http://localhost:{port}/generate\",\n", "    json={\n", "        \"text\": \"Give me the information of the capital of France.\",\n", "
\"sampling_params\": {\n", " \"max_new_tokens\": 2048,\n", " \"temperature\": 0,\n", " \"n\": 3,\n", " \"ebnf\": (\n", " \"root ::= city | description\\n\"\n", " 'city ::= \"London\" | \"Paris\" | \"Berlin\" | \"Rome\"\\n'\n", " 'description ::= city \" is \" status\\n'\n", " 'status ::= \"the capital of \" country\\n'\n", " 'country ::= \"England\" | \"France\" | \"Germany\" | \"Italy\"'\n", " ),\n", " },\n", " \"stream\": False,\n", " \"return_logprob\": False,\n", " },\n", ")\n", "\n", "print(response.json())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regular expression" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = requests.post(\n", " f\"http://localhost:{port}/generate\",\n", " json={\n", " \"text\": \"Paris is the capital of\",\n", " \"sampling_params\": {\n", " \"temperature\": 0,\n", " \"max_new_tokens\": 2048,\n", " \"regex\": \"(France|England)\",\n", " },\n", " },\n", ")\n", "print(response.json())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Structural Tag" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text = tokenizer.apply_chat_template(\n", " messages, tokenize=False, add_generation_prompt=True\n", ")\n", "payload = {\n", " \"text\": text,\n", " \"sampling_params\": {\n", " \"max_new_tokens\": 2048,\n", " \"structural_tag\": json.dumps(\n", " {\n", " \"type\": \"structural_tag\",\n", " \"structures\": [\n", " {\n", " \"begin\": \"\",\n", " \"schema\": schema_get_current_weather,\n", " \"end\": \"\",\n", " },\n", " {\n", " \"begin\": \"\",\n", " \"schema\": schema_get_current_date,\n", " \"end\": \"\",\n", " },\n", " ],\n", " \"triggers\": [\"\",\n", " \"schema\": schema_get_current_weather,\n", " \"end\": \"\",\n", " },\n", " {\n", " \"begin\": \"\",\n", " \"schema\": schema_get_current_date,\n", " \"end\": \"\",\n", " },\n", " ],\n", " \"triggers\": [\"