{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Structured Outputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can specify a JSON schema, [regular expression](https://en.wikipedia.org/wiki/Regular_expression) or [EBNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form) to constrain the model output. The model output will be guaranteed to follow the given constraints. Only one constraint parameter (`json_schema`, `regex`, or `ebnf`) can be specified for a request.\n", "\n", "SGLang supports three grammar backends:\n", "\n", "- [Outlines](https://github.com/dottxt-ai/outlines): Supports JSON schema and regular expression constraints.\n", "- [XGrammar](https://github.com/mlc-ai/xgrammar)(default): Supports JSON schema, regular expression, and EBNF constraints.\n", "- [Llguidance](https://github.com/guidance-ai/llguidance): Supports JSON schema, regular expression, and EBNF constraints.\n", "\n", "We suggest using XGrammar for its better performance and utility. XGrammar currently uses the [GGML BNF format](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md). 
For more details, see the [XGrammar technical overview](https://blog.mlc.ai/2024/11/22/achieving-efficient-flexible-portable-structured-generation-with-xgrammar).\n", "\n", "To use Outlines, add `--grammar-backend outlines` when launching the server.\n", "To use llguidance, add `--grammar-backend llguidance` when launching the server.\n", "If no backend is specified, XGrammar is used by default.\n", "\n", "For better output quality, **it is advisable to explicitly include instructions in the prompt that guide the model toward the desired format.** For example, you can add: 'Please generate the output in the following JSON format: ...'.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## OpenAI Compatible API" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import openai\n", "import os\n", "from sglang.test.test_utils import is_in_ci\n", "\n", "if is_in_ci():\n", " from patch import launch_server_cmd\n", "else:\n", " from sglang.utils import launch_server_cmd\n", "\n", "from sglang.utils import wait_for_server, print_highlight, terminate_process\n", "\n", "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n", "\n", "\n", "server_process, port = launch_server_cmd(\n", " \"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --host 0.0.0.0\"\n", ")\n", "\n", "wait_for_server(f\"http://localhost:{port}\")\n", "client = openai.Client(base_url=f\"http://127.0.0.1:{port}/v1\", api_key=\"None\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### JSON\n", "\n", "You can define a JSON schema directly or use [Pydantic](https://docs.pydantic.dev/latest/) to define and validate the response."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Using Pydantic**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pydantic import BaseModel, Field\n", "\n", "\n", "# Define the schema using Pydantic\n", "class CapitalInfo(BaseModel):\n", " name: str = Field(..., pattern=r\"^\\w+$\", description=\"Name of the capital city\")\n", " population: int = Field(..., description=\"Population of the capital city\")\n", "\n", "\n", "response = client.chat.completions.create(\n", " model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n", " messages=[\n", " {\n", " \"role\": \"user\",\n", " \"content\": \"Please generate the information of the capital of France in the JSON format.\",\n", " },\n", " ],\n", " temperature=0,\n", " max_tokens=128,\n", " response_format={\n", " \"type\": \"json_schema\",\n", " \"json_schema\": {\n", " \"name\": \"foo\",\n", " # convert the pydantic model to json schema\n", " \"schema\": CapitalInfo.model_json_schema(),\n", " },\n", " },\n", ")\n", "\n", "response_content = response.choices[0].message.content\n", "# validate the JSON response by the pydantic model\n", "capital_info = CapitalInfo.model_validate_json(response_content)\n", "print_highlight(f\"Validated response: {capital_info.model_dump_json()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**JSON Schema Directly**\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "json_schema = json.dumps(\n", " {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"name\": {\"type\": \"string\", \"pattern\": \"^[\\\\w]+$\"},\n", " \"population\": {\"type\": \"integer\"},\n", " },\n", " \"required\": [\"name\", \"population\"],\n", " }\n", ")\n", "\n", "response = client.chat.completions.create(\n", " model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n", " messages=[\n", " {\n", " \"role\": \"user\",\n", " \"content\": \"Give me the information 
of the capital of France in JSON format.\",\n", " },\n", " ],\n", " temperature=0,\n", " max_tokens=128,\n", " response_format={\n", " \"type\": \"json_schema\",\n", " \"json_schema\": {\"name\": \"foo\", \"schema\": json.loads(json_schema)},\n", " },\n", ")\n", "\n", "print_highlight(response.choices[0].message.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### EBNF" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ebnf_grammar = \"\"\"\n", "root ::= city | description\n", "city ::= \"London\" | \"Paris\" | \"Berlin\" | \"Rome\"\n", "description ::= city \" is \" status\n", "status ::= \"the capital of \" country\n", "country ::= \"England\" | \"France\" | \"Germany\" | \"Italy\"\n", "\"\"\"\n", "\n", "response = client.chat.completions.create(\n", " model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n", " messages=[\n", " {\"role\": \"system\", \"content\": \"You are a helpful geography bot.\"},\n", " {\n", " \"role\": \"user\",\n", " \"content\": \"Give me the information of the capital of France.\",\n", " },\n", " ],\n", " temperature=0,\n", " max_tokens=32,\n", " extra_body={\"ebnf\": ebnf_grammar},\n", ")\n", "\n", "print_highlight(response.choices[0].message.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regular expression" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = client.chat.completions.create(\n", " model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n", " messages=[\n", " {\"role\": \"user\", \"content\": \"What is the capital of France?\"},\n", " ],\n", " temperature=0,\n", " max_tokens=128,\n", " extra_body={\"regex\": \"(Paris|London)\"},\n", ")\n", "\n", "print_highlight(response.choices[0].message.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Structural Tag" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"tool_get_current_weather = {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"get_current_weather\",\n", " \"description\": \"Get the current weather in a given location\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"city\": {\n", " \"type\": \"string\",\n", " \"description\": \"The city to find the weather for, e.g. 'San Francisco'\",\n", " },\n", " \"state\": {\n", " \"type\": \"string\",\n", " \"description\": \"the two-letter abbreviation for the state that the city is\"\n", " \" in, e.g. 'CA' which would mean 'California'\",\n", " },\n", " \"unit\": {\n", " \"type\": \"string\",\n", " \"description\": \"The unit to fetch the temperature in\",\n", " \"enum\": [\"celsius\", \"fahrenheit\"],\n", " },\n", " },\n", " \"required\": [\"city\", \"state\", \"unit\"],\n", " },\n", " },\n", "}\n", "\n", "tool_get_current_date = {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"get_current_date\",\n", " \"description\": \"Get the current date and time for a given timezone\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"timezone\": {\n", " \"type\": \"string\",\n", " \"description\": \"The timezone to fetch the current date and time for, e.g. 
'America/New_York'\",\n", " }\n", " },\n", " \"required\": [\"timezone\"],\n", " },\n", " },\n", "}\n", "\n", "schema_get_current_weather = tool_get_current_weather[\"function\"][\"parameters\"]\n", "schema_get_current_date = tool_get_current_date[\"function\"][\"parameters\"]\n", "\n", "\n", "def get_messages():\n", " return [\n", " {\n", " \"role\": \"system\",\n", " \"content\": f\"\"\"\n", "# Tool Instructions\n", "- Always execute python code in messages that you share.\n", "- When looking for real-time information, use relevant functions if available; otherwise fall back to brave_search.\n", "You have access to the following functions:\n", "Use the function 'get_current_weather' to: Get the current weather in a given location\n", "{tool_get_current_weather[\"function\"]}\n", "Use the function 'get_current_date' to: Get the current date and time for a given timezone\n", "{tool_get_current_date[\"function\"]}\n", "If you choose to call a function ONLY reply in the following format:\n", "<{{start_tag}}={{function_name}}>{{parameters}}{{end_tag}}\n", "where\n", "start_tag => `<function`\n", "parameters => a JSON dict with the function argument name as key and function argument value as value.\n", "end_tag => `</function>`\n", "Here is an example,\n", "<function=example_function_name>{{\"example_name\": \"example_value\"}}</function>\n", "Reminder:\n", "- Function calls MUST follow the specified format\n", "- Required parameters MUST be specified\n", "- Only call one function at a time\n", "- Put the entire function call reply on one line\n", "- Always add your sources when using search results to answer the user query\n", "You are a helpful assistant.\"\"\",\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": \"You are in New York. 
Please get the current date and time, and the weather.\",\n", " },\n", " ]\n", "\n", "\n", "messages = get_messages()\n", "\n", "response = client.chat.completions.create(\n", " model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n", " messages=messages,\n", " response_format={\n", " \"type\": \"structural_tag\",\n", " \"structures\": [\n", " {\n", " \"begin\": \"\",\n", " \"schema\": schema_get_current_weather,\n", " \"end\": \"\",\n", " },\n", " {\n", " \"begin\": \"\",\n", " \"schema\": schema_get_current_date,\n", " \"end\": \"\",\n", " },\n", " ],\n", " \"triggers\": [\"\",\n", " \"schema\": schema_get_current_weather,\n", " \"end\": \"\",\n", " },\n", " {\n", " \"begin\": \"\",\n", " \"schema\": schema_get_current_date,\n", " \"end\": \"\",\n", " },\n", " ],\n", " \"triggers\": [\"\",\n", " \"schema\": schema_get_current_weather,\n", " \"end\": \"\",\n", " },\n", " {\n", " \"begin\": \"\",\n", " \"schema\": schema_get_current_date,\n", " \"end\": \"\",\n", " },\n", " ],\n", " \"triggers\": [\"