{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# OpenAI APIs - Embedding\n", "\n", "SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.\n", "A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/guides/embeddings).\n", "\n", "This tutorial covers the embedding APIs for embedding models. For a list of supported models, see the [corresponding overview page](https://docs.sglang.ai/supported_models/embedding_models.html).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch A Server\n", "\n", "Launch the server in your terminal and wait for it to initialize. Remember to add `--is-embedding` to the command." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sglang.test.test_utils import is_in_ci\n", "\n", "if is_in_ci():\n", " from patch import launch_server_cmd\n", "else:\n", " from sglang.utils import launch_server_cmd\n", "\n", "from sglang.utils import wait_for_server, print_highlight, terminate_process\n", "\n", "embedding_process, port = launch_server_cmd(\n", " \"\"\"\n", "python3 -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-1.5B-instruct \\\n", " --host 0.0.0.0 --is-embedding\n", "\"\"\"\n", ")\n", "\n", "wait_for_server(f\"http://localhost:{port}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using cURL" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import subprocess, json\n", "\n", "text = \"Once upon a time\"\n", "\n", "curl_text = f\"\"\"curl -s http://localhost:{port}/v1/embeddings \\\n", " -H \"Content-Type: application/json\" \\\n", " -d '{{\"model\": \"Alibaba-NLP/gte-Qwen2-1.5B-instruct\", \"input\": \"{text}\"}}'\"\"\"\n", "\n", "result = subprocess.check_output(curl_text, shell=True)\n", "\n", "print(result)\n", "\n", "text_embedding = 
json.loads(result)[\"data\"][0][\"embedding\"]\n", "\n", "print_highlight(f\"Text embedding (first 10): {text_embedding[:10]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Python Requests" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "text = \"Once upon a time\"\n", "\n", "response = requests.post(\n", " f\"http://localhost:{port}/v1/embeddings\",\n", " json={\"model\": \"Alibaba-NLP/gte-Qwen2-1.5B-instruct\", \"input\": text},\n", ")\n", "\n", "text_embedding = response.json()[\"data\"][0][\"embedding\"]\n", "\n", "print_highlight(f\"Text embedding (first 10): {text_embedding[:10]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using OpenAI Python Client" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import openai\n", "\n", "client = openai.Client(base_url=f\"http://127.0.0.1:{port}/v1\", api_key=\"None\")\n", "\n", "# Text embedding example\n", "response = client.embeddings.create(\n", " model=\"Alibaba-NLP/gte-Qwen2-1.5B-instruct\",\n", " input=text,\n", ")\n", "\n", "embedding = response.data[0].embedding[:10]\n", "print_highlight(f\"Text embedding (first 10): {embedding}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Input IDs\n", "\n", "SGLang also accepts token IDs (`input_ids`) in place of text as the `input` value of an embedding request."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import os\n", "import subprocess\n", "from transformers import AutoTokenizer\n", "\n", "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(\"Alibaba-NLP/gte-Qwen2-1.5B-instruct\")\n", "input_ids = tokenizer.encode(text)\n", "\n", "curl_ids = f\"\"\"curl -s http://localhost:{port}/v1/embeddings \\\n", " -H \"Content-Type: application/json\" \\\n", " -d '{{\"model\": \"Alibaba-NLP/gte-Qwen2-1.5B-instruct\", \"input\": {json.dumps(input_ids)}}}'\"\"\"\n", "\n", "input_ids_embedding = json.loads(subprocess.check_output(curl_ids, shell=True))[\"data\"][\n", " 0\n", "][\"embedding\"]\n", "\n", "print_highlight(f\"Input IDs embedding (first 10): {input_ids_embedding[:10]}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "terminate_process(embedding_process)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multi-Modal Embedding Model\n", "\n", "Please refer to [Multi-Modal Embedding Model](../supported_models/embedding_models.md)." ] } ], "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 }