# OpenAI APIs - Vision

SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.
A complete reference for the API is available in the [OpenAI vision guide](https://platform.openai.com/docs/guides/vision).
This tutorial covers the vision APIs for vision language models.

SGLang supports various vision language models such as Llama 3.2, LLaVA-OneVision, Qwen2.5-VL, Gemma3 and [more](https://docs.sglang.ai/references/supported_models): 
- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) 
- [lmms-lab/llava-onevision-qwen2-72b-ov-chat](https://huggingface.co/lmms-lab/llava-onevision-qwen2-72b-ov-chat) 
- [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
- [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
- [openbmb/MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V)
- [deepseek-ai/deepseek-vl2](https://huggingface.co/deepseek-ai/deepseek-vl2)

As an alternative to the OpenAI API, you can also use the [SGLang offline engine](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py).

## Launch A Server

Launch the server in your terminal and wait for it to initialize.

**Remember to add** `--chat-template llama_3_vision` **to specify the [vision chat template](https://docs.sglang.ai/backend/openai_api_vision.html#Chat-Template). Otherwise, the server will only support text (images won’t be passed in), which can lead to degraded performance.**

We need to specify `--chat-template` for vision language models because the chat template provided in the Hugging Face tokenizer only supports text.

In [None]:
from sglang.test.test_utils import is_in_ci

if is_in_ci():
    from patch import launch_server_cmd
else:
    from sglang.utils import launch_server_cmd

from sglang.utils import wait_for_server, print_highlight, terminate_process

vision_process, port = launch_server_cmd(
    """
python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-11B-Vision-Instruct \
    --chat-template=llama_3_vision
"""
)

wait_for_server(f"http://localhost:{port}")

## Using cURL

Once the server is up, you can send test requests using cURL or the Python `requests` library.

In [None]:
import subprocess

curl_command = f"""
curl -s http://localhost:{port}/v1/chat/completions \\
  -H "Content-Type: application/json" \\
  -d '{{
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [
      {{
        "role": "user",
        "content": [
          {{
            "type": "text",
            "text": "What’s in this image?"
          }},
          {{
            "type": "image_url",
            "image_url": {{
              "url": "https://github.com/sgl-project/sglang/blob/main/test/lang/example_image.png?raw=true"
            }}
          }}
        ]
      }}
    ],
    "max_tokens": 300
  }}'
"""

response = subprocess.check_output(curl_command, shell=True).decode()
print_highlight(response)


## Using Python Requests

In [None]:
import requests

url = f"http://localhost:{port}/v1/chat/completions"

data = {
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://github.com/sgl-project/sglang/blob/main/test/lang/example_image.png?raw=true"
                    },
                },
            ],
        }
    ],
    "max_tokens": 300,
}

response = requests.post(url, json=data)
print_highlight(response.text)
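
The endpoint returns a JSON body in the OpenAI chat-completions format, so the generated text can be read from `choices[0].message.content` instead of printing the raw payload. The following is a minimal sketch; the helper name `extract_reply` and the sample payload are illustrative assumptions, not part of SGLang:

```python
from typing import Any


def extract_reply(payload: dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion payload."""
    choices = payload.get("choices") or []
    if not choices:
        # Error responses typically carry no "choices" list.
        raise ValueError(f"no choices in response: {payload}")
    return choices[0]["message"]["content"]


# Hypothetical payload, trimmed to the fields used here.
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "example reply"}}
    ]
}
print(extract_reply(sample))
```

With the request above, this would be called as `extract_reply(response.json())`.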

## Using OpenAI Python Client

In [None]:
from openai import OpenAI

client = OpenAI(base_url=f"http://localhost:{port}/v1", api_key="None")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://github.com/sgl-project/sglang/blob/main/test/lang/example_image.png?raw=true"
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print_highlight(response.choices[0].message.content)
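
The `image_url` field is not limited to remote URLs: the OpenAI vision format also accepts base64 data URLs, which is the usual way to send a local file. The sketch below assumes the server accepts `data:` URLs the same way the OpenAI API does; `image_to_data_url` is an illustrative helper, not an SGLang function:

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result can be used wherever the examples pass a URL, for instance `{"type": "image_url", "image_url": {"url": image_to_data_url("photo.png")}}`, where `photo.png` is a placeholder path.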

## Multiple-Image Inputs

The server also supports multiple images and interleaved text and images if the model supports it.

In [None]:
from openai import OpenAI

client = OpenAI(base_url=f"http://localhost:{port}/v1", api_key="None")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://github.com/sgl-project/sglang/blob/main/test/lang/example_image.png?raw=true",
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png",
                    },
                },
                {
                    "type": "text",
                    "text": "I have two very different images. They are not related at all. "
                    "Please describe the first image in one sentence, and then describe the second image in another sentence.",
                },
            ],
        }
    ],
    temperature=0,
)

print_highlight(response.choices[0].message.content)
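
When the number of images varies, the `content` list can be assembled programmatically rather than written out by hand. A small sketch follows; `build_content` is a hypothetical helper that mirrors the structure of the request above (one `image_url` part per image, followed by a single `text` part):

```python
def build_content(image_urls: list[str], prompt: str) -> list[dict]:
    """Build an OpenAI-style content list: one image part per URL, then the text."""
    parts = [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    parts.append({"type": "text", "text": prompt})
    return parts


content = build_content(
    [
        "https://github.com/sgl-project/sglang/blob/main/test/lang/example_image.png?raw=true",
        "https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png",
    ],
    "Describe each image in one sentence.",
)
print([part["type"] for part in content])
```

The returned list can be passed as the `"content"` value of a user message. Whether the model handles several images well depends on the model itself.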

In [None]:
terminate_process(vision_process)

## Chat Template

As mentioned before, if you do not specify a vision model's `--chat-template`, the server uses Hugging Face's default template, which only supports text.

We list popular vision models with their chat templates:

- [meta-llama/Llama-3.2-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) uses `llama_3_vision`.
- [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) uses `qwen2-vl`.
- [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) uses `gemma-it`.
- [openbmb/MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V) uses `minicpmv`.
- [deepseek-ai/deepseek-vl2](https://huggingface.co/deepseek-ai/deepseek-vl2) uses `deepseek-vl2`.
- [LLaVA-OneVision](https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov) uses `chatml-llava`.
- [LLaVA-NeXT](https://huggingface.co/collections/lmms-lab/llava-next-6623288e2d61edba3ddbf5ff) uses `chatml-llava`.
- [Llama3-LLaVA-NeXT](https://huggingface.co/lmms-lab/llama3-llava-next-8b) uses `llava_llama_3`.
- [LLaVA-v1.5 / 1.6](https://huggingface.co/liuhaotian/llava-v1.6-34b) uses `vicuna_v1.1`.
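
The list above can be kept as a lookup table so that the launch command always carries the matching template. This is only a sketch: `CHAT_TEMPLATES` and `launch_command` are hypothetical names, the keys are the model repositories linked above, and the command uses only the `--model-path` and `--chat-template` flags shown earlier:

```python
import shlex

# Model-to-template pairs copied from the list above.
CHAT_TEMPLATES = {
    "meta-llama/Llama-3.2-11B-Vision-Instruct": "llama_3_vision",
    "Qwen/Qwen2.5-VL-7B-Instruct": "qwen2-vl",
    "google/gemma-3-4b-it": "gemma-it",
    "openbmb/MiniCPM-V": "minicpmv",
    "deepseek-ai/deepseek-vl2": "deepseek-vl2",
    "lmms-lab/llava-onevision-qwen2-7b-ov": "chatml-llava",
    "lmms-lab/llama3-llava-next-8b": "llava_llama_3",
    "liuhaotian/llava-v1.6-34b": "vicuna_v1.1",
}


def launch_command(model_path: str) -> str:
    """Return a server launch command with the matching --chat-template flag."""
    try:
        template = CHAT_TEMPLATES[model_path]
    except KeyError:
        raise ValueError(f"no known vision chat template for {model_path}") from None
    args = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--chat-template", template,
    ]
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


print(launch_command("Qwen/Qwen2.5-VL-7B-Instruct"))
```

Raising on an unknown model is a deliberate choice here: silently omitting the flag would start a text-only server, which is the failure mode this section warns about.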