
Open-Sora Inference API: Architecture and Usage Guide

1. Overall architecture

                         ┌─────────────────────────────────────────────┐
                         │        Training server 89.185.24.182        │
                         │                                             │
  External requests      │  ┌───────────────────────────────────────┐  │
─────────────► :80       │  │ NGINX (reverse proxy + rate limit)    │  │
                         │  │ • 5 requests/min per IP               │  │
                         │  │ • 3600 s timeout (long generations)   │  │
                         │  └───────────────────┬───────────────────┘  │
                         │                      │ :8000                │
                         │  ┌───────────────────▼───────────────────┐  │
                         │  │ Gunicorn + FastAPI (4 workers)        │  │
                         │  │ • POST /v1/generate                   │  │
                         │  │ • GET  /v1/jobs/{id}                  │  │
                         │  │ • GET  /v1/videos/{id}                │  │
                         │  │ • GET  /health                        │  │
                         │  └───────────────────┬───────────────────┘  │
                         │                      │                      │
                         │  ┌───────────────────▼───────────────────┐  │
                         │  │ Redis (task queue + result store)     │  │
                         │  │ • AOF persistence (survives restart)  │  │
                         │  │ • Results kept for 24 hours           │  │
                         │  └──┬──┬──┬──┬──┬──┬──┬──┬───────────────┘  │
                         │     │  │  │  │  │  │  │  │                  │
                         │  ┌──▼──▼──▼──▼──▼──▼──▼──▼───────────────┐  │
                         │  │ 8× Celery workers (GPUs 0-7)          │  │
                         │  │ • One dedicated A100 80GB per worker  │  │
                         │  │ • Concurrency 1: serial per GPU       │  │
                         │  │ • Recycled every 10 tasks             │  │
                         │  │   (avoids memory fragmentation)       │  │
                         │  └──┬────────────────────────────────────┘  │
                         │     │ subprocess                            │
                         │  ┌──▼────────────────────────────────────┐  │
                         │  │ torchrun --nproc_per_node=1           │  │
                         │  │ scripts/diffusion/inference.py        │  │
                         │  │ • Each run fully isolated             │  │
                         │  │ • A crash affects no other GPU        │  │
                         │  └──┬────────────────────────────────────┘  │
                         │     │                                       │
                         │  ┌──▼────────────────────────────────────┐  │
                         │  │ /data/train-output/api-outputs/       │  │
                         │  │ Video files (Huawei NVMe disk)        │  │
                         │  └───────────────────────────────────────┘  │
                         └─────────────────────────────────────────────┘
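The diagram isolates each job by launching torchrun as a subprocess. One way to wire that isolation per GPU is sketched below. `api/tasks.py` is not reproduced here, so several pieces are assumptions:

- The GPU index comes from a hypothetical `WORKER_GPU_ID` environment variable, for example set by the `opensora-worker@.service` template from its instance number.
- `CONFIG_PATH` is a placeholder.
- The `--save-dir`/`--prompt` options passed to `inference.py` should be checked against the Open-Sora version in use.

Setting `CUDA_VISIBLE_DEVICES` makes the child see only its own card, as device 0. torchrun's `--standalone` flag auto-assigns the rendezvous endpoint, which keeps the eight concurrent launches on one host from contending for a fixed port.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List, Tuple

OUTPUT_ROOT = Path("/data/train-output/api-outputs")  # output directory from the diagram
INFERENCE_SCRIPT = "scripts/diffusion/inference.py"
# Placeholder: substitute the config file that api/config.py actually points to.
CONFIG_PATH = "configs/diffusion/inference/256px.py"


def build_inference_command(job_id: str, prompt: str, gpu_id: int) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for one isolated single-GPU torchrun launch."""
    save_dir = OUTPUT_ROOT / job_id
    argv = [
        "torchrun",
        "--standalone",        # auto-assigned rendezvous, so 8 workers don't share a port
        "--nproc_per_node=1",  # one process on one GPU per job
        INFERENCE_SCRIPT,
        CONFIG_PATH,
        "--save-dir", str(save_dir),  # assumed inference.py option
        "--prompt", prompt,           # assumed inference.py option
    ]
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # the child only sees this card, as cuda:0
    return argv, env


def run_inference(job_id: str, prompt: str, timeout_s: int = 3600) -> Path:
    """Run one job in a child process; a crash there fails only this job."""
    gpu_id = int(os.environ.get("WORKER_GPU_ID", "0"))  # hypothetical per-worker variable
    argv, env = build_inference_command(job_id, prompt, gpu_id)
    # check=True turns a non-zero exit (e.g. CUDA OOM) into CalledProcessError,
    # which the Celery task can catch and report as a failed job. On timeout,
    # subprocess.run kills the torchrun process and raises TimeoutExpired.
    subprocess.run(argv, env=env, check=True, timeout=timeout_s)
    return OUTPUT_ROOT / job_id


if __name__ == "__main__":
    cmd, child_env = build_inference_command("demo-job", "ocean waves at sunset", gpu_id=3)
    print(" ".join(cmd))
    print("CUDA_VISIBLE_DEVICES =", child_env["CUDA_VISIBLE_DEVICES"])
```

Because every job is a separate OS process, the GPU memory it allocated is released when that process exits, whatever happened inside it.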

Key design decisions

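Decision	Rationale
Run torchrun via subprocess	Process-level isolation: a crash on one GPU does not take down the service
One dedicated GPU per worker	No GPU memory contention between jobs; up to 8 jobs run in parallel
Celery `task_acks_late=True`	A task is acknowledged only after it finishes, so a task interrupted by a worker crash is re-queued instead of lost
Redis AOF persistence	Task state survives a server restart
systemd supervises every process	A crashed service is restarted automatically within 5 s
NGINX rate limit of 5 r/m	Keeps a single caller from flooding the GPU queue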
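The Celery-related decisions above map onto a handful of settings. Below is a hypothetical `celeryconfig.py`, loadable with `app.config_from_object("celeryconfig")`. Since `api/tasks.py` is not shown, the broker and result URLs, `worker_prefetch_multiplier`, `task_reject_on_worker_lost` and the visibility timeout are assumptions.

Two Celery behaviours matter for the "tasks are not lost" claim:

- With late acks, a task whose pool process is killed is still acknowledged, as failed, unless `task_reject_on_worker_lost` is also set. Celery's documentation warns that this option can cause redelivery loops if a task keeps killing its process. Here the crash-prone inference runs in the torchrun child, so the pool process normally survives.
- With a Redis broker, an unacknowledged task is redelivered only after the visibility timeout. That timeout therefore has to exceed the slowest job.

```python
# celeryconfig.py: hypothetical sketch of the worker settings described above.

broker_url = "redis://localhost:6379/0"      # assumed Redis location (task queue)
result_backend = "redis://localhost:6379/1"  # assumed; job results also live in Redis

worker_concurrency = 1           # one task at a time per GPU worker
worker_prefetch_multiplier = 1   # assumption: don't reserve extra jobs while one is running
worker_max_tasks_per_child = 10  # replace the pool process after 10 tasks

task_acks_late = True              # acknowledge only after the task finishes
task_reject_on_worker_lost = True  # assumption: re-queue if the pool process dies mid-task

result_expires = 24 * 60 * 60  # keep results for 24 hours

# Redis broker: unacknowledged tasks are redelivered after this many seconds, so it
# must be longer than the slowest job (~900 s for 768px in the performance table).
broker_transport_options = {"visibility_timeout": 2 * 60 * 60}
```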

2. Service components

File layout

my-sora/
├── api/
│   ├── config.py        # Configuration: paths, GPU count, etc.
│   ├── schemas.py       # Request/response Pydantic models
│   ├── tasks.py         # Celery task (core inference logic)
│   └── main.py          # FastAPI routes
├── deploy/
│   ├── nginx.conf                # NGINX site configuration
│   ├── opensora-limit.conf       # Rate-limit zone definition (http context)
│   ├── opensora-api.service      # systemd unit for FastAPI
│   ├── opensora-worker@.service  # systemd template for the workers (GPUs 0-7)
│   └── setup.sh                  # One-step deployment script
└── requirements-api.txt

Processes

Process	Count	systemd unit
NGINX	1	nginx.service
Redis	1	redis-server.service
Gunicorn (FastAPI)	1 (4 workers)	opensora-api.service
Celery worker	8 (GPUs 0-7)	opensora-worker@{0-7}.service

3. API reference

Basics

Base URL: http://89.185.24.182
Authentication: none (internal use only)
Content-Type: application/json
Encoding: UTF-8

3.1 Health check

GET /health

Example response:

{
  "status": "ok",
  "time": "2026-03-06T12:00:00.000000+00:00"
}

3.2 Submit a generation job

POST /v1/generate

Request body:

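Field	Type	Required	Default	Description
`prompt`	string	✅	n/a	Text prompt, 1–2000 characters
`resolution`	string		`"256px"`	Resolution: `"256px"` or `"768px"`
`aspect_ratio`	string		`"16:9"`	Aspect ratio: `"16:9"`, `"9:16"`, `"1:1"` or `"2.39:1"`
`num_frames`	int		`49`	Number of frames; recommended values are `49` (≈2 s), `97` (≈4 s) and `129` (≈5 s)
`motion_score`	int		`4`	Motion intensity 1–7 (1 = nearly static, 7 = vigorous motion)
`num_steps`	int		`50`	Diffusion steps, 10–100; more steps give higher quality but take longer
`seed`	int		random	Random seed; set it to make results reproducible
`cond_type`	string		`"t2v"`	`"t2v"` (text-to-video) or `"i2v_head"` (image-to-video)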
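The limits in the table can be checked on the client before a request spends rate-limit budget. Below is a minimal standard-library sketch; `GenerateRequest` is a hypothetical helper, and the server's Pydantic models in `api/schemas.py` remain authoritative. The table only recommends `num_frames` values without stating a hard range, so the sketch does not restrict that field.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

RESOLUTIONS = {"256px", "768px"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "2.39:1"}
COND_TYPES = {"t2v", "i2v_head"}


@dataclass
class GenerateRequest:
    """Client-side mirror of the POST /v1/generate body, with the documented defaults."""

    prompt: str
    resolution: str = "256px"
    aspect_ratio: str = "16:9"
    num_frames: int = 49
    motion_score: int = 4
    num_steps: int = 50
    seed: Optional[int] = None  # None: the field is omitted and the server picks a random seed
    cond_type: str = "t2v"

    def validate(self) -> None:
        """Raise ValueError if a field violates the documented limits."""
        if not 1 <= len(self.prompt) <= 2000:
            raise ValueError("prompt must be 1-2000 characters")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
        if not 1 <= self.motion_score <= 7:
            raise ValueError("motion_score must be between 1 and 7")
        if not 10 <= self.num_steps <= 100:
            raise ValueError("num_steps must be between 10 and 100")
        if self.cond_type not in COND_TYPES:
            raise ValueError(f"cond_type must be one of {sorted(COND_TYPES)}")

    def to_body(self) -> Dict[str, Any]:
        """Validated JSON body, ready for requests.post(..., json=body)."""
        self.validate()
        body = asdict(self)
        if body["seed"] is None:
            del body["seed"]
        return body
```

For example, `GenerateRequest(prompt="a red ball bouncing", num_frames=17, num_steps=20).to_body()` produces the smoke-test payload with the remaining defaults filled in.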

Example request:

curl -X POST http://89.185.24.182/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a golden retriever running on the beach at sunset, slow motion",
    "resolution": "256px",
    "aspect_ratio": "16:9",
    "num_frames": 49,
    "motion_score": 5,
    "num_steps": 50
  }'

Example response (HTTP 202):

{
  "job_id": "3b7e2a1c-4f8d-4b9e-a12c-8d7f3e6b2c1a",
  "message": "任务已提交"
}

3.3 Query job status

GET /v1/jobs/{job_id}

Response fields:

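Field	Description
`job_id`	Job ID
`status`	`pending` (queued) / `processing` (generating) / `completed` (done) / `failed`
`video_url`	Relative download path of the video (set only when `completed`, otherwise `null`)
`error`	Failure reason (set only when `failed`, otherwise `null`)
`completed_at`	Completion time in ISO 8601; `null` while the job is still running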
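`video_url` is a path relative to the Base URL, so a client has to resolve it against the host before downloading. Below is a small sketch of interpreting the status response; `interpret_status` is a hypothetical helper, not part of the API.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urljoin

BASE_URL = "http://89.185.24.182"


def interpret_status(data: Dict[str, Any], base_url: str = BASE_URL) -> Tuple[bool, Optional[str]]:
    """Map a GET /v1/jobs/{job_id} response to (finished, absolute video URL).

    Raises RuntimeError for failed jobs so a failure is never mistaken for success.
    """
    status = data["status"]
    if status == "completed":
        # video_url is relative ("/v1/videos/<job_id>"); resolve it against the host.
        return True, urljoin(base_url, data["video_url"])
    if status == "failed":
        raise RuntimeError(f"job {data['job_id']} failed: {data.get('error')}")
    if status in ("pending", "processing"):
        return False, None  # keep polling
    raise ValueError(f"unexpected status: {status!r}")
```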

Polling example:

curl http://89.185.24.182/v1/jobs/3b7e2a1c-4f8d-4b9e-a12c-8d7f3e6b2c1a

Response while generating:

{
  "job_id": "3b7e2a1c-4f8d-4b9e-a12c-8d7f3e6b2c1a",
  "status": "processing",
  "video_url": null,
  "error": null,
  "completed_at": null
}

Response when completed:

{
  "job_id": "3b7e2a1c-4f8d-4b9e-a12c-8d7f3e6b2c1a",
  "status": "completed",
  "video_url": "/v1/videos/3b7e2a1c-4f8d-4b9e-a12c-8d7f3e6b2c1a",
  "error": null,
  "completed_at": "2026-03-06T12:01:05.123456+00:00"
}

3.4 Download the video

GET /v1/videos/{job_id}

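Returns the video as a video/mp4 stream with the file name {job_id}.mp4.

curl -f -o 3b7e2a1c-4f8d-4b9e-a12c-8d7f3e6b2c1a.mp4 http://89.185.24.182/v1/videos/3b7e2a1c-4f8d-4b9e-a12c-8d7f3e6b2c1a

Use `-o` to set the file name, or `-OJ` to take it from the Content-Disposition header. Plain `-O` names the file after the last URL segment, without the `.mp4` extension. `-f` makes curl fail instead of saving an error response as the video.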

4. Usage examples

Python client (full polling flow)

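import time
import requests

BASE_URL = "http://89.185.24.182"
HTTP_TIMEOUT_S = 30   # per-request timeout, so a stalled connection raises instead of hanging
POLL_INTERVAL_S = 15  # 5 r/m allows one request per 12 s, in case polling is also rate-limited
MAX_WAIT_S = 3600     # client-side deadline for one job


def generate_video(prompt: str, **kwargs) -> str:
    """Submit a job, wait for it to finish, and return the local video path."""
    # 1. Submit the job
    resp = requests.post(
        f"{BASE_URL}/v1/generate",
        json={"prompt": prompt, **kwargs},
        timeout=HTTP_TIMEOUT_S,
    )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]
    print(f"Job submitted: {job_id}")

    # 2. Wait, then poll until the job completes, fails, or the deadline passes
    deadline = time.monotonic() + MAX_WAIT_S
    while True:
        time.sleep(POLL_INTERVAL_S)
        status_resp = requests.get(f"{BASE_URL}/v1/jobs/{job_id}", timeout=HTTP_TIMEOUT_S)
        status_resp.raise_for_status()
        data = status_resp.json()
        status = data["status"]
        print(f"Status: {status}")

        if status == "completed":
            break
        if status == "failed":
            raise RuntimeError(f"Generation failed: {data['error']}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {MAX_WAIT_S} s")

    # 3. Download the video in chunks
    local_path = f"{job_id}.mp4"
    with requests.get(f"{BASE_URL}/v1/videos/{job_id}", stream=True, timeout=HTTP_TIMEOUT_S) as video_resp:
        video_resp.raise_for_status()
        with open(local_path, "wb") as f:
            for chunk in video_resp.iter_content(chunk_size=8192):
                f.write(chunk)
    print(f"Video saved: {local_path}")
    return local_path


if __name__ == "__main__":
    generate_video(
        prompt="a panda eating bamboo in a misty forest, cinematic",
        resolution="256px",
        num_frames=49,
        motion_score=4,
    )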

Shell one-liner (quick test)

# Submit → get job_id → wait 90 s → download
JOB=$(curl -s -X POST http://89.185.24.182/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"ocean waves at sunset","num_frames":49}' \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")
echo "job_id: $JOB"
sleep 90
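# -o saves as <job_id>.mp4; -f fails instead of saving an error body if the job is not done yet
curl -f -o "$JOB.mp4" http://89.185.24.182/v1/videos/$JOB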

5. Performance reference

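The figures below are extrapolated from the official H100 benchmarks. An A100 80GB is noticeably slower, so treat them as rough estimates: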

Resolution	Frames	GPUs	Est. time	Peak GPU memory
256px	49	1	~60 s	~52 GB
256px	129	1	~90 s	~55 GB
768px	49	1	~900 s	~62 GB
768px	129	8	~350 s	~44 GB per GPU

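Maximum concurrency: eight 256px jobs can run at once (one per GPU). For high-quality 768px videos, submit jobs one at a time or reduce concurrency. Note that the API runs every job on a single GPU (`--nproc_per_node=1`), so the 8-GPU row is for reference only and does not describe API jobs.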
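With eight parallel workers, a rough completion time follows from the table. If every job takes about T seconds, the job at position k in the queue (counting itself) finishes after roughly ceil(k / 8) × T. The sketch below assumes equal job times, in-order service and all 8 workers healthy. It is meant for picking a client polling deadline, not as a guarantee.

```python
import math

NUM_GPUS = 8  # one Celery worker per GPU


def estimated_finish_s(position: int, per_job_s: float, num_gpus: int = NUM_GPUS) -> float:
    """Rough seconds until the job at 1-based `position` in the queue completes."""
    if position < 1:
        raise ValueError("position is 1-based")
    waves = math.ceil(position / num_gpus)  # jobs run in waves of num_gpus
    return waves * per_job_s


def jobs_per_hour(per_job_s: float, num_gpus: int = NUM_GPUS) -> float:
    """Upper-bound throughput when every GPU is busy all the time."""
    return num_gpus * 3600 / per_job_s
```

For 256px, 49-frame jobs (~60 s), `jobs_per_hour(60)` gives 480. A single IP limited to 5 submissions per minute can send at most 300 jobs per hour. So, assuming the limit applies to `/v1/generate` as described, one client cannot keep all eight GPUs busy with short jobs.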

6. Testing procedure

6.1 Environment readiness check

# All services should report "active" (is-active needs no sudo; the remote shell must be bash to expand {0..7})
ssh ceshi@89.185.24.182 "systemctl is-active opensora-api opensora-worker@{0..7} nginx redis-server"

# API health check
curl http://89.185.24.182/health

# Redis connectivity
ssh ceshi@89.185.24.182 "redis-cli ping"

# Confirm the weight files are present
ssh ceshi@89.185.24.182 "ls -lh /data/train-input/ckpts/*.safetensors"

6.2 Smoke test (first-time verification)

# Submit the smallest possible job
curl -X POST http://89.185.24.182/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a red ball bouncing", "num_frames": 17, "num_steps": 20}'

Poll the job until its status is `completed`; this should take about 30–60 s.

6.3 Concurrency test (all 8 GPUs busy)

# Submit 8 jobs at once to check that all 8 GPUs run in parallel
for i in $(seq 1 8); do
  curl -s -X POST http://89.185.24.182/v1/generate \
    -H "Content-Type: application/json" \
    -d "{\"prompt\": \"test video $i, nature scene\", \"num_frames\": 17, \"num_steps\": 10}" &
done
wait
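echo "8 jobs submitted"

Note: with the 5 r/m per-IP limit, NGINX rejects requests beyond the zone's burst allowance (HTTP 503 by default). From a single IP, some of these eight submissions may therefore be refused. Check that every response contains a `job_id` before concluding that all 8 GPUs are busy.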
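To see how the 5 r/m per-IP limit interacts with this test, the sketch below models NGINX's `limit_req` leaky bucket:

- The excess drains at the configured rate.
- Each request adds one to the excess.
- A request is rejected once the new excess would exceed `burst`.

Admitted excess requests are delayed unless `nodelay` is set, which this model does not simulate. The actual `burst` value in `deploy/opensora-limit.conf` is not shown, so it is a parameter here. The model also ignores NGINX's millisecond rounding.

```python
from typing import List, Optional


def simulate_limit_req(arrivals_s: List[float], rate_per_min: float, burst: int = 0) -> List[bool]:
    """Accept (True) / reject (False) for each request from one client IP.

    `arrivals_s` must be in ascending order. Follows the limit_req excess
    accounting: the excess drains at `rate`, each request adds 1, and a request
    is rejected when the new excess would exceed `burst`. Rejected requests
    leave the bucket state unchanged.
    """
    rate = rate_per_min / 60.0  # requests per second
    excess = 0.0
    last: Optional[float] = None
    decisions: List[bool] = []
    for t in arrivals_s:
        if last is None:
            # First request from this IP: empty bucket, always admitted.
            last = t
            decisions.append(True)
            continue
        new_excess = max(excess - rate * (t - last) + 1.0, 0.0)
        if new_excess > burst:
            decisions.append(False)
        else:
            excess, last = new_excess, t
            decisions.append(True)
    return decisions


if __name__ == "__main__":
    eight_at_once = [0.0] * 8
    print(sum(simulate_limit_req(eight_at_once, 5, burst=0)))  # 1
    print(sum(simulate_limit_req(eight_at_once, 5, burst=7)))  # 8
```

With `burst` at 0 the zone admits one request per 12 s, so only the first of eight simultaneous submissions from one IP gets through. For the same reason, a loop polling a rate-limited location every 10 s would have every other request rejected.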

6.4 Failure recovery test

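# Simulate an API service crash and check that it comes back.
# SIGKILL mimics a real crash; the default SIGTERM counts as a clean exit,
# which a unit with Restart=on-failure would not restart.
sudo systemctl kill --signal=SIGKILL opensora-api
sleep 8
curl http://89.185.24.182/health   # should return 200 again within 5–10 s

# Simulate a worker crash and check that it comes back
sudo systemctl kill --signal=SIGKILL opensora-worker@3
sleep 15
sudo systemctl is-active opensora-worker@3   # should print "active"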
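Instead of a fixed `sleep`, the recovery time can be measured by probing `/health` until it returns 200. Below is a standard-library sketch; the 0.5 s probe interval and 60 s cap are arbitrary choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

HEALTH_URL = "http://89.185.24.182/health"


def http_ok(url: str = HEALTH_URL, timeout_s: float = 2.0) -> bool:
    """True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, 502 from NGINX (HTTPError), timeouts, ...
        return False


def wait_until_healthy(probe: Callable[[], bool] = http_ok, max_wait_s: float = 60.0,
                       interval_s: float = 0.5) -> Optional[float]:
    """Return seconds until probe() first succeeds, or None if it never does."""
    start = time.monotonic()
    while time.monotonic() - start <= max_wait_s:
        if probe():
            return time.monotonic() - start
        time.sleep(interval_s)
    return None
```

Call `wait_until_healthy()` right after killing the API service. A result under about 10 s matches the 5–10 s recovery expected above.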

7. Operations

Viewing logs

# Live API service log
sudo journalctl -u opensora-api -f

# Worker log for GPU 0
tail -f /data/train-output/logs/worker-gpu0.log

# NGINX access log
tail -f /data/train-output/logs/api-access.log

Restarting services

# Restart the API
sudo systemctl restart opensora-api

# Restart a single worker
sudo systemctl restart opensora-worker@2

# Restart all workers
for i in $(seq 0 7); do sudo systemctl restart opensora-worker@$i; done

# Restart the full stack
sudo systemctl restart opensora-api redis-server nginx
for i in $(seq 0 7); do sudo systemctl restart opensora-worker@$i; done

Disk cleanup

Video files are stored in /data/train-output/api-outputs/; each job takes roughly 50–300 MB.

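# Check disk usage
du -sh /data/train-output/api-outputs/

# Delete job directories older than 7 days.
# -mindepth 1 keeps find from matching (and deleting) api-outputs/ itself,
# which would happen once no job has written there for 7 days.
find /data/train-output/api-outputs/ -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +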
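A dry run before `rm -rf` shows what would be removed. Below is a Python sketch that lists, and optionally deletes, job directories older than N days. It considers only direct subdirectories of the output root, never the root itself. Like the `find` command above, it assumes each job writes its own subdirectory. Its age check uses the raw mtime, so it roughly but not exactly matches `find -mtime +7`, which counts whole days.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

OUTPUT_ROOT = Path("/data/train-output/api-outputs")


def stale_job_dirs(root: Path, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """Direct subdirectories of `root` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and not p.is_symlink() and p.stat().st_mtime < cutoff
    )


def cleanup(root: Path = OUTPUT_ROOT, max_age_days: float = 7, dry_run: bool = True) -> List[Path]:
    """Print (dry run) or delete stale job directories; never touches `root` itself."""
    victims = stale_job_dirs(root, max_age_days)
    for path in victims:
        if dry_run:
            print(f"would delete {path}")
        else:
            shutil.rmtree(path)
    return victims
```

Run `cleanup()` first to review the list, then `cleanup(dry_run=False)` to delete.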

GPU monitoring

# Monitor per-GPU memory and utilization in real time
watch -n 2 nvidia-smi --query-gpu=index,name,memory.used,memory.free,utilization.gpu --format=csv
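For scripted checks, such as flagging cards that sit near their 80 GB limit, the same query can be parsed in Python. The sketch below assumes the output is requested with `--format=csv,noheader,nounits`, which drops the header row and the `MiB`/`%` suffixes. It also leaves out the `name` column so that every field is an integer. The 75 000 MiB threshold is an arbitrary example.

```python
import csv
import io
import subprocess
from typing import Dict, List

QUERY = "index,memory.used,memory.free,utilization.gpu"


def parse_gpu_csv(text: str) -> List[Dict[str, int]]:
    """Parse `nvidia-smi --query-gpu=<QUERY> --format=csv,noheader,nounits` output."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not fields:
            continue
        index, used, free, util = (int(f) for f in fields)
        rows.append({"index": index, "used_mib": used, "free_mib": free, "util_pct": util})
    return rows


def read_gpus() -> List[Dict[str, int]]:
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


def busy_gpus(rows: List[Dict[str, int]], min_used_mib: int = 75_000) -> List[int]:
    """Indices of GPUs whose used memory is at or above the threshold."""
    return [r["index"] for r in rows if r["used_mib"] >= min_used_mib]
```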

8. Troubleshooting

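Symptom	Cause	Fix
Job stays `pending` for a long time	All workers are busy	Wait, or send fewer concurrent requests
Job `failed` with an OOM error	Not enough GPU memory at 768px	Switch to 256px or reduce `num_frames`
`/health` returns 502	The API service is not running	`sudo systemctl restart opensora-api`
Job `failed` with a "weights not found" error	The weight download has not finished	Check the `/data/train-input/ckpts/` directory
Worker keeps restarting	Model loading fails	`tail /data/train-output/logs/worker-gpuN.log` (N = GPU index)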
