Planting Service Deployment Guide

Deployment Overview

Deployment Architecture

                         ┌─────────────────┐
                         │   Load Balancer │
                         │    (Nginx/ALB)  │
                         └────────┬────────┘
                                  │
              ┌───────────────────┼───────────────────┐
              │                   │                   │
              ▼                   ▼                   ▼
    ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
    │ Planting Service│ │ Planting Service│ │ Planting Service│
    │   Instance 1    │ │   Instance 2    │ │   Instance 3    │
    └────────┬────────┘ └────────┬────────┘ └────────┬────────┘
              │                   │                   │
              └───────────────────┼───────────────────┘
                                  │
                         ┌────────▼────────┐
                         │   PostgreSQL    │
                         │  (Primary/RDS)  │
                         └─────────────────┘

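Deployment Methods

| Method | Use case |
|---|---|
| Docker Compose | Development and test environments |
| Docker Swarm | Small-scale production |
| Kubernetes | Large-scale production |
| Cloud Run/ECS | Managed cloud services |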

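Environment Requirements

Minimum Production Configuration

| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4 cores |
| Memory | 2 GB | 4 GB |
| Disk | 20 GB SSD | 50 GB SSD |
| Network | 100 Mbps | 1 Gbps |

Database Configuration

| Environment | Instance type | Storage |
|---|---|---|
| Development | db.t3.micro | 20 GB |
| Test | db.t3.small | 50 GB |
| Production | db.r5.large | 200 GB |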

Configuration Management

Environment Variables

# Basic settings
NODE_ENV=production
PORT=3003

# Database
DATABASE_URL=postgresql://user:password@host:5432/dbname?schema=public

# JWT authentication
JWT_SECRET=<secure-random-string>

# External services
WALLET_SERVICE_URL=http://wallet-service:3002
IDENTITY_SERVICE_URL=http://identity-service:3001
REFERRAL_SERVICE_URL=http://referral-service:3004

# Logging
LOG_LEVEL=info

# Performance
MAX_CONNECTIONS=100
QUERY_TIMEOUT=30000
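
A missing or malformed variable is easier to diagnose at startup than at the first request. The sketch below shows one way to validate the variables listed above before the application boots. `loadConfig`, `PlantingConfig`, and the fallback values are hypothetical names and assumptions for illustration, not part of the service's actual code.

```typescript
// Hypothetical startup validation for the environment variables listed above.
// The fallback values simply mirror the sample values in this document.
type Env = Record<string, string | undefined>;

interface PlantingConfig {
  nodeEnv: string;
  port: number;
  databaseUrl: string;
  jwtSecret: string;
  walletServiceUrl: string;
  identityServiceUrl: string;
  referralServiceUrl: string;
  logLevel: string;
  maxConnections: number;
  queryTimeoutMs: number;
}

// Returns the variable's value, or throws if it is unset or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Parses a positive integer, using the fallback when the variable is unset.
function intVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): PlantingConfig {
  return {
    nodeEnv: env['NODE_ENV'] ?? 'development',
    port: intVar(env, 'PORT', 3003),
    databaseUrl: requireVar(env, 'DATABASE_URL'),
    jwtSecret: requireVar(env, 'JWT_SECRET'),
    walletServiceUrl: requireVar(env, 'WALLET_SERVICE_URL'),
    identityServiceUrl: requireVar(env, 'IDENTITY_SERVICE_URL'),
    referralServiceUrl: requireVar(env, 'REFERRAL_SERVICE_URL'),
    logLevel: env['LOG_LEVEL'] ?? 'info',
    maxConnections: intVar(env, 'MAX_CONNECTIONS', 100),
    queryTimeoutMs: intVar(env, 'QUERY_TIMEOUT', 30000),
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)` before the HTTP server starts, so that a bad deployment fails immediately with a readable message.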

Example Configuration File

.env.production

NODE_ENV=production
PORT=3003
DATABASE_URL=postgresql://planting:${DB_PASSWORD}@db.prod.internal:5432/rwadurian_planting?schema=public&connection_limit=50
JWT_SECRET=${JWT_SECRET}
WALLET_SERVICE_URL=http://wallet-service.prod.internal:3002
IDENTITY_SERVICE_URL=http://identity-service.prod.internal:3001
REFERRAL_SERVICE_URL=http://referral-service.prod.internal:3004
LOG_LEVEL=info
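
The `connection_limit=50` parameter in the sample `DATABASE_URL` sets the size of the Prisma connection pool for each service instance, so the total demand on PostgreSQL grows with the number of instances. The arithmetic is sketched below. The server limit of 200 and the reserve of 10 connections are assumptions chosen for illustration, and `poolDemand` and `fitsServerLimit` are hypothetical helper names.

```typescript
// Worst-case number of connections the service can open: one pool per instance.
function poolDemand(instances: number, connectionLimit: number): number {
  return instances * connectionLimit;
}

// True when the worst-case demand fits under the server's connection limit,
// leaving `reserved` connections for migrations and administrative sessions.
function fitsServerLimit(
  instances: number,
  connectionLimit: number,
  serverMaxConnections: number,
  reserved = 10, // assumed reserve, not a value from this document
): boolean {
  return poolDemand(instances, connectionLimit) <= serverMaxConnections - reserved;
}

// Three instances, as in the architecture diagram: 3 * 50 = 150 connections.
const threeInstances = poolDemand(3, 50);

// Ten instances, the HPA maximum later in this document: 10 * 50 = 500.
const tenInstances = poolDemand(10, 50);
```

If the worst case exceeds what the database allows, the usual options are a smaller `connection_limit`, a lower replica ceiling, or a connection pooler in front of PostgreSQL.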

Secrets Management

Use a secrets management service:

# AWS Secrets Manager
aws secretsmanager get-secret-value --secret-id planting-service/prod

# Kubernetes Secrets
kubectl create secret generic planting-secrets \
  --from-literal=DATABASE_URL='...' \
  --from-literal=JWT_SECRET='...'

Docker Deployment

Production Dockerfile

# Build stage
FROM node:20-alpine AS builder

WORKDIR /app

# Copy dependency manifests
COPY package*.json ./

# Install all dependencies (dev dependencies are needed for the build)
RUN npm ci

# Copy the Prisma schema and generate the client
COPY prisma ./prisma/
RUN npx prisma generate

# Copy the source code
COPY . .

# Build
RUN npm run build

# Production stage
FROM node:20-alpine AS production

WORKDIR /app

# Create a non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nestjs -u 1001 -G nodejs

# Copy dependency manifests
COPY package*.json ./

# Install production dependencies only
RUN npm ci --omit=dev

# Copy the Prisma schema and generate the client
# (this assumes the prisma CLI is installed as a production dependency)
COPY prisma ./prisma/
RUN npx prisma generate

# Copy the build output
COPY --from=builder /app/dist ./dist

# Switch to the non-root user
USER nestjs

# Expose the service port
EXPOSE 3003

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3003/api/v1/health || exit 1

# Start command
CMD ["node", "dist/main"]

Docker Compose Production Configuration

# docker-compose.prod.yml
version: '3.8'

services:
  planting-service:
    build:
      context: .
      dockerfile: Dockerfile
      target: production
    image: planting-service:${VERSION:-latest}
    container_name: planting-service
    restart: unless-stopped
    ports:
      - "3003:3003"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=${DATABASE_URL}
      - JWT_SECRET=${JWT_SECRET}
      - WALLET_SERVICE_URL=${WALLET_SERVICE_URL}
      - IDENTITY_SERVICE_URL=${IDENTITY_SERVICE_URL}
      - REFERRAL_SERVICE_URL=${REFERRAL_SERVICE_URL}
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3003/api/v1/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    networks:
      - rwadurian-network

networks:
  rwadurian-network:
    external: true

Deployment Commands

# Build the image
docker build -t planting-service:v1.0.0 .

# Push to the registry
docker tag planting-service:v1.0.0 registry.example.com/planting-service:v1.0.0
docker push registry.example.com/planting-service:v1.0.0

# Deploy
docker-compose -f docker-compose.prod.yml up -d

# View logs
docker-compose -f docker-compose.prod.yml logs -f planting-service

# Restart
docker-compose -f docker-compose.prod.yml restart planting-service

# Stop
docker-compose -f docker-compose.prod.yml down

Kubernetes Deployment

Deployment

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: planting-service
  labels:
    app: planting-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: planting-service
  template:
    metadata:
      labels:
        app: planting-service
    spec:
      containers:
        - name: planting-service
          image: registry.example.com/planting-service:v1.0.0
          ports:
            - containerPort: 3003
          env:
            - name: NODE_ENV
              value: production
            - name: PORT
              value: "3003"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: planting-secrets
                  key: DATABASE_URL
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: planting-secrets
                  key: JWT_SECRET
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2000m"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 3003
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /api/v1/health/ready
              port: 3003
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
      imagePullSecrets:
        - name: registry-credentials

Service

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: planting-service
spec:
  selector:
    app: planting-service
  ports:
    - protocol: TCP
      port: 3003
      targetPort: 3003
  type: ClusterIP

Ingress

# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: planting-service-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.rwadurian.com
      secretName: api-tls
  rules:
    - host: api.rwadurian.com
      http:
        paths:
          - path: /api/v1/planting
            pathType: Prefix
            backend:
              service:
                name: planting-service
                port:
                  number: 3003

HPA (Horizontal Pod Autoscaler)

# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: planting-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: planting-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
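
To see what the two targets above mean in practice, it helps to work through the scaling rule. The Kubernetes documentation gives the core formula as desired = ceil(current × currentMetric / targetMetric); with several metrics, the largest result is used. The sketch below applies that rule with the 70% CPU and 80% memory targets and the 3 to 10 replica bounds from this manifest. It is a simplified model that ignores the controller's tolerance band and stabilization windows, and `desiredReplicas` is a hypothetical helper name.

```typescript
interface MetricReading {
  currentUtilization: number; // observed average utilization, in percent
  targetUtilization: number;  // averageUtilization from the HPA spec, in percent
}

// Simplified model of the HPA scaling rule; not the controller's full algorithm.
function desiredReplicas(
  currentReplicas: number,
  metrics: MetricReading[],
  minReplicas: number,
  maxReplicas: number,
): number {
  // Each metric proposes a replica count; the largest proposal is used.
  const proposals = metrics.map((m) =>
    Math.ceil((currentReplicas * m.currentUtilization) / m.targetUtilization),
  );
  const wanted = Math.max(...proposals);
  // Clamp to the configured bounds.
  return Math.min(maxReplicas, Math.max(minReplicas, wanted));
}

// 3 replicas at 90% CPU (target 70) and 60% memory (target 80):
// CPU proposes ceil(3 * 90 / 70) = 4, memory proposes ceil(3 * 60 / 80) = 3.
const scaledUp = desiredReplicas(
  3,
  [
    { currentUtilization: 90, targetUtilization: 70 },
    { currentUtilization: 60, targetUtilization: 80 },
  ],
  3,
  10,
);
```

Because the largest proposal is used, a single hot metric is enough to trigger a scale-up, while scaling down requires every metric to be below its target.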

Deployment Commands

# Create secrets
kubectl create secret generic planting-secrets \
  --from-literal=DATABASE_URL='postgresql://...' \
  --from-literal=JWT_SECRET='...'

# Apply the manifests
kubectl apply -f k8s/

# Check status
kubectl get pods -l app=planting-service
kubectl get svc planting-service

# View logs
kubectl logs -f -l app=planting-service

# Scale out
kubectl scale deployment planting-service --replicas=5

# Roll back
kubectl rollout undo deployment/planting-service

Database Migration

Migration Strategy

┌─────────────────────────────────────────────────────────────────┐
│                     Database Migration Flow                     │
├─────────────────────────────────────────────────────────────────┤
│  1. Create the migration script (development)                   │
│     npx prisma migrate dev --name add_new_feature               │
│                                                                 │
│  2. Review the migration script                                 │
│     Inspect the prisma/migrations/ directory                    │
│                                                                 │
│  3. Verify in the test environment                              │
│     npx prisma migrate deploy                                   │
│                                                                 │
│  4. Deploy to production                                        │
│     - Back up the database                                      │
│     - Run migrations (npx prisma migrate deploy)                │
│     - Deploy the new application version                        │
└─────────────────────────────────────────────────────────────────┘

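Migration Commands

# Development: create a migration
npx prisma migrate dev --name add_new_feature

# Production: apply pending migrations
npx prisma migrate deploy

# Check migration status
npx prisma migrate status

# Reset the database (development only; this deletes all data)
npx prisma migrate reset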

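Migration Best Practices

  1. Backward compatibility: the new application version should still work against the previous database schema
  2. Incremental migrations: split large changes into several small migrations
  3. Backup first: always back up the database before a production migration
  4. Rollback scripts: prepare the matching rollback SQL for each migration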
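Health Checks

Endpoints

| Endpoint | Purpose | What it checks |
|---|---|---|
| /api/v1/health | Liveness | Whether the service process is running |
| /api/v1/health/ready | Readiness | Whether the service can accept requests |

Health Check Implementation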

// src/api/controllers/health.controller.ts

@Controller('health')
export class HealthController {
  @Get()
  check() {
    return {
      status: 'ok',
      timestamp: new Date().toISOString(),
      service: 'planting-service',
    };
  }

  @Get('ready')
  async ready() {
    // A database connectivity check can be added here
    return {
      status: 'ready',
      timestamp: new Date().toISOString(),
    };
  }
}
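
The `ready` handler above always reports success, so it cannot tell the load balancer when the database is unreachable. The sketch below shows one way the database check hinted at in the comment could be structured. `checkReadiness` and `withTimeout` are hypothetical helpers, and the 2000 ms default is an assumption chosen to stay under the 3 second readiness probe timeout in the Kubernetes manifest.

```typescript
interface ReadinessResult {
  status: 'ready' | 'not_ready';
  timestamp: string;
  checks: { database: 'up' | 'down' };
}

// Rejects if `work` does not settle within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// `pingDatabase` is injected so the check stays independent of the ORM.
async function checkReadiness(
  pingDatabase: () => Promise<unknown>,
  timeoutMs = 2000,
): Promise<ReadinessResult> {
  let database: 'up' | 'down' = 'up';
  try {
    await withTimeout(pingDatabase(), timeoutMs);
  } catch {
    // Any error or timeout counts as "database unavailable".
    database = 'down';
  }
  return {
    status: database === 'up' ? 'ready' : 'not_ready',
    timestamp: new Date().toISOString(),
    checks: { database },
  };
}
```

With Prisma, `pingDatabase` could be a function that runs a trivial query such as `SELECT 1` through `$queryRaw`. For the probe to fail, the controller also has to turn a `not_ready` result into a non-2xx response, for example by throwing NestJS's `ServiceUnavailableException`.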

Load Balancer Configuration

Nginx Configuration

upstream planting_service {
    server planting-service-1:3003;
    server planting-service-2:3003;
    server planting-service-3:3003;
}

server {
    listen 80;
    server_name api.rwadurian.com;

    location /api/v1/planting {
        proxy_pass http://planting_service;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
    }

    location /health {
        proxy_pass http://planting_service/api/v1/health;
        proxy_connect_timeout 5s;
        proxy_read_timeout 5s;
    }
}

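Monitoring and Logging

Logging Configuration

// src/main.ts
import { Logger } from '@nestjs/common';
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module'; // assumed module path

async function bootstrap() {
  // Production keeps error/warn/log; other environments also enable debug/verbose
  const app = await NestFactory.create(AppModule, {
    logger: process.env.NODE_ENV === 'production'
      ? ['error', 'warn', 'log']
      : ['error', 'warn', 'log', 'debug', 'verbose'],
  });
  // ...
}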

Log Format

{
  "timestamp": "2024-11-30T10:00:00.000Z",
  "level": "info",
  "context": "PlantingApplicationService",
  "message": "Order created",
  "metadata": {
    "orderNo": "PO202411300001",
    "userId": "1",
    "treeCount": 5
  }
}
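
A minimal sketch of a formatter that emits one JSON object per line in the shape shown above. `formatLog` is a hypothetical helper rather than the service's actual logger, and the `now` parameter exists only so the timestamp can be fixed in tests.

```typescript
type LogLevel = 'error' | 'warn' | 'info' | 'debug';

// Serializes a log entry with the fields used in the format above.
function formatLog(
  level: LogLevel,
  context: string,
  message: string,
  metadata?: Record<string, unknown>,
  now: Date = new Date(),
): string {
  const entry: Record<string, unknown> = {
    timestamp: now.toISOString(),
    level,
    context,
    message,
  };
  // Omit the metadata key entirely when there is nothing to report.
  if (metadata !== undefined) {
    entry['metadata'] = metadata;
  }
  return JSON.stringify(entry);
}

// Reproduces the sample entry shown above.
const sampleLine = formatLog(
  'info',
  'PlantingApplicationService',
  'Order created',
  { orderNo: 'PO202411300001', userId: '1', treeCount: 5 },
  new Date('2024-11-30T10:00:00.000Z'),
);
```

Writing each entry on a single line keeps the output compatible with the `json-file` Docker logging driver configured earlier and with most log collectors.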

Prometheus Metrics

# prometheus/scrape_configs
- job_name: 'planting-service'
  static_configs:
    - targets: ['planting-service:3003']
  metrics_path: '/metrics'

Grafana Dashboards

Key metrics:

  • Request rate (requests/second)
  • Response time (p50, p95, p99)
  • Error rate
  • Database connection pool status
  • Orders created
  • Payment success rate
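
The p50, p95, and p99 response times are percentiles of the latency distribution. The sketch below computes them with the nearest-rank method to show what the dashboard numbers mean. In a real setup they would normally be derived from Prometheus histograms rather than raw samples, and both the `percentile` helper and the sample latencies are made up for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile of an empty sample set is undefined');
  }
  if (p <= 0 || p > 100) {
    throw new Error('p must be in the range (0, 100]');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

// Ten illustrative response times in milliseconds.
const latenciesMs = [120, 80, 95, 300, 110, 90, 100, 105, 85, 1000];

const p50 = percentile(latenciesMs, 50); // 100
const p95 = percentile(latenciesMs, 95); // 1000
const p99 = percentile(latenciesMs, 99); // 1000
```

The example shows why the tail percentiles matter: the median looks healthy at 100 ms while a single slow request dominates p95 and p99.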

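Troubleshooting

Common Issues

1. Service fails to start

# Check the logs
docker logs planting-service

# Common causes
# - Database connection failure
# - Missing environment variables
# - Port conflict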

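2. Database connection failure

# Check the connection directly
psql $DATABASE_URL -c "SELECT 1"

# Check connectivity through Prisma (reports migration state without modifying the schema file)
npx prisma migrate status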

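3. Out of memory

# Check memory usage
docker stats planting-service

# Adjust the Node.js heap limit (keep it below the container memory limit, 2 GB in the configurations above)
NODE_OPTIONS="--max-old-space-size=1536" node dist/main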

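4. High latency

# Check database queries
# Enable Prisma query logging

# Check external service response times
# (curl-format.txt is a user-supplied curl --write-out format file)
curl -w "@curl-format.txt" http://wallet-service:3002/health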
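
When the latency comes from a downstream dependency, timing each outbound call from inside the service narrows it down faster than probing from outside. The sketch below wraps any async call and records how long it took. `timed` and `TimedResult` are hypothetical names, not part of the existing codebase.

```typescript
interface TimedResult<T> {
  label: string;
  durationMs: number;
  ok: boolean;
  value?: T;
  error?: string;
}

// Runs `call`, measures its wall-clock duration, and never throws:
// failures are reported in the result so they can be logged with the timing.
async function timed<T>(label: string, call: () => Promise<T>): Promise<TimedResult<T>> {
  const startedAt = Date.now();
  try {
    const value = await call();
    return { label, durationMs: Date.now() - startedAt, ok: true, value };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { label, durationMs: Date.now() - startedAt, ok: false, error };
  }
}
```

For example, `timed('wallet-service', () => fetch('http://wallet-service:3002/health'))` would measure the same endpoint as the curl command above, using the global `fetch` available in Node.js 18 and later.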

Debugging Commands

# Open a shell in the container
docker exec -it planting-service sh

# Check network connectivity
docker exec planting-service ping db

# Check environment variables
docker exec planting-service env | grep DATABASE

# Follow the logs
docker logs -f --tail 100 planting-service

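Rollback Strategy

Docker Rollback

# Stop the current version
docker-compose -f docker-compose.prod.yml down

# Start the previous version (VERSION selects the image tag in docker-compose.prod.yml)
VERSION=<previous-version> docker-compose -f docker-compose.prod.yml up -d --no-build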

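Kubernetes Rollback

# View the rollout history
kubectl rollout history deployment/planting-service

# Roll back to the previous revision
kubectl rollout undo deployment/planting-service

# Roll back to a specific revision
kubectl rollout undo deployment/planting-service --to-revision=2

# Check the rollback status
kubectl rollout status deployment/planting-service

Database Rollback

-- Prepare a rollback script
-- prisma/rollback/20241130_rollback.sql

-- Example: roll back an added column
ALTER TABLE "PlantingOrder" DROP COLUMN IF EXISTS "newColumn";

Rollback Checklist

  • Confirm the root cause
  • Notify the relevant teams
  • Execute the rollback
  • Verify that the service has recovered
  • Check data consistency
  • Update the incident report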

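Deployment Checklist

Before Deployment

  • Code review approved
  • All tests passing
  • Database migrations tested
  • Environment variables configured
  • Backup completed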

During Deployment

  • Monitoring dashboards ready
  • Log collection working
  • Progressive rollout (canary or blue-green)
  • Health checks passing
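
The health check item can be automated by polling the health endpoint until it succeeds or a retry budget runs out. The sketch below shows one way to do that. `waitForHealthy` is a hypothetical helper, and the defaults of 10 attempts with 3 seconds between them are assumptions, not values from this document.

```typescript
// Polls `probe` until it returns true or the attempts are exhausted.
// Returns true as soon as one probe succeeds, false otherwise.
async function waitForHealthy(
  probe: () => Promise<boolean>,
  attempts = 10,
  delayMs = 3000,
): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await probe()) return true;
    } catch {
      // A failed request means "not healthy yet"; keep polling.
    }
    // Wait between attempts, but not after the last one.
    if (attempt < attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

A probe for this service could request `/api/v1/health` and return whether the response status is 200. A deployment script would then abort, or trigger the rollback steps above, when `waitForHealthy` returns false.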

After Deployment

  • Functional verification
  • Performance verification
  • Error rate monitoring
  • User feedback collection