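Reward Service Deployment Guide

Deployment Overview

This document describes the deployment architecture and operating procedures for the Reward Service.

Deployment Architecture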

                    ┌─────────────────┐
                    │   Load Balancer │
                    │   (Nginx/ALB)   │
                    └────────┬────────┘
                             │
            ┌────────────────┼────────────────┐
            │                │                │
     ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
     │ Reward Svc  │  │ Reward Svc  │  │ Reward Svc  │
     │  Instance 1 │  │  Instance 2 │  │  Instance 3 │
     └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
            │                │                │
            └────────────────┼────────────────┘
                             │
     ┌───────────────────────┼───────────────────────┐
     │                       │                       │
┌────▼─────┐          ┌──────▼──────┐         ┌──────▼──────┐
│PostgreSQL│          │    Redis    │         │    Kafka    │
│ Primary  │          │   Cluster   │         │   Cluster   │
└────┬─────┘          └─────────────┘         └─────────────┘
     │
┌────▼─────┐
│PostgreSQL│
│ Replica  │
└──────────┘

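Environment Requirements

Production Environment Configuration

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| CPU | 2 vCPU | 4 vCPU |
| Memory | 4 GB | 8 GB |
| Storage | 50 GB SSD | 100 GB SSD |
| Node.js | 20.x LTS | 20.x LTS |

Infrastructure Requirements

| Service | Version | Notes |
|---------|---------|-------|
| PostgreSQL | 15.x | Primary database |
| Redis | 7.x | Cache and sessions |
| Apache Kafka | 3.x | Message queue |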

Docker Deployment

Dockerfile

# Build stage
FROM node:20-alpine AS builder

WORKDIR /app

# Copy dependency manifests and the Prisma schema
COPY package*.json ./
COPY prisma ./prisma/

# Install dependencies
RUN npm ci

# Generate Prisma Client
RUN npx prisma generate

# Copy source code
COPY . .

# Build
RUN npm run build

# Production stage
FROM node:20-alpine AS production

WORKDIR /app

# Install production dependencies only
# (--omit=dev replaces the deprecated --only=production flag)
COPY package*.json ./
RUN npm ci --omit=dev

# Copy build artifacts
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/prisma ./prisma
COPY --from=builder /app/node_modules/.prisma ./node_modules/.prisma

# Set environment variables
ENV NODE_ENV=production
ENV PORT=3000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Expose the service port
EXPOSE 3000

# Start command
CMD ["node", "dist/main.js"]

docker-compose.yml (Production)

version: '3.8'

services:
  reward-service:
    build:
      context: .
      dockerfile: Dockerfile
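    ports:
      # With replicas: 3 on a single Docker Compose host, only one replica can bind
      # a fixed host port; in that setup publish a range such as "3000-3002:3000".
      - "3000:3000"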
    environment:
      - NODE_ENV=production
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_HOST=${REDIS_HOST}
      - REDIS_PORT=${REDIS_PORT}
      - KAFKA_BROKERS=${KAFKA_BROKERS}
      - JWT_SECRET=${JWT_SECRET}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      kafka:
        condition: service_healthy
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3

  postgres:
    image: postgres:15-alpine
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    healthcheck:
      test: ["CMD-SHELL", "kafka-broker-api-versions --bootstrap-server localhost:9092"]
      interval: 10s
      timeout: 10s
      retries: 5

  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

volumes:
  postgres-data:
  redis-data:

Building and Pushing the Image

# Build the image
docker build -t reward-service:latest .

# Tag the image
docker tag reward-service:latest your-registry/reward-service:v1.0.0

# Push to the image registry
docker push your-registry/reward-service:v1.0.0

Kubernetes Deployment

Deployment

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reward-service
  namespace: rwadurian
  labels:
    app: reward-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: reward-service
  template:
    metadata:
      labels:
        app: reward-service
    spec:
      containers:
        - name: reward-service
          image: your-registry/reward-service:v1.0.0
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: reward-service-secrets
                  key: database-url
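            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: reward-service-config
                  key: redis-host
            - name: REDIS_PORT
              valueFrom:
                configMapKeyRef:
                  name: reward-service-config
                  key: redis-port
            - name: KAFKA_BROKERS
              valueFrom:
                configMapKeyRef:
                  name: reward-service-config
                  key: kafka-brokers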
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: reward-service-secrets
                  key: jwt-secret
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5

Service

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: reward-service
  namespace: rwadurian
spec:
  selector:
    app: reward-service
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP

Ingress

# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: reward-service-ingress
  namespace: rwadurian
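  annotations:
    # Capture the part of the path after /rewards and forward it upstream.
    # A literal rewrite-target of / would rewrite every request under /rewards to /.
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
    - host: api.rwadurian.com
      http:
        paths:
          - path: /rewards(/|$)(.*)
            pathType: ImplementationSpecific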
            backend:
              service:
                name: reward-service
                port:
                  number: 80

ConfigMap

# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: reward-service-config
  namespace: rwadurian
data:
  redis-host: "redis-master.rwadurian.svc.cluster.local"
  redis-port: "6379"
  kafka-brokers: "kafka-0.kafka.rwadurian.svc.cluster.local:9092"

Secret

# k8s/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: reward-service-secrets
  namespace: rwadurian
type: Opaque
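stringData:
  # Placeholder values only; supply real credentials at deploy time and keep them out of version control
  database-url: "postgresql://user:password@postgres:5432/reward_db"
  jwt-secret: "your-jwt-secret-key"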

HorizontalPodAutoscaler

# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: reward-service-hpa
  namespace: rwadurian
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: reward-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
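
To make the HPA targets above more concrete, the following sketch applies the replica formula from the Kubernetes HPA documentation: desired = ceil(currentReplicas × currentUtilization / targetUtilization), taking the largest result across metrics and clamping it to minReplicas and maxReplicas. Utilization is measured against the container's resource requests (500m CPU and 512Mi memory in the Deployment), not its limits. The type and function names are hypothetical, the 0.1 tolerance is the controller default and is assumed unchanged, and the real controller also considers pod readiness and stabilization windows, which this sketch leaves out.

```typescript
// Illustrative only: a simplified model of the HPA replica calculation.
interface MetricReading {
  name: string;
  currentUtilization: number; // average across pods, in percent of the request
  targetUtilization: number; // averageUtilization from the HPA spec
}

function desiredReplicas(
  currentReplicas: number,
  metrics: MetricReading[],
  minReplicas: number = 3, // from k8s/hpa.yaml
  maxReplicas: number = 10, // from k8s/hpa.yaml
  tolerance: number = 0.1, // assumed controller default
): number {
  let desired = currentReplicas;
  let first = true;
  for (const metric of metrics) {
    const ratio = metric.currentUtilization / metric.targetUtilization;
    // Within the tolerance band, this metric proposes no change.
    const proposal =
      Math.abs(ratio - 1) <= tolerance
        ? currentReplicas
        : Math.ceil(currentReplicas * ratio);
    // With several metrics, the largest proposal wins.
    desired = first ? proposal : Math.max(desired, proposal);
    first = false;
  }
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// Three pods at 140% CPU (target 70) and 60% memory (target 80).
const scaled = desiredReplicas(3, [
  { name: 'cpu', currentUtilization: 140, targetUtilization: 70 },
  { name: 'memory', currentUtilization: 60, targetUtilization: 80 },
]);
console.log(scaled);
```

In this example CPU proposes 6 replicas and memory proposes 3, so the result is 6.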

Deployment Commands

# Create the namespace
kubectl create namespace rwadurian

# Apply configuration
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml

# Deploy the service
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/hpa.yaml

# Check deployment status
kubectl get pods -n rwadurian
kubectl get services -n rwadurian

# Follow the logs
kubectl logs -f deployment/reward-service -n rwadurian

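Database Migration

Production Migration

# 1. Back up the database
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > backup_$(date +%Y%m%d).sql

# 2. Apply pending migrations
DATABASE_URL=$PRODUCTION_DATABASE_URL npx prisma migrate deploy

# 3. Verify the migration by printing the introspected schema of the production database
DATABASE_URL=$PRODUCTION_DATABASE_URL npx prisma db pull --print

Rollback Strategy

# View migration history
npx prisma migrate status

# Roll back to a specific version (manual: apply a hand-written rollback script)
psql -h $DB_HOST -U $DB_USER -d $DB_NAME < rollback_migration.sql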

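Environment Variable Configuration

Production Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| `NODE_ENV` | Runtime environment | `production` |
| `PORT` | Service port | `3000` |
| `DATABASE_URL` | Database connection string | `postgresql://user:pass@host:5432/db` |
| `REDIS_HOST` | Redis host | `redis-master` |
| `REDIS_PORT` | Redis port | `6379` |
| `KAFKA_BROKERS` | Kafka broker list | `kafka-0:9092,kafka-1:9092` |
| `KAFKA_CLIENT_ID` | Kafka client ID | `reward-service` |
| `KAFKA_GROUP_ID` | Kafka consumer group ID | `reward-service-group` |
| `JWT_SECRET` | JWT secret | `<strong-secret>` |
| `LOG_LEVEL` | Log level | `info` |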
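
A misconfigured variable is easier to diagnose when the service refuses to start than when it fails on the first request. The sketch below shows one way to validate the variables listed above at startup. It is not taken from the service's source: `loadConfig`, `RewardServiceConfig`, and the choice of which variables are required are hypothetical, and the fallback values are simply the examples from the table.

```typescript
// Illustrative only: fail-fast validation of the environment variables listed above.
interface RewardServiceConfig {
  nodeEnv: string;
  port: number;
  databaseUrl: string;
  redis: { host: string; port: number };
  kafka: { brokers: string[]; clientId: string; groupId: string };
  jwtSecret: string;
  logLevel: string;
}

type Env = Record<string, string | undefined>;

function required(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

function toPort(raw: string, key: string): number {
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`${key} must be a TCP port between 1 and 65535, got "${raw}"`);
  }
  return port;
}

function loadConfig(env: Env): RewardServiceConfig {
  return {
    nodeEnv: env['NODE_ENV'] ?? 'development',
    port: toPort(env['PORT'] ?? '3000', 'PORT'),
    databaseUrl: required(env, 'DATABASE_URL'),
    redis: {
      host: required(env, 'REDIS_HOST'),
      port: toPort(env['REDIS_PORT'] ?? '6379', 'REDIS_PORT'),
    },
    kafka: {
      // KAFKA_BROKERS is a comma-separated list of host:port pairs.
      brokers: required(env, 'KAFKA_BROKERS')
        .split(',')
        .map((broker) => broker.trim())
        .filter((broker) => broker.length > 0),
      clientId: env['KAFKA_CLIENT_ID'] ?? 'reward-service',
      groupId: env['KAFKA_GROUP_ID'] ?? 'reward-service-group',
    },
    jwtSecret: required(env, 'JWT_SECRET'),
    logLevel: env['LOG_LEVEL'] ?? 'info',
  };
}

const exampleConfig = loadConfig({
  DATABASE_URL: 'postgresql://user:pass@host:5432/db',
  REDIS_HOST: 'redis-master',
  KAFKA_BROKERS: 'kafka-0:9092, kafka-1:9092',
  JWT_SECRET: 'example-only',
});
console.log(exampleConfig.kafka.brokers);
```

In a running service the function would typically be called once with `process.env` during bootstrap, so that a missing `DATABASE_URL` or an invalid `REDIS_PORT` stops the process before it accepts traffic.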

Monitoring and Alerting

Health Check Endpoint

GET /health

Response:

{
  "status": "ok",
  "service": "reward-service",
  "timestamp": "2024-12-01T00:00:00.000Z"
}
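
As a rough illustration of the response shape above, the sketch below builds the payload and checks a received body against it. The names `HealthResponse`, `buildHealthResponse`, and `isHealthy` are hypothetical and do not come from the service's source; the actual endpoint may be implemented differently, for example with @nestjs/terminus.

```typescript
// Illustrative only: the /health payload shown above, as a type plus two helpers.
interface HealthResponse {
  status: 'ok';
  service: string;
  timestamp: string; // ISO 8601, UTC
}

function buildHealthResponse(now: Date = new Date()): HealthResponse {
  return {
    status: 'ok',
    service: 'reward-service',
    timestamp: now.toISOString(),
  };
}

// Type guard a probe script or test could use on a parsed response body.
function isHealthy(body: unknown): body is HealthResponse {
  if (typeof body !== 'object' || body === null) {
    return false;
  }
  const candidate = body as Record<string, unknown>;
  const status = candidate['status'];
  const service = candidate['service'];
  const timestamp = candidate['timestamp'];
  return (
    status === 'ok' &&
    typeof service === 'string' &&
    typeof timestamp === 'string' &&
    !Number.isNaN(Date.parse(timestamp))
  );
}

const sampleHealth = buildHealthResponse(new Date('2024-12-01T00:00:00.000Z'));
console.log(JSON.stringify(sampleHealth), isHealthy(sampleHealth));
```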

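Prometheus Metrics

Use @nestjs/terminus for health checks and expose Prometheus metrics through a dedicated controller. The example assumes the prom-client package and its default registry:

// src/api/controllers/metrics.controller.ts
import { Controller, Get, Header } from '@nestjs/common';
import { register } from 'prom-client';

@Controller('metrics')
export class MetricsController {
  // Serve every metric in the default registry in the Prometheus text format
  @Get()
  @Header('Content-Type', register.contentType)
  async getMetrics(): Promise<string> {
    return register.metrics();
  }
}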
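
For readers unfamiliar with what the metrics endpoint returns, the following dependency-free sketch renders a counter in the Prometheus text exposition format (`# HELP`, `# TYPE`, then one sample per label set). It is only an illustration of the output format: the `Counter` class here is hypothetical and much simpler than a real client library, which should be used in the service itself.

```typescript
// Illustrative only: a tiny counter that renders the Prometheus text format.
type Labels = Record<string, string>;

function escapeLabelValue(value: string): string {
  // Backslash, double quote, and newline must be escaped in label values.
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

class Counter {
  private readonly name: string;
  private readonly help: string;
  private readonly series = new Map<string, { labels: Labels; value: number }>();

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  inc(labels: Labels = {}, amount: number = 1): void {
    // Sort entries so that label order does not create duplicate series.
    const key = JSON.stringify(Object.entries(labels).sort());
    const entry = this.series.get(key) ?? { labels, value: 0 };
    entry.value += amount;
    this.series.set(key, entry);
  }

  render(): string {
    const lines = [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} counter`,
    ];
    Array.from(this.series.values()).forEach(({ labels, value }) => {
      const pairs = Object.entries(labels)
        .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
        .join(',');
      lines.push(pairs ? `${this.name}{${pairs}} ${value}` : `${this.name} ${value}`);
    });
    return lines.join('\n') + '\n';
  }
}

const distributed = new Counter('reward_distributed_total', 'Number of rewards distributed');
distributed.inc();
distributed.inc({}, 5);
console.log(distributed.render());
```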

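Key Metrics

| Metric | Description | Alert threshold |
|--------|-------------|-----------------|
| `http_request_duration_seconds` | Request latency | P99 > 2s |
| `http_requests_total` | Total requests | - |
| `http_request_errors_total` | Failed requests | Error rate > 1% |
| `reward_distributed_total` | Rewards distributed | - |
| `reward_settled_total` | Rewards settled | - |
| `reward_expired_total` | Rewards expired | - |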
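
The two thresholds in the table can be read as simple checks over a time window. The sketch below is a minimal illustration of that arithmetic, assuming a nearest-rank P99 over raw durations; the names `WindowStats`, `percentile`, and `evaluateAlerts` are hypothetical. In a real setup these conditions would normally live in Prometheus alerting rules computed from histogram buckets rather than in application code.

```typescript
// Illustrative only: evaluating the P99 > 2s and error rate > 1% thresholds.
interface WindowStats {
  durationsSeconds: number[]; // request durations observed in the window
  totalRequests: number;
  errorRequests: number;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    return 0;
  }
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

function evaluateAlerts(stats: WindowStats): string[] {
  const alerts: string[] = [];
  const p99 = percentile(stats.durationsSeconds, 99);
  if (p99 > 2) {
    alerts.push(`P99 latency ${p99}s exceeds 2s`);
  }
  const errorRate =
    stats.totalRequests === 0 ? 0 : stats.errorRequests / stats.totalRequests;
  if (errorRate > 0.01) {
    alerts.push(`Error rate ${(errorRate * 100).toFixed(2)}% exceeds 1%`);
  }
  return alerts;
}

// 98 fast requests and 2 slow ones, with 2 errors out of 100 requests.
const slowWindow: WindowStats = {
  durationsSeconds: [...new Array<number>(98).fill(0.1), 3, 3],
  totalRequests: 100,
  errorRequests: 2,
};
console.log(evaluateAlerts(slowWindow));
```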

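Grafana Dashboard

Key panels:

  1. Request throughput (QPS)
  2. Response time distribution (P50/P90/P99)
  3. Error rate
  4. Reward distribution/settlement/expiry trends
  5. Database connection pool status
  6. Redis cache hit rate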

Log Management

Log Format

// Structured log output
{
  "timestamp": "2024-12-01T00:00:00.000Z",
  "level": "info",
  "context": "RewardApplicationService",
  "message": "Distributed 6 rewards for order 123",
  "metadata": {
    "orderId": "123",
    "userId": "100",
    "rewardCount": 6
  }
}

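Log Levels

| Level | Purpose |
|-------|---------|
| error | Errors and exceptions |
| warn | Warnings |
| info | Business events |
| debug | Debugging information (development only) |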
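
The log format and levels above can be combined into a small helper. The sketch below is one possible shape, not the service's actual logger: `formatLog`, `shouldLog`, and `LEVEL_ORDER` are hypothetical names, and the ordering assumes that a configured `LOG_LEVEL` admits that level and every more severe one.

```typescript
// Illustrative only: level filtering plus the structured JSON line shown above.
type LogLevel = 'error' | 'warn' | 'info' | 'debug';

// Lower number means more severe.
const LEVEL_ORDER: Record<LogLevel, number> = { error: 0, warn: 1, info: 2, debug: 3 };

function shouldLog(level: LogLevel, minLevel: LogLevel): boolean {
  return LEVEL_ORDER[level] <= LEVEL_ORDER[minLevel];
}

function formatLog(
  level: LogLevel,
  context: string,
  message: string,
  metadata: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    context,
    message,
    metadata,
  });
}

const sampleLine = formatLog(
  'info',
  'RewardApplicationService',
  'Distributed 6 rewards for order 123',
  { orderId: '123', userId: '100', rewardCount: 6 },
  new Date('2024-12-01T00:00:00.000Z'),
);

// With LOG_LEVEL=info, info lines are emitted and debug lines are dropped.
if (shouldLog('info', 'info')) {
  console.log(sampleLine);
}
```

Writing one JSON object per line keeps the output easy to parse for log shippers such as Filebeat.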

ELK Integration

# filebeat.yml
filebeat.inputs:
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_kubernetes_metadata:

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  indices:
    - index: "reward-service-%{+yyyy.MM.dd}"

CI/CD Pipeline

GitHub Actions

# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm run lint
      - run: npm test

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t reward-service:${{ github.sha }} .
      - name: Push to registry
        run: |
          docker tag reward-service:${{ github.sha }} ${{ secrets.REGISTRY }}/reward-service:${{ github.sha }}
          docker push ${{ secrets.REGISTRY }}/reward-service:${{ github.sha }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/reward-service \
            reward-service=${{ secrets.REGISTRY }}/reward-service:${{ github.sha }} \
            -n rwadurian

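Troubleshooting

Common Issues

1. Service fails to start

# Check the logs
kubectl logs -f deployment/reward-service -n rwadurian

# Check environment variables
kubectl exec -it deployment/reward-service -n rwadurian -- env

# Check database connectivity (--print writes the introspected schema to stdout
# instead of overwriting schema.prisma)
kubectl exec -it deployment/reward-service -n rwadurian -- \
  npx prisma db pull --print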

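2. Database connection problems

# Test the database connection from a pod in the same namespace
kubectl run -it --rm debug --image=postgres:15-alpine --restart=Never -n rwadurian -- \
  psql -h postgres -U user -d reward_db

# Check network policies
kubectl get networkpolicy -n rwadurian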

3. Kafka connection problems

# List Kafka topics
kubectl exec -it kafka-0 -n rwadurian -- \
  kafka-topics --list --bootstrap-server localhost:9092

# Inspect the consumer group
kubectl exec -it kafka-0 -n rwadurian -- \
  kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group reward-service-group

Rolling Back a Deployment

# View revision history
kubectl rollout history deployment/reward-service -n rwadurian

# Roll back to the previous revision
kubectl rollout undo deployment/reward-service -n rwadurian

# Roll back to a specific revision
kubectl rollout undo deployment/reward-service -n rwadurian --to-revision=2

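Security Best Practices

  1. Secret management: Use Kubernetes Secrets or an external secret manager (Vault)
  2. Network isolation: Use NetworkPolicy to restrict Pod-to-Pod traffic
  3. Image security: Scan images for vulnerabilities regularly
  4. Least privilege: Run containers as a non-root user
  5. TLS: Enable mTLS between services
  6. Audit logging: Record all sensitive operations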