# Leaderboard Service Deployment Guide

## 1. Deployment Overview

This document describes the deployment architecture, configuration, and operational procedures for the Leaderboard Service.

### 1.1 Deployment Architecture

```
┌─────────────────────────────────────────────────┐
│                  Load Balancer                  │
│              (Nginx / ALB / etc.)               │
└────────────────────────┬────────────────────────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
┌───────▼───────┐ ┌──────▼──────┐ ┌───────▼───────┐
│    Service    │ │   Service   │ │    Service    │
│  Instance 1   │ │ Instance 2  │ │  Instance N   │
└───────┬───────┘ └──────┬──────┘ └───────┬───────┘
        │                │                │
        └────────────────┼────────────────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
┌───────▼───────┐ ┌──────▼──────┐ ┌───────▼───────┐
│  PostgreSQL   │ │    Redis    │ │     Kafka     │
│   Primary     │ │   Cluster   │ │    Cluster    │
│ (replicated)  │ │             │ │               │
└───────────────┘ └─────────────┘ └───────────────┘
```

### 1.2 Deployment Environments

| Environment | Purpose | Example Domain |
|------|------|----------|
| Development | Local development | localhost:3000 |
| Staging | Pre-release testing | staging-leaderboard.example.com |
| Production | Live production traffic | leaderboard.example.com |

## 2. Docker Deployment

### 2.1 Dockerfile

```dockerfile
# Multi-stage build for production
FROM node:20-alpine AS builder

WORKDIR /app

# Install OpenSSL for Prisma
RUN apk add --no-cache openssl

# Copy package files
COPY package*.json ./
COPY prisma ./prisma/

# Install dependencies
RUN npm ci

# Generate Prisma client
RUN npx prisma generate

# Copy source code
COPY . .

# Build the application
RUN npm run build

# Production stage
FROM node:20-alpine AS production

WORKDIR /app

# Install OpenSSL for Prisma
RUN apk add --no-cache openssl

# Copy package files and install production dependencies
COPY package*.json ./
RUN npm ci --only=production

# Copy Prisma files and generate client
COPY prisma ./prisma/
RUN npx prisma generate

# Copy built application
COPY --from=builder /app/dist ./dist

# Create non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nestjs -u 1001
USER nestjs

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Start the application
CMD ["node", "dist/main"]
```

### 2.2 Production Docker Compose Configuration

```yaml
# docker-compose.prod.yml
version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
      target: production
    image: leaderboard-service:${VERSION:-latest}
    container_name: leaderboard-service
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: production
      DATABASE_URL: ${DATABASE_URL}
      REDIS_HOST: ${REDIS_HOST}
      REDIS_PORT: ${REDIS_PORT}
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      KAFKA_BROKERS: ${KAFKA_BROKERS}
      JWT_SECRET: ${JWT_SECRET}
      JWT_EXPIRES_IN: ${JWT_EXPIRES_IN}
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    networks:
      - leaderboard-network

networks:
  leaderboard-network:
    driver: bridge
```

### 2.3 Building and Pushing the Image

```bash
# Build the image
docker build -t leaderboard-service:1.0.0 .

# Tag the image
docker tag leaderboard-service:1.0.0 registry.example.com/leaderboard-service:1.0.0
docker tag leaderboard-service:1.0.0 registry.example.com/leaderboard-service:latest

# Push to the registry
docker push registry.example.com/leaderboard-service:1.0.0
docker push registry.example.com/leaderboard-service:latest
```
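Before pushing, it can be worth smoke-testing the freshly built image locally against the `/health` endpoint. The following is a minimal sketch, not part of the repository: `.env.staging` is a hypothetical env file, and the 15-second wait is a placeholder for your actual startup time.

```bash
# Run the image locally and probe /health before pushing (sketch; adjust names and timings)
docker run -d --name leaderboard-smoke --env-file .env.staging -p 3000:3000 \
  leaderboard-service:1.0.0

sleep 15  # allow the app to finish booting

if ! curl -fsS http://localhost:3000/health; then
  echo "health check failed" >&2
  docker logs leaderboard-smoke
  docker rm -f leaderboard-smoke
  exit 1
fi

docker rm -f leaderboard-smoke
```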
## 3. Kubernetes Deployment

### 3.1 Deployment

```yaml
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: leaderboard-service
  labels:
    app: leaderboard-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: leaderboard-service
  template:
    metadata:
      labels:
        app: leaderboard-service
    spec:
      containers:
        - name: leaderboard-service
          image: registry.example.com/leaderboard-service:1.0.0
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: leaderboard-secrets
                  key: database-url
            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: leaderboard-config
                  key: redis-host
            - name: REDIS_PORT
              valueFrom:
                configMapKeyRef:
                  name: leaderboard-config
                  key: redis-port
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: leaderboard-secrets
                  key: jwt-secret
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: leaderboard-service
                topologyKey: kubernetes.io/hostname
```

### 3.2 Service

```yaml
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: leaderboard-service
spec:
  type: ClusterIP
  selector:
    app: leaderboard-service
  ports:
    - name: http          # named so the ServiceMonitor in 5.2 can reference it
      port: 80
      targetPort: 3000
      protocol: TCP
```

### 3.3 Ingress

```yaml
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: leaderboard-service-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # ingress-nginx rate limiting: requests per minute per client IP
    nginx.ingress.kubernetes.io/limit-rpm: "100"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - leaderboard.example.com
      secretName: leaderboard-tls
  rules:
    - host: leaderboard.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: leaderboard-service
                port:
                  number: 80
```

### 3.4 ConfigMap

```yaml
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: leaderboard-config
data:
  redis-host: "redis-master.redis.svc.cluster.local"
  redis-port: "6379"
  kafka-brokers: "kafka-0.kafka.svc.cluster.local:9092,kafka-1.kafka.svc.cluster.local:9092"
  log-level: "info"
```

### 3.5 Secrets

```yaml
# k8s/secrets.yaml (example only; encrypt or externalize real values)
apiVersion: v1
kind: Secret
metadata:
  name: leaderboard-secrets
type: Opaque
stringData:
  database-url: "postgresql://user:password@host:5432/leaderboard_db"
  jwt-secret: "your-production-jwt-secret"
  redis-password: "your-redis-password"
```

### 3.6 HPA (Horizontal Pod Autoscaler)

```yaml
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: leaderboard-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: leaderboard-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
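With the manifests above in place, a first rollout can be applied in dependency order and then gated on the Deployment becoming ready. A minimal sketch, assuming the file names shown in the manifest comments above; the timeout is an arbitrary example value.

```bash
# Config and secrets first, since the Deployment references them
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secrets.yaml

# Workload, service exposure, and autoscaling
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/hpa.yaml

# Block until the new pods pass their readiness probes
kubectl rollout status deployment/leaderboard-service --timeout=180s
```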
## 4. Environment Configuration

### 4.1 Production Environment Variables

```env
# Application
NODE_ENV=production
PORT=3000

# Database
DATABASE_URL=postgresql://user:password@db-host:5432/leaderboard_db?connection_limit=20

# Redis
REDIS_HOST=redis-host
REDIS_PORT=6379
REDIS_PASSWORD=your-redis-password

# Kafka
KAFKA_BROKERS=kafka-1:9092,kafka-2:9092,kafka-3:9092
KAFKA_GROUP_ID=leaderboard-service-group
KAFKA_CLIENT_ID=leaderboard-service-prod

# JWT
JWT_SECRET=your-production-jwt-secret-at-least-32-chars
JWT_EXPIRES_IN=7d

# External services
REFERRAL_SERVICE_URL=http://referral-service:3000
IDENTITY_SERVICE_URL=http://identity-service:3000

# Logging
LOG_LEVEL=info
LOG_FORMAT=json

# Performance
DISPLAY_LIMIT_DEFAULT=30
REFRESH_INTERVAL_MINUTES=5
CACHE_TTL_SECONDS=300
```

### 4.2 Database Migrations

```bash
# Apply migrations in production
DATABASE_URL=$PROD_DATABASE_URL npx prisma migrate deploy

# Check migration status
DATABASE_URL=$PROD_DATABASE_URL npx prisma migrate status
```

## 5. Monitoring and Alerting

### 5.1 Health Check Endpoints

| Endpoint | Purpose | Response |
|------|------|------|
| `/health` | Liveness check | `{"status": "ok"}` |
| `/health/ready` | Readiness check | `{"status": "ok", "details": {...}}` |

### 5.2 Prometheus Metrics

```yaml
# prometheus-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: leaderboard-service
spec:
  selector:
    matchLabels:
      app: leaderboard-service
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```

### 5.3 Alert Rules

```yaml
# prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: leaderboard-service-alerts
spec:
  groups:
    - name: leaderboard-service
      rules:
        - alert: LeaderboardServiceDown
          expr: up{job="leaderboard-service"} == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "Leaderboard Service is down"
            description: "Leaderboard Service has been down for more than 1 minute."
        - alert: LeaderboardServiceHighLatency
          expr: |
            histogram_quantile(0.95,
              sum(rate(http_request_duration_seconds_bucket{job="leaderboard-service"}[5m])) by (le)
            ) > 2
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High latency on Leaderboard Service"
            description: "95th percentile latency is above 2 seconds."
        - alert: LeaderboardServiceHighErrorRate
          expr: |
            sum(rate(http_requests_total{job="leaderboard-service",status=~"5.."}[5m]))
              /
            sum(rate(http_requests_total{job="leaderboard-service"}[5m])) > 0.1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High error rate on Leaderboard Service"
            description: "Error rate is above 10%."
```

### 5.4 Log Collection

```yaml
# fluent-bit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
data:
  fluent-bit.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/leaderboard-service*.log
        Parser            docker
        Tag               leaderboard.*
        Refresh_Interval  5

    [OUTPUT]
        Name    es
        Match   leaderboard.*
        Host    elasticsearch
        Port    9200
        Index   leaderboard-logs
        Type    _doc
```
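When wiring up monitoring for the first time, it helps to confirm by hand that `/metrics` is actually served and that Prometheus is scraping the job. A rough sketch; the in-cluster Prometheus address below is an assumption, so adjust it to your setup.

```bash
# Probe the metrics endpoint from inside a running pod (the alpine image ships busybox wget)
kubectl exec -it deployment/leaderboard-service -- \
  wget -qO- http://localhost:3000/metrics | head

# Ask Prometheus whether the scrape target is up (URL is an assumed in-cluster address)
PROM_URL=http://prometheus.monitoring.svc:9090
curl -sG "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=up{job="leaderboard-service"}'
```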
## 6. Operations

### 6.1 Common Commands

```bash
# Check pod status
kubectl get pods -l app=leaderboard-service

# Tail the logs
kubectl logs -f deployment/leaderboard-service

# Scale up or down
kubectl scale deployment leaderboard-service --replicas=5

# Restart the service
kubectl rollout restart deployment/leaderboard-service

# Roll back
kubectl rollout undo deployment/leaderboard-service

# Check resource usage
kubectl top pods -l app=leaderboard-service
```

### 6.2 Database Maintenance

```bash
# Back up the database
pg_dump -h $DB_HOST -U $DB_USER -d leaderboard_db > backup_$(date +%Y%m%d).sql

# Restore the database
psql -h $DB_HOST -U $DB_USER -d leaderboard_db < backup_20240115.sql

# Purge expired data
psql -h $DB_HOST -U $DB_USER -d leaderboard_db -c "
  DELETE FROM leaderboard_rankings
  WHERE period_end_at < NOW() - INTERVAL '90 days';
"
```

### 6.3 Cache Maintenance

```bash
# Connect to Redis (interactive session)
redis-cli -h $REDIS_HOST -p $REDIS_PORT -a $REDIS_PASSWORD

# List leaderboard cache keys (inside the session)
KEYS leaderboard:*

# Delete a specific cache entry (inside the session)
DEL leaderboard:DAILY:2024-01-15:rankings

# Delete all leaderboard cache keys (run from the shell; --scan avoids blocking Redis)
redis-cli -h $REDIS_HOST -p $REDIS_PORT -a $REDIS_PASSWORD --scan --pattern 'leaderboard:*' \
  | xargs -r redis-cli -h $REDIS_HOST -p $REDIS_PORT -a $REDIS_PASSWORD DEL
```

## 7. Troubleshooting

### 7.1 Common Issues

| Issue | Likely Cause | Resolution |
|------|----------|----------|
| Service fails to start | Database connection failure | Check the DATABASE_URL configuration |
| Rankings not updating | Scheduled job not running | Check the Scheduler logs |
| Request timeouts | Slow database queries | Check indexes and query plans |
| Cache not working | Redis connection issues | Check the Redis service status |
| Lost messages | Kafka misconfiguration | Check Kafka connectivity and topics |

### 7.2 Diagnostic Commands

```bash
# Check service reachability
curl -v http://localhost:3000/health

# Check database connectivity
kubectl exec -i deployment/leaderboard-service -- \
  npx prisma db execute --stdin --schema prisma/schema.prisma <<< "SELECT 1"

# Check Redis connectivity (requires redis-cli to be available in the container)
kubectl exec -it deployment/leaderboard-service -- \
  redis-cli -h $REDIS_HOST ping

# Filter recent error logs
kubectl logs deployment/leaderboard-service --since=1h | grep ERROR
```

### 7.3 Performance Diagnostics

```bash
# CPU profiling (note: this starts a second Node process in the pod; prefer a dedicated debug pod)
kubectl exec -it deployment/leaderboard-service -- \
  node --prof dist/main.js

# Memory analysis (exposes the inspector; port-forward 9229 to attach DevTools)
kubectl exec -it deployment/leaderboard-service -- \
  node --expose-gc --inspect dist/main.js
```

## 8. Security Hardening

### 8.1 Network Policies

```yaml
# k8s/network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: leaderboard-service-network-policy
spec:
  podSelector:
    matchLabels:
      app: leaderboard-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 3000
  egress:
    # NOTE: with restricted egress, DNS (UDP 53) and the Kafka brokers (TCP 9092)
    # must also be allowed; only PostgreSQL and Redis are shown here.
    - to:
        - namespaceSelector:
            matchLabels:
              name: database
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector:
            matchLabels:
              name: redis
      ports:
        - protocol: TCP
          port: 6379
```

### 8.2 Security Checklist

- [ ] All sensitive values are stored in Secrets
- [ ] Database uses strong passwords and SSL connections
- [ ] Redis password authentication is enabled
- [ ] JWT secret is sufficiently long and random
- [ ] Containers run as a non-root user
- [ ] Network policies restrict traffic
- [ ] Dependencies and base images are updated regularly
- [ ] Audit logging is enabled

## 9. Backup and Recovery

### 9.1 Backup Strategy

| Data | Backup Frequency | Retention |
|----------|----------|----------|
| Database | Daily full + hourly incremental | 30 days |
| Configuration | On every change | Indefinite (Git) |
| Logs | Real-time shipping | 90 days |

A scripted example of the daily full backup is sketched after section 9.2.

### 9.2 Disaster Recovery

```bash
# 1. Restore the database (pg_restore expects a custom-format dump created with pg_dump -Fc)
pg_restore -h $DB_HOST -U $DB_USER -d leaderboard_db latest_backup.dump

# 2. Redeploy the service
kubectl apply -f k8s/

# 3. Verify the service
curl http://leaderboard.example.com/health

# 4. Flush and rebuild the cache
redis-cli FLUSHDB
curl -X POST http://leaderboard.example.com/leaderboard/config/refresh
```
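The daily full backup from the policy in 9.1 can be scripted along the following lines. This is a sketch under stated assumptions: `BACKUP_DIR` is a hypothetical path, and the dump is written in custom format (`-Fc`) so it can be restored with the `pg_restore` command shown in 9.2.

```bash
#!/usr/bin/env bash
# Daily full backup with the 30-day retention from the policy table in 9.1.
# BACKUP_DIR is a hypothetical path; DB_HOST and DB_USER follow the variables used elsewhere here.
set -euo pipefail

BACKUP_DIR=/var/backups/leaderboard
mkdir -p "$BACKUP_DIR"

# Custom-format dump so it can be restored with pg_restore (see 9.2)
pg_dump -h "$DB_HOST" -U "$DB_USER" -Fc -d leaderboard_db \
  -f "$BACKUP_DIR/leaderboard_$(date +%Y%m%d).dump"

# Enforce the 30-day retention window
find "$BACKUP_DIR" -name 'leaderboard_*.dump' -mtime +30 -delete
```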
## 10. Releases

### 10.1 Release Flow

```
1. Development complete
   └── Code review
       └── Merge into develop
           └── CI tests pass
               └── Merge into main
                   └── Tag the release
                       └── Build the image
                           └── Deploy to Staging
                               └── Acceptance testing
                                   └── Deploy to Production
```

### 10.2 Blue-Green Deployment

```bash
# Deploy the new version (green)
kubectl apply -f k8s/deployment-green.yaml

# Verify the new version
curl http://leaderboard-green.internal/health

# Switch traffic
kubectl patch service leaderboard-service \
  -p '{"spec":{"selector":{"version":"green"}}}'

# Verify
curl http://leaderboard.example.com/health

# Remove the old version (blue)
kubectl delete -f k8s/deployment-blue.yaml
```

### 10.3 Canary Release

```yaml
# k8s/canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: leaderboard-service-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: leaderboard-service
      version: canary
  template:
    metadata:
      labels:
        app: leaderboard-service
        version: canary
    spec:
      containers:
        - name: leaderboard-service
          image: registry.example.com/leaderboard-service:1.1.0-canary
```

```bash
# Gradually shift traffic to the canary
kubectl scale deployment leaderboard-service-canary --replicas=2
kubectl scale deployment leaderboard-service --replicas=8

# Watch the metrics; continue only if nothing looks abnormal
kubectl scale deployment leaderboard-service-canary --replicas=5
kubectl scale deployment leaderboard-service --replicas=5

# Complete the cutover
kubectl scale deployment leaderboard-service-canary --replicas=10
kubectl scale deployment leaderboard-service --replicas=0
```
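Before each canary scale-up it is worth checking the canary's error rate rather than eyeballing dashboards. A hedged sketch: the Prometheus address and the `version` label on `http_requests_total` are assumptions and may not match your actual metric labels.

```bash
# Query the canary's 5xx request rate over the last 5 minutes (sketch; adjust labels and address)
PROM_URL=http://prometheus.monitoring.svc:9090
QUERY='sum(rate(http_requests_total{job="leaderboard-service",version="canary",status=~"5.."}[5m]))'

curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$QUERY"
```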