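Reward Service Deployment Guide
Deployment Overview
This document describes the deployment architecture of Reward Service and the procedures for operating it.
Deployment Architecture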
                     ┌─────────────────┐
                     │  Load Balancer  │
                     │   (Nginx/ALB)   │
                     └────────┬────────┘
                              │
             ┌────────────────┼────────────────┐
             │                │                │
      ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
      │ Reward Svc  │  │ Reward Svc  │  │ Reward Svc  │
      │ Instance 1  │  │ Instance 2  │  │ Instance 3  │
      └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
             │                │                │
             └────────────────┼────────────────┘
                              │
      ┌───────────────────────┼───────────────────────┐
      │                       │                       │
┌─────▼─────┐          ┌──────▼──────┐         ┌──────▼──────┐
│PostgreSQL │          │    Redis    │         │    Kafka    │
│  Primary  │          │   Cluster   │         │   Cluster   │
└─────┬─────┘          └─────────────┘         └─────────────┘
      │
┌─────▼─────┐
│PostgreSQL │
│  Replica  │
└───────────┘
Environment Requirements
Production Configuration
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 vCPU | 4 vCPU |
| Memory | 4 GB | 8 GB |
| Storage | 50 GB SSD | 100 GB SSD |
| Node.js | 20.x LTS | 20.x LTS |
Infrastructure Requirements
| Service | Version | Purpose |
|---|---|---|
| PostgreSQL | 15.x | Primary database |
| Redis | 7.x | Cache and sessions |
| Apache Kafka | 3.x | Message queue |
Docker Deployment
Dockerfile
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
# Copy the dependency manifests and the Prisma schema
COPY package*.json ./
COPY prisma ./prisma/
# Install all dependencies (dev dependencies are needed for the build)
RUN npm ci
# Generate the Prisma Client
RUN npx prisma generate
# Copy the source code
COPY . .
# Build
RUN npm run build
# Production stage
FROM node:20-alpine AS production
WORKDIR /app
# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the build artifacts
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/prisma ./prisma
COPY --from=builder /app/node_modules/.prisma ./node_modules/.prisma
# Set environment variables
ENV NODE_ENV=production
ENV PORT=3000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
# Expose the service port
EXPOSE 3000
# Start command
CMD ["node", "dist/main.js"]
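The HEALTHCHECK flags in the Dockerfile interact in a way that is easy to misread. The TypeScript sketch below models how a container's health status could be derived from a sequence of probe results, following Docker's documented semantics: failures inside the start period are not counted until a probe has succeeded once, and `retries` consecutive failures mark the container unhealthy. The names `ProbeResult`, `HealthcheckOptions`, and `healthStatus` are hypothetical and exist only for illustration; the sketch also assumes the start period ends strictly at `startPeriodSeconds`.

```typescript
// Illustrative model of Docker HEALTHCHECK state transitions.
// All identifiers are hypothetical; only the flag semantics come from Docker's documentation.

type HealthStatus = 'starting' | 'healthy' | 'unhealthy';

interface ProbeResult {
  atSeconds: number; // seconds since container start when the probe finished
  ok: boolean;       // true when the probe command exited with status 0
}

interface HealthcheckOptions {
  startPeriodSeconds: number; // --start-period
  retries: number;            // --retries
}

function healthStatus(probes: ProbeResult[], opts: HealthcheckOptions): HealthStatus {
  let status: HealthStatus = 'starting';
  let consecutiveFailures = 0;
  let started = false; // becomes true after the first successful probe

  for (const probe of probes) {
    if (probe.ok) {
      started = true;
      consecutiveFailures = 0;
      status = 'healthy';
      continue;
    }
    // Failures during the start period are ignored until a probe has succeeded once.
    if (!started && probe.atSeconds < opts.startPeriodSeconds) {
      continue;
    }
    consecutiveFailures += 1;
    if (consecutiveFailures >= opts.retries) {
      status = 'unhealthy';
    }
  }
  return status;
}

// Values taken from the Dockerfile: --start-period=5s --retries=3
const dockerfileOptions: HealthcheckOptions = { startPeriodSeconds: 5, retries: 3 };
```

Under this model, two failed probes after the start period leave the status unchanged, and only the third consecutive failure flips it to unhealthy.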
docker-compose.yml (production)
version: '3.8'
services:
  reward-service:
    build:
      context: .
      dockerfile: Dockerfile
    # A fixed host port can be bound by only one container per host; with
    # replicas > 1, publish a port range or route traffic through the load balancer.
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_HOST=${REDIS_HOST}
      - REDIS_PORT=${REDIS_PORT}
      - KAFKA_BROKERS=${KAFKA_BROKERS}
      - JWT_SECRET=${JWT_SECRET}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      kafka:
        condition: service_healthy
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
  postgres:
    image: postgres:15-alpine
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    healthcheck:
      test: ["CMD-SHELL", "kafka-broker-api-versions --bootstrap-server localhost:9092"]
      interval: 10s
      timeout: 10s
      retries: 5
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
volumes:
  postgres-data:
  redis-data:
Building and Pushing the Image
# Build the image
docker build -t reward-service:latest .
# Tag the image
docker tag reward-service:latest your-registry/reward-service:v1.0.0
# Push to the image registry
docker push your-registry/reward-service:v1.0.0
Kubernetes Deployment
Deployment
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reward-service
  namespace: rwadurian
  labels:
    app: reward-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: reward-service
  template:
    metadata:
      labels:
        app: reward-service
    spec:
      containers:
        - name: reward-service
          image: your-registry/reward-service:v1.0.0
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: reward-service-secrets
                  key: database-url
            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: reward-service-config
                  key: redis-host
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: reward-service-secrets
                  key: jwt-secret
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
Service
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: reward-service
  namespace: rwadurian
spec:
  selector:
    app: reward-service
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP
Ingress
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: reward-service-ingress
  namespace: rwadurian
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: api.rwadurian.com
      http:
        paths:
          - path: /rewards
            pathType: Prefix
            backend:
              service:
                name: reward-service
                port:
                  number: 80
ConfigMap
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: reward-service-config
  namespace: rwadurian
data:
  redis-host: "redis-master.rwadurian.svc.cluster.local"
  redis-port: "6379"
  kafka-brokers: "kafka-0.kafka.rwadurian.svc.cluster.local:9092"
Secret
# k8s/secret.yaml
# The values below are placeholders; replace them before applying and keep real secrets out of version control.
apiVersion: v1
kind: Secret
metadata:
  name: reward-service-secrets
  namespace: rwadurian
type: Opaque
stringData:
  database-url: "postgresql://user:password@postgres:5432/reward_db"
  jwt-secret: "your-jwt-secret-key"
HorizontalPodAutoscaler
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: reward-service-hpa
  namespace: rwadurian
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: reward-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
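To make the effect of the two utilization targets concrete, the sketch below applies the scaling rule from the Kubernetes documentation, `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, to each metric, keeps the largest proposal, and clamps it to the configured bounds. Utilization is measured against the container's resource requests (500m CPU and 512Mi memory in the Deployment). The function and type names are hypothetical, and the sketch deliberately omits stabilization windows, pod readiness handling, and scaling policies that a real HPA also applies.

```typescript
// Simplified, illustrative model of the HPA replica calculation.
// Identifiers are hypothetical; the formula and the 10% default tolerance follow the Kubernetes docs.

interface MetricReading {
  name: string;
  currentUtilization: number; // observed average utilization, percent of requests
  targetUtilization: number;  // averageUtilization from the HPA spec
}

interface HpaBounds {
  minReplicas: number;
  maxReplicas: number;
  tolerance: number; // ratios within 1 +/- tolerance do not trigger scaling
}

// Bounds from k8s/hpa.yaml; 0.1 is the Kubernetes default tolerance.
const rewardServiceBounds: HpaBounds = { minReplicas: 3, maxReplicas: 10, tolerance: 0.1 };

function desiredReplicas(
  currentReplicas: number,
  metrics: MetricReading[],
  bounds: HpaBounds,
): number {
  let highest = metrics.length === 0 ? currentReplicas : 0;
  for (const metric of metrics) {
    const ratio = metric.currentUtilization / metric.targetUtilization;
    // Within tolerance, this metric proposes no change.
    const proposal =
      Math.abs(ratio - 1) <= bounds.tolerance
        ? currentReplicas
        : Math.ceil(currentReplicas * ratio);
    // With several metrics, the largest proposal wins.
    if (proposal > highest) {
      highest = proposal;
    }
  }
  return Math.min(bounds.maxReplicas, Math.max(bounds.minReplicas, highest));
}

// Example: 3 replicas, CPU at 90% (target 70), memory at 60% (target 80).
const example = desiredReplicas(
  3,
  [
    { name: 'cpu', currentUtilization: 90, targetUtilization: 70 },
    { name: 'memory', currentUtilization: 60, targetUtilization: 80 },
  ],
  rewardServiceBounds,
);
```

In the example, CPU proposes 4 replicas and memory proposes 3, so the result is 4.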
Deployment Commands
# Create the namespace
kubectl create namespace rwadurian
# Apply the configuration
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
# Deploy the service
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/hpa.yaml
# Check the deployment status
kubectl get pods -n rwadurian
kubectl get services -n rwadurian
# Follow the logs
kubectl logs -f deployment/reward-service -n rwadurian
Database Migration
Production Migration
# 1. Back up the database
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > backup_$(date +%Y%m%d).sql
# 2. Run the migrations
DATABASE_URL=$PRODUCTION_DATABASE_URL npx prisma migrate deploy
# 3. Verify the migration
npx prisma db pull --print
Rollback Strategy
# Show the migration history
npx prisma migrate status
# Roll back to a specific version (manual)
psql -h $DB_HOST -U $DB_USER -d $DB_NAME < rollback_migration.sql
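The backup and rollback commands take the host, user, and database name as separate variables, while the service itself is configured with a single `DATABASE_URL`. One way to keep the two in sync is to derive the former from the latter. The sketch below does this with the standard `URL` class; `parseDatabaseUrl`, `pgDumpArgs`, and `backupFileName` are hypothetical helper names, and the file name uses the UTC date, whereas `date +%Y%m%d` in the shell command uses local time.

```typescript
// Illustrative helpers that derive pg_dump arguments from DATABASE_URL.
// All identifiers are hypothetical and not part of the service.

interface PgConnection {
  host: string;
  port: number;
  user: string;
  database: string;
}

function parseDatabaseUrl(databaseUrl: string): PgConnection {
  const url = new URL(databaseUrl);
  if (url.protocol !== 'postgresql:' && url.protocol !== 'postgres:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  // The path component is "/<database>"; strip the leading slash.
  const database = decodeURIComponent(url.pathname.replace(/^\//, ''));
  if (url.hostname === '' || database === '') {
    throw new Error('DATABASE_URL must include a host and a database name');
  }
  return {
    host: url.hostname,
    port: url.port === '' ? 5432 : Number(url.port), // 5432 is the PostgreSQL default
    user: decodeURIComponent(url.username),
    database,
  };
}

// Arguments for: pg_dump -h <host> -p <port> -U <user> -d <database>
function pgDumpArgs(conn: PgConnection): string[] {
  return ['-h', conn.host, '-p', String(conn.port), '-U', conn.user, '-d', conn.database];
}

// Mirrors backup_$(date +%Y%m%d).sql, but based on the UTC date.
function backupFileName(date: Date): string {
  const yyyymmdd = date.toISOString().slice(0, 10).replace(/-/g, '');
  return `backup_${yyyymmdd}.sql`;
}

const conn = parseDatabaseUrl('postgresql://user:pass@host:5432/db');
const args = pgDumpArgs(conn);
```

The password is intentionally left out of the result; `pg_dump` can read it from the `PGPASSWORD` environment variable or a `.pgpass` file.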
Environment Variables
Production Environment Variables
| Variable | Description | Example |
|---|---|---|
| NODE_ENV | Runtime environment | production |
| PORT | Service port | 3000 |
| DATABASE_URL | Database connection string | postgresql://user:pass@host:5432/db |
| REDIS_HOST | Redis host | redis-master |
| REDIS_PORT | Redis port | 6379 |
| KAFKA_BROKERS | Kafka brokers (comma-separated) | kafka-0:9092,kafka-1:9092 |
| KAFKA_CLIENT_ID | Kafka client ID | reward-service |
| KAFKA_GROUP_ID | Kafka consumer group ID | reward-service-group |
| JWT_SECRET | JWT signing secret | <strong-secret> |
| LOG_LEVEL | Log level | info |
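A misconfigured variable is cheaper to catch at startup than at the first request. The sketch below shows one way the variables in the table could be validated and parsed into a typed object. It is not the service's actual configuration code: `loadConfig` and `RewardServiceConfig` are hypothetical names, the choice of which variables are required is an assumption, and the defaults are simply the example values from the table.

```typescript
// Illustrative startup validation for the environment variables listed above.
// Identifiers, the required/optional split, and the defaults are assumptions.

type LogLevel = 'error' | 'warn' | 'info' | 'debug';
type Env = Record<string, string | undefined>;

interface RewardServiceConfig {
  nodeEnv: string;
  port: number;
  databaseUrl: string;
  redisHost: string;
  redisPort: number;
  kafkaBrokers: string[];
  kafkaClientId: string;
  kafkaGroupId: string;
  jwtSecret: string;
  logLevel: LogLevel;
}

const LOG_LEVELS: LogLevel[] = ['error', 'warn', 'info', 'debug'];

function parsePort(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === '') return fallback;
  const port = Number(raw);
  if (!isFinite(port) || Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error(`${name} must be an integer between 1 and 65535, got "${raw}"`);
  }
  return port;
}

function loadConfig(env: Env): RewardServiceConfig {
  // Report every missing variable at once rather than failing one at a time.
  const required = ['DATABASE_URL', 'REDIS_HOST', 'KAFKA_BROKERS', 'JWT_SECRET'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }

  const logLevel = (env['LOG_LEVEL'] || 'info') as LogLevel;
  if (LOG_LEVELS.indexOf(logLevel) === -1) {
    throw new Error(`LOG_LEVEL must be one of ${LOG_LEVELS.join(', ')}`);
  }

  return {
    nodeEnv: env['NODE_ENV'] || 'production',
    port: parsePort('PORT', env['PORT'], 3000),
    databaseUrl: env['DATABASE_URL'] as string,
    redisHost: env['REDIS_HOST'] as string,
    redisPort: parsePort('REDIS_PORT', env['REDIS_PORT'], 6379),
    // KAFKA_BROKERS is a comma-separated list, e.g. "kafka-0:9092,kafka-1:9092".
    kafkaBrokers: (env['KAFKA_BROKERS'] as string)
      .split(',')
      .map((broker) => broker.trim())
      .filter((broker) => broker.length > 0),
    kafkaClientId: env['KAFKA_CLIENT_ID'] || 'reward-service',
    kafkaGroupId: env['KAFKA_GROUP_ID'] || 'reward-service-group',
    jwtSecret: env['JWT_SECRET'] as string,
    logLevel,
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`, so that a missing or malformed variable stops the container before it reports healthy.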
Monitoring and Alerting
Health Check Endpoint
GET /health
Response:
{
  "status": "ok",
  "service": "reward-service",
  "timestamp": "2024-12-01T00:00:00.000Z"
}
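The response shape above is what the container health check, the Kubernetes probes, and any post-deploy smoke test all rely on. The following sketch shows how such a payload could be built and how a client could check it. `buildHealthResponse` and `isHealthResponse` are hypothetical names, and the sketch says nothing about how the real endpoint is implemented.

```typescript
// Illustrative builder and validator for the /health payload shown above.
// Identifiers are hypothetical; only the field names come from the documented response.

interface HealthResponse {
  status: 'ok';
  service: string;
  timestamp: string; // ISO 8601, UTC
}

function buildHealthResponse(now: Date): HealthResponse {
  return {
    status: 'ok',
    service: 'reward-service',
    timestamp: now.toISOString(),
  };
}

// Type guard a smoke test could apply to a parsed response body.
function isHealthResponse(body: unknown): body is HealthResponse {
  if (typeof body !== 'object' || body === null) return false;
  const candidate = body as Record<string, unknown>;
  const timestamp = candidate['timestamp'];
  return (
    candidate['status'] === 'ok' &&
    typeof candidate['service'] === 'string' &&
    typeof timestamp === 'string' &&
    !isNaN(Date.parse(timestamp))
  );
}

const sample = buildHealthResponse(new Date('2024-12-01T00:00:00.000Z'));
const roundTripped: unknown = JSON.parse(JSON.stringify(sample));
const sampleIsValid = isHealthResponse(roundTripped);
```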
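Prometheus Metrics
Add @nestjs/terminus for health checks and expose Prometheus metrics. The controller below assumes the metrics registry comes from the prom-client package:
// src/api/controllers/metrics.controller.ts
import { Controller, Get, Header } from '@nestjs/common';
import { register } from 'prom-client';

@Controller('metrics')
export class MetricsController {
  // Return all registered metrics in the Prometheus text format
  @Get()
  @Header('Content-Type', 'text/plain')
  async getMetrics(): Promise<string> {
    return register.metrics();
  }
}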
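Key Metrics
| Metric | Description | Alert threshold |
|---|---|---|
| http_request_duration_seconds | Request latency | P99 > 2s |
| http_requests_total | Total number of requests | - |
| http_request_errors_total | Number of failed requests | Error rate > 1% |
| reward_distributed_total | Number of rewards distributed | - |
| reward_settled_total | Number of rewards settled | - |
| reward_expired_total | Number of rewards expired | - |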
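The two alert thresholds in the table can be expressed as a small calculation. The sketch below evaluates them over a window of raw samples using a nearest-rank percentile. It only illustrates the arithmetic: in a Prometheus setup the P99 would normally be estimated from histogram buckets in an alerting rule, and `TrafficWindow`, `percentile`, and `evaluateAlerts` are hypothetical names.

```typescript
// Illustrative evaluation of the alert thresholds from the table above.
// Identifiers are hypothetical; thresholds are P99 > 2s and error rate > 1%.

interface TrafficWindow {
  durationsSeconds: number[]; // one entry per request in the window
  totalRequests: number;      // corresponds to http_requests_total over the window
  errorRequests: number;      // corresponds to http_request_errors_total over the window
}

const P99_THRESHOLD_SECONDS = 2;
const ERROR_RATE_THRESHOLD = 0.01;

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function evaluateAlerts(window: TrafficWindow): string[] {
  const alerts: string[] = [];
  if (percentile(window.durationsSeconds, 99) > P99_THRESHOLD_SECONDS) {
    alerts.push('HighLatencyP99');
  }
  const errorRate =
    window.totalRequests === 0 ? 0 : window.errorRequests / window.totalRequests;
  if (errorRate > ERROR_RATE_THRESHOLD) {
    alerts.push('HighErrorRate');
  }
  return alerts;
}
```

With 100 samples, the nearest-rank P99 is the 99th smallest value, so a single slow request out of 100 does not fire the latency alert but two do.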
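Grafana Dashboard
Key panels:
- Request throughput (QPS)
- Latency distribution (P50/P90/P99)
- Error rate
- Reward distribution/settlement/expiry trends
- Database connection pool status
- Redis cache hit rate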
Log Management
Log Format
// Structured log output
{
  "timestamp": "2024-12-01T00:00:00.000Z",
  "level": "info",
  "context": "RewardApplicationService",
  "message": "Distributed 6 rewards for order 123",
  "metadata": {
    "orderId": "123",
    "userId": "100",
    "rewardCount": 6
  }
}
Log Levels
| Level | Purpose |
|---|---|
| error | Errors and exceptions |
| warn | Warnings |
| info | Business events |
| debug | Debugging information (development only) |
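The log format and the level table together imply a logger that drops entries below the configured level and serializes the rest as one JSON object per line. A minimal sketch of that behavior follows. `createLogger` is a hypothetical helper rather than the service's logger, and the sink and clock are injected only so the example stays self-contained.

```typescript
// Illustrative structured logger with level filtering.
// Identifiers are hypothetical; field names follow the log format documented above.

type LogLevel = 'error' | 'warn' | 'info' | 'debug';

// Lower number means higher severity.
const LEVEL_PRIORITY: Record<LogLevel, number> = { error: 0, warn: 1, info: 2, debug: 3 };

interface LogEntry {
  timestamp: string;
  level: LogLevel;
  context: string;
  message: string;
  metadata?: Record<string, unknown>;
}

type LogFn = (level: LogLevel, message: string, metadata?: Record<string, unknown>) => void;

function createLogger(
  context: string,
  minLevel: LogLevel,
  sink: (line: string) => void,
  now: () => Date,
): LogFn {
  return (level, message, metadata) => {
    // Drop entries that are less severe than the configured level.
    if (LEVEL_PRIORITY[level] > LEVEL_PRIORITY[minLevel]) return;
    const entry: LogEntry = {
      timestamp: now().toISOString(),
      level,
      context,
      message,
      metadata,
    };
    // JSON.stringify omits the metadata key when it is undefined.
    sink(JSON.stringify(entry));
  };
}

// Usage: collect lines in memory; in production the sink would write to stdout.
const lines: string[] = [];
const log = createLogger(
  'RewardApplicationService',
  'info',
  (line) => { lines.push(line); },
  () => new Date('2024-12-01T00:00:00.000Z'),
);
log('debug', 'Cache miss for order 123'); // suppressed at LOG_LEVEL=info
log('info', 'Distributed 6 rewards for order 123', {
  orderId: '123',
  userId: '100',
  rewardCount: 6,
});
```

Writing one JSON object per line to stdout keeps the output compatible with container log collectors, which read the container's standard streams.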
ELK Integration
# filebeat.yml
filebeat.inputs:
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_kubernetes_metadata:
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  indices:
    - index: "reward-service-%{+yyyy.MM.dd}"
CI/CD Pipeline
GitHub Actions
# .github/workflows/deploy.yml
name: Deploy to Production
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm run lint
      - run: npm test
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t reward-service:${{ github.sha }} .
      # Assumes the runner is already authenticated to the registry (for example, via an earlier docker login step)
      - name: Push to registry
        run: |
          docker tag reward-service:${{ github.sha }} ${{ secrets.REGISTRY }}/reward-service:${{ github.sha }}
          docker push ${{ secrets.REGISTRY }}/reward-service:${{ github.sha }}
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # Assumes kubectl on the runner is already configured with cluster credentials
      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/reward-service \
            reward-service=${{ secrets.REGISTRY }}/reward-service:${{ github.sha }} \
            -n rwadurian
Troubleshooting
Common Issues
1. Service fails to start
# Check the logs
kubectl logs -f deployment/reward-service -n rwadurian
# Check the environment variables
kubectl exec -it deployment/reward-service -n rwadurian -- env
# Check the database connection
kubectl exec -it deployment/reward-service -n rwadurian -- \
  npx prisma db pull
2. Database connection problems
# Test the database connection
kubectl run -it --rm debug --image=postgres:15-alpine --restart=Never -- \
  psql -h postgres -U user -d reward_db
# Check the network policies
kubectl get networkpolicy -n rwadurian
3. Kafka connection problems
# List Kafka topics
kubectl exec -it kafka-0 -- \
  kafka-topics --list --bootstrap-server localhost:9092
# Inspect the consumer group
kubectl exec -it kafka-0 -- \
  kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group reward-service-group
Rolling Back a Deployment
# Show the revision history
kubectl rollout history deployment/reward-service -n rwadurian
# Roll back to the previous revision
kubectl rollout undo deployment/reward-service -n rwadurian
# Roll back to a specific revision
kubectl rollout undo deployment/reward-service -n rwadurian --to-revision=2
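Security Best Practices
- Secret management: use Kubernetes Secrets or an external secret management service (Vault)
- Network isolation: use NetworkPolicy to restrict Pod-to-Pod traffic
- Image security: scan images for vulnerabilities regularly
- Least privilege: run containers as a non-root user
- TLS: enable mTLS between services
- Audit logging: record all sensitive operations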