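# Reward Service Deployment Guide
## Deployment Overview
This document describes the deployment architecture of the Reward Service and the procedures for deploying and operating it.
### Deployment Architecture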
```
                    ┌─────────────────┐
                    │  Load Balancer  │
                    │   (Nginx/ALB)   │
                    └────────┬────────┘
            ┌────────────────┼────────────────┐
            │                │                │
     ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
     │ Reward Svc  │  │ Reward Svc  │  │ Reward Svc  │
     │ Instance 1  │  │ Instance 2  │  │ Instance 3  │
     └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
            │                │                │
            └────────────────┼────────────────┘
     ┌───────────────────────┼───────────────────────┐
     │                       │                       │
┌────▼─────┐          ┌──────▼──────┐         ┌──────▼──────┐
│PostgreSQL│          │    Redis    │         │    Kafka    │
│ Primary  │          │   Cluster   │         │   Cluster   │
└────┬─────┘          └─────────────┘         └─────────────┘
┌────▼─────┐
│PostgreSQL│
│ Replica  │
└──────────┘
```
---
## Environment Requirements
### Production Sizing
| Resource | Minimum | Recommended |
|----------|---------|-------------|
| CPU | 2 vCPU | 4 vCPU |
| Memory | 4 GB | 8 GB |
| Storage | 50 GB SSD | 100 GB SSD |
| Node.js | 20.x LTS | 20.x LTS |
### Infrastructure Requirements
| Service | Version | Purpose |
|---------|---------|---------|
| PostgreSQL | 15.x | Primary database |
| Redis | 7.x | Cache and sessions |
| Apache Kafka | 3.x | Message queue |
---
## Docker Deployment
### Dockerfile
```dockerfile
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
# Copy the dependency manifests and the Prisma schema
COPY package*.json ./
COPY prisma ./prisma/
# Install all dependencies (dev dependencies are needed for the build)
RUN npm ci
# Generate the Prisma Client
RUN npx prisma generate
# Copy the source code
COPY . .
# Compile the application
RUN npm run build
# Production stage
FROM node:20-alpine AS production
WORKDIR /app
# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the build artifacts
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/prisma ./prisma
COPY --from=builder /app/node_modules/.prisma ./node_modules/.prisma
# Set environment variables
ENV NODE_ENV=production
ENV PORT=3000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
# Expose the service port
EXPOSE 3000
# Start command
CMD ["node", "dist/main.js"]
```
### docker-compose.yml (production)
```yaml
version: '3.8'
services:
  reward-service:
    build:
      context: .
      dockerfile: Dockerfile
    # Note: under plain Docker Compose on a single host, a fixed host port cannot be
    # shared by several replicas; publish the port from a load balancer instead.
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_HOST=${REDIS_HOST}
      - REDIS_PORT=${REDIS_PORT}
      - KAFKA_BROKERS=${KAFKA_BROKERS}
      - JWT_SECRET=${JWT_SECRET}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      kafka:
        condition: service_healthy
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
  postgres:
    image: postgres:15-alpine
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    healthcheck:
      test: ["CMD-SHELL", "kafka-broker-api-versions --bootstrap-server localhost:9092"]
      interval: 10s
      timeout: 10s
      retries: 5
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
volumes:
  postgres-data:
  redis-data:
```
### Building and Pushing the Image
```bash
# Build the image
docker build -t reward-service:latest .
# Tag the image for the registry
docker tag reward-service:latest your-registry/reward-service:v1.0.0
# Push the image to the registry
docker push your-registry/reward-service:v1.0.0
```
---
## Kubernetes Deployment
### Deployment
```yaml
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reward-service
  namespace: rwadurian
  labels:
    app: reward-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: reward-service
  template:
    metadata:
      labels:
        app: reward-service
    spec:
      containers:
        - name: reward-service
          image: your-registry/reward-service:v1.0.0
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: reward-service-secrets
                  key: database-url
            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: reward-service-config
                  key: redis-host
            # REDIS_PORT and KAFKA_BROKERS are wired from the reward-service-config ConfigMap
            - name: REDIS_PORT
              valueFrom:
                configMapKeyRef:
                  name: reward-service-config
                  key: redis-port
            - name: KAFKA_BROKERS
              valueFrom:
                configMapKeyRef:
                  name: reward-service-config
                  key: kafka-brokers
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: reward-service-secrets
                  key: jwt-secret
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
```
### Service
```yaml
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: reward-service
  namespace: rwadurian
spec:
  selector:
    app: reward-service
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP
```
### Ingress
```yaml
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: reward-service-ingress
  namespace: rwadurian
  annotations:
    # With ingress-nginx, a target of "/" rewrites every matched request to "/".
    # To preserve sub-paths, use a capture group in the path and a target such as /$2.
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: api.rwadurian.com
      http:
        paths:
          - path: /rewards
            pathType: Prefix
            backend:
              service:
                name: reward-service
                port:
                  number: 80
```
### ConfigMap
```yaml
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: reward-service-config
  namespace: rwadurian
data:
  redis-host: "redis-master.rwadurian.svc.cluster.local"
  redis-port: "6379"
  kafka-brokers: "kafka-0.kafka.rwadurian.svc.cluster.local:9092"
```
### Secret
```yaml
# k8s/secret.yaml
# The values below are placeholders; do not commit real credentials.
apiVersion: v1
kind: Secret
metadata:
  name: reward-service-secrets
  namespace: rwadurian
type: Opaque
stringData:
  database-url: "postgresql://user:password@postgres:5432/reward_db"
  jwt-secret: "your-jwt-secret-key"
```
### HorizontalPodAutoscaler
```yaml
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: reward-service-hpa
  namespace: rwadurian
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: reward-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
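To make the effect of the two utilization targets concrete, the sketch below reproduces the core replica formula from the Kubernetes HPA documentation, `ceil(currentReplicas × currentMetric / targetMetric)`, taking the largest proposal across metrics and clamping it to `minReplicas`/`maxReplicas`. It is a simplified illustration: the real controller also applies a tolerance band and a stabilization window, and `MetricReading` and `desiredReplicas` are hypothetical names used only for this example.

```typescript
// Simplified sketch of the HPA replica calculation (illustrative names only).
interface MetricReading {
  name: string; // e.g. "cpu" or "memory"
  currentUtilization: number; // observed average utilization, in percent
  targetUtilization: number; // averageUtilization from the HPA spec, in percent
}

function desiredReplicas(
  currentReplicas: number,
  metrics: MetricReading[],
  minReplicas: number,
  maxReplicas: number,
): number {
  // Each metric proposes ceil(currentReplicas * current / target).
  const proposals = metrics.map((m) =>
    Math.ceil((currentReplicas * m.currentUtilization) / m.targetUtilization),
  );
  // With several metrics, the largest proposal wins.
  const proposed = proposals.length > 0 ? Math.max(...proposals) : currentReplicas;
  // The result is clamped to the configured bounds.
  return Math.min(maxReplicas, Math.max(minReplicas, proposed));
}

// Example: 3 pods at 105% CPU (target 70%) and 60% memory (target 80%).
const replicas = desiredReplicas(
  3,
  [
    { name: "cpu", currentUtilization: 105, targetUtilization: 70 },
    { name: "memory", currentUtilization: 60, targetUtilization: 80 },
  ],
  3,
  10,
);
console.log(replicas); // prints 5
```

In this example CPU proposes 5 replicas and memory proposes 3, so the workload scales to 5, which is within the 3 to 10 range configured above.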
### Deployment Commands
```bash
# Create the namespace
kubectl create namespace rwadurian
# Apply the configuration
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
# Deploy the service
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/hpa.yaml
# Check the deployment status
kubectl get pods -n rwadurian
kubectl get services -n rwadurian
# Follow the logs
kubectl logs -f deployment/reward-service -n rwadurian
```
---
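## Database Migrations
### Production Migration
```bash
# 1. Back up the database
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > backup_$(date +%Y%m%d).sql
# 2. Apply pending migrations
DATABASE_URL=$PRODUCTION_DATABASE_URL npx prisma migrate deploy
# 3. Verify the result by printing the introspected schema
DATABASE_URL=$PRODUCTION_DATABASE_URL npx prisma db pull --print
```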
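### Rollback Strategy
```bash
# Show the migration history
npx prisma migrate status
# Roll back to a specific version (manual; rollback_migration.sql is a hand-written script)
psql -h $DB_HOST -U $DB_USER -d $DB_NAME < rollback_migration.sql
```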
---
## Environment Variables
### Production Environment Variables
| Variable | Description | Example |
|----------|-------------|---------|
| `NODE_ENV` | Runtime environment | `production` |
| `PORT` | Service port | `3000` |
| `DATABASE_URL` | Database connection string | `postgresql://user:pass@host:5432/db` |
| `REDIS_HOST` | Redis host | `redis-master` |
| `REDIS_PORT` | Redis port | `6379` |
| `KAFKA_BROKERS` | Kafka broker list | `kafka-0:9092,kafka-1:9092` |
| `KAFKA_CLIENT_ID` | Kafka client ID | `reward-service` |
| `KAFKA_GROUP_ID` | Kafka consumer group ID | `reward-service-group` |
| `JWT_SECRET` | JWT signing secret | `<strong-secret>` |
| `LOG_LEVEL` | Log level | `info` |
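A service that reads these variables benefits from validating them once at startup, so that a misconfigured Pod fails immediately rather than at the first database or Kafka call. The following is a minimal sketch of that idea, not the service's actual configuration code: `loadConfig` and `RewardServiceConfig` are hypothetical names, and the split between required variables and variables with defaults (taken from the example column above) is an assumption.

```typescript
// Sketch of fail-fast configuration loading. Which variables are required and
// which have defaults is an assumption made for this example.
type Env = Record<string, string | undefined>;

interface RewardServiceConfig {
  nodeEnv: string;
  port: number;
  databaseUrl: string;
  redisHost: string;
  redisPort: number;
  kafkaBrokers: string[];
  kafkaClientId: string;
  kafkaGroupId: string;
  jwtSecret: string;
  logLevel: string;
}

function loadConfig(env: Env): RewardServiceConfig {
  const missing: string[] = [];

  // Returns the value, or records the name so all gaps are reported together.
  const required = (name: string): string => {
    const value = env[name]?.trim();
    if (!value) {
      missing.push(name);
      return "";
    }
    return value;
  };

  // Returns the value, or a default taken from the examples in the table above.
  const optional = (name: string, fallback: string): string =>
    env[name]?.trim() || fallback;

  // Parses a TCP port and rejects non-integers and values outside 1-65535.
  const port = (name: string, fallback: string): number => {
    const parsed = Number(optional(name, fallback));
    if (Math.floor(parsed) !== parsed || parsed < 1 || parsed > 65535) {
      throw new Error(`${name} must be a port number between 1 and 65535`);
    }
    return parsed;
  };

  const config: RewardServiceConfig = {
    nodeEnv: optional("NODE_ENV", "production"),
    port: port("PORT", "3000"),
    databaseUrl: required("DATABASE_URL"),
    redisHost: required("REDIS_HOST"),
    redisPort: port("REDIS_PORT", "6379"),
    // KAFKA_BROKERS is a comma-separated list of host:port pairs.
    kafkaBrokers: required("KAFKA_BROKERS")
      .split(",")
      .map((broker) => broker.trim())
      .filter((broker) => broker.length > 0),
    kafkaClientId: optional("KAFKA_CLIENT_ID", "reward-service"),
    kafkaGroupId: optional("KAFKA_GROUP_ID", "reward-service-group"),
    jwtSecret: required("JWT_SECRET"),
    logLevel: optional("LOG_LEVEL", "info"),
  };

  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return config;
}

// Example with the sample values from the table.
const config = loadConfig({
  DATABASE_URL: "postgresql://user:pass@host:5432/db",
  REDIS_HOST: "redis-master",
  KAFKA_BROKERS: "kafka-0:9092,kafka-1:9092",
  JWT_SECRET: "example-secret",
});
console.log(config.kafkaBrokers);
```

In a real service the function would typically be called with `process.env` during bootstrap. Taking the environment as a parameter keeps the sketch testable without touching global state.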
---
## Monitoring and Alerting
### Health Check Endpoint
```http
GET /health
```
Response:
```json
{
"status": "ok",
"service": "reward-service",
"timestamp": "2024-12-01T00:00:00.000Z"
}
```
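The response shape shown above can be captured in a small type, together with a check that a post-deployment smoke test might run against the parsed body. This is a sketch under the assumption that the three fields above are the complete payload; `HealthResponse`, `buildHealthResponse`, and `isHealthy` are hypothetical names, not identifiers from the service.

```typescript
// Shape of the /health response body.
interface HealthResponse {
  status: string;
  service: string;
  timestamp: string;
}

// Builds the payload; the clock is injected so the function is easy to test.
function buildHealthResponse(now: Date): HealthResponse {
  return { status: "ok", service: "reward-service", timestamp: now.toISOString() };
}

// Checks a parsed response body, e.g. in a post-deployment smoke test.
function isHealthy(body: unknown): boolean {
  if (typeof body !== "object" || body === null) {
    return false;
  }
  const candidate = body as Partial<HealthResponse>;
  return (
    candidate.status === "ok" &&
    candidate.service === "reward-service" &&
    typeof candidate.timestamp === "string" &&
    !isNaN(Date.parse(candidate.timestamp))
  );
}

const payload = buildHealthResponse(new Date("2024-12-01T00:00:00.000Z"));
console.log(JSON.stringify(payload));
console.log(isHealthy(payload)); // prints true
```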
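### Prometheus Metrics
Add `@nestjs/terminus` for health checks and expose Prometheus metrics through a controller. The example assumes that `register` is the default registry from `prom-client`:
```typescript
// src/api/controllers/metrics.controller.ts
import { Controller, Get, Header } from '@nestjs/common';
import { register } from 'prom-client';

@Controller('metrics')
export class MetricsController {
  // Returns every metric in the default registry in the Prometheus text format.
  @Get()
  @Header('Content-Type', 'text/plain')
  async getMetrics(): Promise<string> {
    return register.metrics();
  }
}
```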
### Key Metrics
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `http_request_duration_seconds` | Request latency | P99 > 2s |
| `http_requests_total` | Total requests | - |
| `http_request_errors_total` | Failed requests | Error rate > 1% |
| `reward_distributed_total` | Rewards distributed | - |
| `reward_settled_total` | Rewards settled | - |
| `reward_expired_total` | Rewards expired | - |
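The two alert thresholds in the table can be illustrated with a small calculation. The sketch below evaluates them over raw samples using a nearest-rank percentile; in practice Prometheus would derive P99 from histogram buckets in an alerting rule, so this is only meant to show what the thresholds mean. The function names and the alert names `HighP99Latency` and `HighErrorRate` are hypothetical.

```typescript
// Thresholds from the table above.
const P99_THRESHOLD_SECONDS = 2;
const ERROR_RATE_THRESHOLD = 0.01;

// Nearest-rank percentile over raw samples (p between 1 and 100).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    return 0;
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(Math.max(rank, 1), sorted.length) - 1;
  return sorted[index] ?? 0;
}

// Fraction of requests that failed; zero traffic counts as zero errors.
function errorRate(errorCount: number, requestCount: number): number {
  return requestCount === 0 ? 0 : errorCount / requestCount;
}

interface AlertInput {
  durationsSeconds: number[];
  errorCount: number;
  requestCount: number;
}

// Returns the names of the alerts that would fire for the given window.
function firingAlerts(input: AlertInput): string[] {
  const alerts: string[] = [];
  if (percentile(input.durationsSeconds, 99) > P99_THRESHOLD_SECONDS) {
    alerts.push("HighP99Latency");
  }
  if (errorRate(input.errorCount, input.requestCount) > ERROR_RATE_THRESHOLD) {
    alerts.push("HighErrorRate");
  }
  return alerts;
}

// Example: 98 fast requests and 2 slow ones, with 5 errors in 1000 requests.
const durations: number[] = [];
for (let i = 0; i < 98; i++) {
  durations.push(0.1);
}
durations.push(2.5, 3.0);
const alerts = firingAlerts({ durationsSeconds: durations, errorCount: 5, requestCount: 1000 });
console.log(alerts);
```

Here the 99th-ranked sample is 2.5 s, which exceeds the 2 s threshold, while the 0.5% error rate stays below 1%, so only the latency alert fires.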
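### Grafana Dashboard
Key panels:
1. Request throughput (QPS)
2. Latency distribution (P50/P90/P99)
3. Error rate
4. Reward distribution, settlement, and expiry trends
5. Database connection pool status
6. Redis cache hit rate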
---
## Log Management
### Log Format
```typescript
// Structured log output
{
"timestamp": "2024-12-01T00:00:00.000Z",
"level": "info",
"context": "RewardApplicationService",
"message": "Distributed 6 rewards for order 123",
"metadata": {
"orderId": "123",
"userId": "100",
"rewardCount": 6
}
}
```
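A log line in this format can be produced by serializing a typed entry to a single line of JSON, which is what collectors such as Filebeat expect from container output. The sketch below is one possible way to do that and is not the service's logger: `LogEntry` and `formatLogEntry` are hypothetical names.

```typescript
type LogLevel = "error" | "warn" | "info" | "debug";

// Fields of one structured log line, matching the format above.
interface LogEntry {
  timestamp: string;
  level: LogLevel;
  context: string;
  message: string;
  metadata: Record<string, unknown>;
}

// Serializes one entry as a single JSON line; the clock is injectable for tests.
function formatLogEntry(
  level: LogLevel,
  context: string,
  message: string,
  metadata: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  const entry: LogEntry = {
    timestamp: now.toISOString(),
    level,
    context,
    message,
    metadata,
  };
  return JSON.stringify(entry);
}

const line = formatLogEntry(
  "info",
  "RewardApplicationService",
  "Distributed 6 rewards for order 123",
  { orderId: "123", userId: "100", rewardCount: 6 },
  new Date("2024-12-01T00:00:00.000Z"),
);
console.log(line);
```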
### Log Levels
| Level | Purpose |
|-------|---------|
| `error` | Errors and exceptions |
| `warn` | Warnings |
| `info` | Business events |
| `debug` | Debugging details (development only) |
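The levels above form an ordering, so a configured `LOG_LEVEL` acts as a threshold: with `info`, `debug` messages are dropped while `error`, `warn`, and `info` are kept. A minimal sketch of that filter follows; `shouldLog` is a hypothetical helper, and falling back to `info` for an unrecognized value is an assumption of this example.

```typescript
// Levels ordered from most to least severe.
const LEVEL_ORDER = ["error", "warn", "info", "debug"] as const;
type Level = (typeof LEVEL_ORDER)[number];

// Narrows an arbitrary string (e.g. the LOG_LEVEL variable) to a known level.
function isLevel(value: string): value is Level {
  return (LEVEL_ORDER as readonly string[]).indexOf(value) !== -1;
}

// Returns true when a message at messageLevel should be emitted.
function shouldLog(messageLevel: Level, configuredLevel: string): boolean {
  // Unknown LOG_LEVEL values fall back to "info" (an assumption of this sketch).
  const threshold: Level = isLevel(configuredLevel) ? configuredLevel : "info";
  return LEVEL_ORDER.indexOf(messageLevel) <= LEVEL_ORDER.indexOf(threshold);
}

console.log(shouldLog("debug", "info")); // prints false
console.log(shouldLog("error", "info")); // prints true
```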
### ELK Integration
```yaml
# filebeat.yml
filebeat.inputs:
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
processors:
  - add_kubernetes_metadata:
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  indices:
    - index: "reward-service-%{+yyyy.MM.dd}"
```
---
## CI/CD Pipeline
### GitHub Actions
```yaml
# .github/workflows/deploy.yml
name: Deploy to Production
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm run lint
      - run: npm test
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t reward-service:${{ github.sha }} .
      # Assumes the runner is already authenticated to the registry (e.g. via docker login)
      - name: Push to registry
        run: |
          docker tag reward-service:${{ github.sha }} ${{ secrets.REGISTRY }}/reward-service:${{ github.sha }}
          docker push ${{ secrets.REGISTRY }}/reward-service:${{ github.sha }}
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # Assumes kubectl on the runner is already configured with cluster credentials
      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/reward-service \
            reward-service=${{ secrets.REGISTRY }}/reward-service:${{ github.sha }} \
            -n rwadurian
```
---
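## Troubleshooting
### Common Issues
#### 1. Service Fails to Start
```bash
# Check the logs
kubectl logs -f deployment/reward-service -n rwadurian
# Inspect the environment variables
kubectl exec -it deployment/reward-service -n rwadurian -- env
# Check the database connection (--print avoids overwriting schema.prisma)
kubectl exec -it deployment/reward-service -n rwadurian -- \
  npx prisma db pull --print
```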
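#### 2. Database Connection Problems
```bash
# Test the database connection from a temporary Pod in the same namespace
kubectl run -it --rm debug --image=postgres:15-alpine --restart=Never -n rwadurian -- \
  psql -h postgres -U user -d reward_db
# Check the network policies
kubectl get networkpolicy -n rwadurian
```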
#### 3. Kafka Connection Problems
```bash
# List the Kafka topics
kubectl exec -it kafka-0 -n rwadurian -- \
  kafka-topics --list --bootstrap-server localhost:9092
# Inspect the consumer group
kubectl exec -it kafka-0 -n rwadurian -- \
  kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group reward-service-group
```
### Rolling Back a Deployment
```bash
# Show the revision history
kubectl rollout history deployment/reward-service -n rwadurian
# Roll back to the previous revision
kubectl rollout undo deployment/reward-service -n rwadurian
# Roll back to a specific revision
kubectl rollout undo deployment/reward-service -n rwadurian --to-revision=2
```
---
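## Security Best Practices
1. **Secret management**: Use Kubernetes Secrets or an external secret manager such as Vault
2. **Network isolation**: Use NetworkPolicy to restrict Pod-to-Pod traffic
3. **Image security**: Scan images for vulnerabilities regularly
4. **Least privilege**: Run containers as a non-root user
5. **TLS**: Enable mTLS between services
6. **Audit logging**: Record all sensitive operations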