Backup Service Deployment Guide

Overview

This guide covers deploying the backup-service with Docker, Docker Compose, and Kubernetes. The service runs in a Docker container and uses PostgreSQL as its database.

Critical Security Requirement: The backup-service MUST be deployed on a physically separate server from identity-service to maintain MPC security.


Deployment Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         Production Architecture                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Server A (Identity)              Server B (Backup)                     │
│  ┌─────────────────────┐         ┌─────────────────────┐               │
│  │  identity-service   │         │  backup-service     │               │
│  │  ┌───────────────┐  │         │  ┌───────────────┐  │               │
│  │  │ PostgreSQL    │  │         │  │ PostgreSQL    │  │               │
│  │  │ (identity-db) │  │         │  │ (backup-db)   │  │               │
│  │  └───────────────┘  │         │  └───────────────┘  │               │
│  └─────────────────────┘         └─────────────────────┘               │
│           │                               ▲                             │
│           │      Internal Network         │                             │
│           └───────────────────────────────┘                             │
│                (Service-to-Service JWT)                                 │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Docker Deployment

Production Dockerfile

# Dockerfile
# Multi-stage build for smaller image size

# Stage 1: Build
FROM node:20-alpine AS builder

WORKDIR /app

# Copy package files
COPY package*.json ./
COPY prisma ./prisma/

# Install all dependencies (including devDependencies for build)
RUN npm ci

# Copy source code
COPY . .

# Generate Prisma client
RUN npx prisma generate

# Build application
RUN npm run build

# Stage 2: Production
FROM node:20-alpine AS production

WORKDIR /app

# Create non-root user for security
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nestjs -u 1001 -G nodejs

# Copy package files
COPY package*.json ./

# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev && npm cache clean --force

# Copy built application
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules/.prisma ./node_modules/.prisma
COPY --from=builder /app/prisma ./prisma

# Change ownership to non-root user
RUN chown -R nestjs:nodejs /app

# Switch to non-root user
USER nestjs

# Expose port
EXPOSE 3002

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:3002/health || exit 1

# Start application
CMD ["node", "dist/main.js"]

Build and Push Image

# Build image
docker build -t rwa-durian/backup-service:latest .

# Tag for registry
docker tag rwa-durian/backup-service:latest registry.example.com/backup-service:v1.0.0

# Push to registry
docker push registry.example.com/backup-service:v1.0.0

Docker Compose Deployment

Production Compose File

# docker-compose.prod.yml
version: '3.8'

services:
  backup-service:
    image: registry.example.com/backup-service:v1.0.0
    container_name: backup-service
    restart: unless-stopped
    ports:
      - "3002:3002"
    environment:
      - DATABASE_URL=postgresql://postgres:${DB_PASSWORD}@backup-db:5432/rwa_backup?schema=public
      - APP_PORT=3002
      - APP_ENV=production
      - SERVICE_JWT_SECRET=${SERVICE_JWT_SECRET}
      - ALLOWED_SERVICES=${ALLOWED_SERVICES}
      - BACKUP_ENCRYPTION_KEY=${BACKUP_ENCRYPTION_KEY}
      - BACKUP_ENCRYPTION_KEY_ID=${BACKUP_ENCRYPTION_KEY_ID}
      - MAX_RETRIEVE_PER_DAY=3
      - MAX_STORE_PER_MINUTE=10
      - AUDIT_LOG_RETENTION_DAYS=365
    depends_on:
      backup-db:
        condition: service_healthy
    networks:
      - backup-network
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M

  backup-db:
    image: postgres:15-alpine
    container_name: backup-db
    restart: unless-stopped
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: rwa_backup
    volumes:
      - backup-db-data:/var/lib/postgresql/data
      - ./init-db.sql:/docker-entrypoint-initdb.d/init.sql:ro
    ports:
      - "127.0.0.1:5433:5432"  # Bound to localhost only; different port to avoid conflicts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - backup-network
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

volumes:
  backup-db-data:
    driver: local

networks:
  backup-network:
    driver: bridge

Environment File

# .env.production
DB_PASSWORD=your-strong-database-password-here
SERVICE_JWT_SECRET=your-super-secret-service-jwt-key-min-32-chars
ALLOWED_SERVICES=identity-service,recovery-service
BACKUP_ENCRYPTION_KEY=your-256-bit-encryption-key-in-hex-64-chars
BACKUP_ENCRYPTION_KEY_ID=key-v1
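
The placeholder values above have to be replaced with real random material before deployment. The following is a minimal sketch of one way to produce and sanity-check them, assuming a POSIX shell with /dev/urandom, head, od, and tr available. The helper names gen_hex and is_hex_key are illustrative and not part of the service.

```shell
#!/bin/sh
# Hypothetical helpers for filling in .env.production.

# Print N random bytes as lowercase hex (2*N characters).
gen_hex() {
  head -c "$1" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  echo
}

# Succeed only if the argument is exactly 64 hex characters (a 256-bit key).
is_hex_key() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, BACKUP_ENCRYPTION_KEY=$(gen_hex 32) yields 64 hex characters, and SERVICE_JWT_SECRET=$(gen_hex 48) yields 96 characters, which satisfies the 32-character minimum.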

Deploy Commands

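# Compose only auto-loads a file named .env, so pass .env.production explicitly

# Pull latest image
docker-compose --env-file .env.production -f docker-compose.prod.yml pull

# Start services
docker-compose --env-file .env.production -f docker-compose.prod.yml up -d

# Run database migrations
docker-compose --env-file .env.production -f docker-compose.prod.yml exec backup-service \
  npx prisma migrate deploy

# View logs
docker-compose --env-file .env.production -f docker-compose.prod.yml logs -f backup-service

# Stop services
docker-compose --env-file .env.production -f docker-compose.prod.yml down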

Kubernetes Deployment

Namespace and ConfigMap

# kubernetes/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: rwa-backup

---
# kubernetes/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-service-config
  namespace: rwa-backup
data:
  APP_PORT: "3002"
  APP_ENV: "production"
  ALLOWED_SERVICES: "identity-service,recovery-service"
  MAX_RETRIEVE_PER_DAY: "3"
  MAX_STORE_PER_MINUTE: "10"
  AUDIT_LOG_RETENTION_DAYS: "365"

Secrets

# kubernetes/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: backup-service-secrets
  namespace: rwa-backup
type: Opaque
stringData:
  DATABASE_URL: "postgresql://postgres:password@backup-db:5432/rwa_backup?schema=public"
  SERVICE_JWT_SECRET: "your-super-secret-service-jwt-key-min-32-chars"
  BACKUP_ENCRYPTION_KEY: "your-256-bit-encryption-key-in-hex-64-chars"
  BACKUP_ENCRYPTION_KEY_ID: "key-v1"
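
Committing a manifest with literal secret values is easy to get wrong. As an alternative, the Secret can be created from a local file that never enters version control. The sketch below also creates backup-db-secrets (key: password), which the PostgreSQL StatefulSet in this guide reads but which no manifest here defines. The function name and arguments are illustrative. Note that --from-literal places the password on the command line, so it may appear in shell history.

```shell
#!/bin/sh
# Hypothetical helper; the function name and its arguments are illustrative.

NAMESPACE="rwa-backup"

# Create or update both Secrets without writing their values to a manifest.
create_backup_secrets() {
  env_file="$1"      # file of KEY=value lines for backup-service-secrets
  db_password="$2"   # value for the "password" key used by the StatefulSet

  kubectl -n "$NAMESPACE" create secret generic backup-service-secrets \
    --from-env-file="$env_file" \
    --dry-run=client -o yaml | kubectl apply -f - || return 1

  kubectl -n "$NAMESPACE" create secret generic backup-db-secrets \
    --from-literal=password="$db_password" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```

The env file would hold the same four keys as the manifest above (DATABASE_URL, SERVICE_JWT_SECRET, BACKUP_ENCRYPTION_KEY, BACKUP_ENCRYPTION_KEY_ID), one per line.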

Deployment

# kubernetes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backup-service
  namespace: rwa-backup
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backup-service
  template:
    metadata:
      labels:
        app: backup-service
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
      containers:
        - name: backup-service
          image: registry.example.com/backup-service:v1.0.0
          ports:
            - containerPort: 3002
          envFrom:
            - configMapRef:
                name: backup-service-config
            - secretRef:
                name: backup-service-secrets
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3002
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3002
            initialDelaySeconds: 5
            periodSeconds: 10

Service

# kubernetes/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: backup-service
  namespace: rwa-backup
spec:
  selector:
    app: backup-service
  ports:
    - protocol: TCP
      port: 3002
      targetPort: 3002
  type: ClusterIP

PostgreSQL StatefulSet

# kubernetes/postgres.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: backup-db
  namespace: rwa-backup
spec:
  serviceName: backup-db
  replicas: 1
  selector:
    matchLabels:
      app: backup-db
  template:
    metadata:
      labels:
        app: backup-db
    spec:
      containers:
        - name: postgres
          image: postgres:15-alpine
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: rwa_backup
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: backup-db-secrets
                  key: password
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi

Deploy to Kubernetes

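# Create the namespace first (files in a directory are applied alphabetically,
# so configmap.yaml would otherwise be applied before namespace.yaml)
kubectl apply -f kubernetes/namespace.yaml

# Apply all manifests
kubectl apply -f kubernetes/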

# Check deployment status
kubectl -n rwa-backup get pods

# View logs
kubectl -n rwa-backup logs -f deployment/backup-service

# Run migrations
kubectl -n rwa-backup exec -it deployment/backup-service -- \
  npx prisma migrate deploy

Database Management

Initial Setup

# Run migrations on first deployment
npx prisma migrate deploy

# Or push schema directly (development only)
npx prisma db push

Backup and Restore

# Backup database (-T disables TTY allocation so the redirected dump is not altered)
docker-compose exec -T backup-db pg_dump -U postgres rwa_backup > backup.sql

# Restore database
docker-compose exec -T backup-db psql -U postgres rwa_backup < backup.sql
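
A fixed backup.sql is overwritten on every run. The sketch below wraps the same pg_dump call so that each dump gets a UTC timestamp and is compressed, and so that a failed dump leaves no partial archive behind. The function names and the COMPOSE variable are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helpers around the pg_dump command shown above.

# Command used to reach the containers; override for "docker compose" (v2).
COMPOSE="${COMPOSE:-docker-compose}"

# Print a file name such as rwa_backup_20240101T120000Z.sql.gz
backup_name() {
  printf 'rwa_backup_%s.sql.gz\n' "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Dump the database to DEST (default: a timestamped name) and print the path.
backup_db() {
  dest="${1:-$(backup_name)}"
  tmp="${dest%.gz}.partial"
  # -T: no TTY, so the dump is byte-exact and the command also works from cron.
  if $COMPOSE exec -T backup-db pg_dump -U postgres rwa_backup > "$tmp"; then
    gzip -c "$tmp" > "$dest" && rm -f "$tmp" && printf '%s\n' "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

To restore such a file, decompress it into the restore command: gzip -dc FILE | docker-compose exec -T backup-db psql -U postgres rwa_backup.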

Migration in Production

# Generate migration (development)
npx prisma migrate dev --name add_new_field

# Apply migration (production)
npx prisma migrate deploy

Environment Variables Reference

Required Variables

| Variable | Description | Example |
| --- | --- | --- |
| DATABASE_URL | PostgreSQL connection string | postgresql://user:pass@host:5432/db |
| SERVICE_JWT_SECRET | JWT secret for service auth (min 32 chars) | Random 64+ char string |
| ALLOWED_SERVICES | Comma-separated allowed services | identity-service,recovery-service |
| BACKUP_ENCRYPTION_KEY | 256-bit key in hex (64 chars) | 64 hex characters |
| BACKUP_ENCRYPTION_KEY_ID | Key identifier | key-v1 |
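
Missing or malformed values are listed under Troubleshooting as common reasons the service fails to start. A pre-flight check like the sketch below can catch them before deployment. It only enforces the presence and length rules stated in the table above. The function name is illustrative, and the service may apply further validation of its own.

```shell
#!/bin/sh
# Hypothetical pre-flight check; run it with the production variables loaded.

REQUIRED_VARS="DATABASE_URL SERVICE_JWT_SECRET ALLOWED_SERVICES BACKUP_ENCRYPTION_KEY BACKUP_ENCRYPTION_KEY_ID"

check_required_env() {
  missing=0
  for name in $REQUIRED_VARS; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Length rules taken from the table above.
  if [ -n "${SERVICE_JWT_SECRET:-}" ] && [ "${#SERVICE_JWT_SECRET}" -lt 32 ]; then
    echo "SERVICE_JWT_SECRET is shorter than 32 characters" >&2
    missing=1
  fi
  if [ -n "${BACKUP_ENCRYPTION_KEY:-}" ] && [ "${#BACKUP_ENCRYPTION_KEY}" -ne 64 ]; then
    echo "BACKUP_ENCRYPTION_KEY is not 64 characters long" >&2
    missing=1
  fi
  return "$missing"
}
```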

Optional Variables

| Variable | Default | Description |
| --- | --- | --- |
| APP_PORT | 3002 | Server port |
| APP_ENV | development | Environment (development/production) |
| MAX_RETRIEVE_PER_DAY | 3 | Max retrieves per user per day |
| MAX_STORE_PER_MINUTE | 10 | Max stores per minute |
| AUDIT_LOG_RETENTION_DAYS | 365 | Audit log retention period |

Security Considerations

Network Security

  1. Isolate backup-service network

    • Use private subnets
    • Restrict access to identity-service only
    • Use VPN or VPC peering for cross-server communication
  2. Firewall rules

    # Allow only identity-service IP
    iptables -A INPUT -p tcp --dport 3002 -s identity-service-ip -j ACCEPT
    iptables -A INPUT -p tcp --dport 3002 -j DROP
    
  3. TLS/SSL

    • Use reverse proxy (nginx/traefik) for TLS termination
    • Enable mutual TLS for service-to-service communication

Secret Management

  1. Use secret management services

    • AWS Secrets Manager
    • HashiCorp Vault
    • Kubernetes Secrets with encryption at rest
  2. Rotate secrets regularly

    • Rotate encryption keys annually
    • Rotate JWT secrets quarterly
    • Use key versioning for encryption keys

Database Security

  1. Use strong passwords
  2. Enable SSL for database connections
  3. Regular backups with encryption
  4. Limit database user permissions

Monitoring and Logging

Health Endpoints

| Endpoint | Purpose |
| --- | --- |
| GET /health | Basic health check |
| GET /health/ready | Readiness probe (includes DB check) |
| GET /health/live | Liveness probe |
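
The three endpoints above can be probed in one pass after a deployment. A minimal sketch, assuming curl is installed and the service listens on the default port; the function name is illustrative.

```shell
#!/bin/sh
# Hypothetical smoke test for the three health endpoints listed above.

check_health() {
  base="${1:-http://localhost:3002}"
  failed=0
  for path in /health /health/ready /health/live; do
    # -f: treat HTTP status >= 400 as failure; -sS: quiet, but still print errors.
    if curl -fsS -o /dev/null --max-time 3 "${base}${path}"; then
      echo "ok   ${path}"
    else
      echo "FAIL ${path}"
      failed=1
    fi
  done
  return "$failed"
}
```

The function returns non-zero if any endpoint fails, so it can gate a deployment script.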

Prometheus Metrics (Optional)

# Add to deployment
- name: PROMETHEUS_ENABLED
  value: "true"
- name: PROMETHEUS_PORT
  value: "9102"

Log Aggregation

Configure the Docker logging driver to ship container logs to a central collector, for example Fluentd:

logging:
  driver: "fluentd"
  options:
    fluentd-address: "localhost:24224"
    tag: "backup-service"

Troubleshooting

Common Issues

Service won't start

# Check logs
docker-compose logs backup-service

# Common causes:
# 1. Database not ready
# 2. Missing environment variables
# 3. Invalid encryption key format

Database connection failed

# Check database is running
docker-compose ps backup-db

# Check database logs
docker-compose logs backup-db

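# Test connection (migrate status needs a working DB connection and, unlike
# db pull, does not rewrite prisma/schema.prisma)
docker-compose exec backup-service \
  npx prisma migrate status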

Authentication errors

# Verify JWT secret matches between services
# Check ALLOWED_SERVICES includes calling service
# Verify token format and expiration
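
To check a token's format and expiration, it helps to look at its payload. The sketch below decodes the middle segment of a JWT without verifying the signature, so it is a debugging aid only. It assumes a base64 command that accepts -d. Which claims the service actually expects is not stated in this guide.

```shell
#!/bin/sh
# Hypothetical debugging helper: print the JSON payload of a JWT.
# It does NOT verify the signature.

jwt_payload() {
  # The second dot-separated segment is the base64url-encoded payload.
  seg="$(printf '%s' "$1" | cut -s -d. -f2)"
  [ -n "$seg" ] || return 1
  # Map the base64url alphabet back to standard base64.
  seg="$(printf '%s' "$seg" | tr '_-' '/+')"
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
    1) return 1 ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

Calling jwt_payload "$TOKEN" prints the claims as JSON; an exp claim, if present, is a Unix timestamp that can be compared with date +%s.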

Recovery Procedures

Database Recovery

# Stop service
docker-compose stop backup-service

# Restore from backup
docker-compose exec -T backup-db psql -U postgres rwa_backup < backup.sql

# Run migrations in a one-off container (exec fails while the service is stopped)
docker-compose run --rm backup-service npx prisma migrate deploy

# Start service
docker-compose start backup-service

Key Rotation

  1. Add new key to encryption service
  2. Re-encrypt existing data with new key
  3. Update BACKUP_ENCRYPTION_KEY_ID
  4. Remove old key after transition period

Scaling

Horizontal Scaling

The service is stateless and can be horizontally scaled:

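# Docker Compose scale
# (first remove container_name and the fixed "3002:3002" host port from the
# compose file; both must be unique per container)
docker-compose up -d --scale backup-service=3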

# Kubernetes replicas
kubectl -n rwa-backup scale deployment/backup-service --replicas=3

Load Balancing

Use a load balancer in front of multiple instances:

# nginx.conf
upstream backup_service {
    least_conn;
    server backup-service-1:3002;
    server backup-service-2:3002;
    server backup-service-3:3002;
}

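server {
    listen 443 ssl;
    server_name backup-api.example.com;

    # "listen ... ssl" requires both directives; the paths are placeholders
    ssl_certificate     /etc/nginx/certs/backup-api.crt;
    ssl_certificate_key /etc/nginx/certs/backup-api.key;

    location / {
        proxy_pass http://backup_service;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}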

Database Scaling

For high availability:

  1. Use managed PostgreSQL (AWS RDS, GCP Cloud SQL)
  2. Configure read replicas for read scaling
  3. Use connection pooling (PgBouncer)

Maintenance

Regular Tasks

| Task | Frequency | Command |
| --- | --- | --- |
| Database backup | Daily | pg_dump rwa_backup > backup.sql |
| Log rotation | Weekly | Automatic with log driver config |
| Security updates | Monthly | Rebuild and redeploy image |
| Audit log cleanup | Monthly | DELETE FROM share_access_logs WHERE created_at < NOW() - INTERVAL '365 days' |
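
The cleanup statement hard-codes 365 days, while the retention period is configurable through AUDIT_LOG_RETENTION_DAYS. The sketch below builds the same statement from that setting. The function name is illustrative; the table and column names are taken from the row above.

```shell
#!/bin/sh
# Hypothetical helper: builds the cleanup statement from the retention setting.

audit_cleanup_sql() {
  days="${1:-${AUDIT_LOG_RETENTION_DAYS:-365}}"
  # Accept only a positive integer so nothing else can reach the SQL text.
  case "$days" in
    ''|*[!0-9]*|0*) echo "invalid retention: $days" >&2; return 1 ;;
  esac
  printf "DELETE FROM share_access_logs WHERE created_at < NOW() - INTERVAL '%s days';\n" "$days"
}
```

The output can be piped into the database container, for example: audit_cleanup_sql | docker-compose exec -T backup-db psql -U postgres rwa_backup.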

Update Procedure

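# 1. Build new image
docker build -t rwa-durian/backup-service:v1.1.0 .

# 2. Tag and push to registry
docker tag rwa-durian/backup-service:v1.1.0 registry.example.com/backup-service:v1.1.0
docker push registry.example.com/backup-service:v1.1.0

# 3. Update deployment (after setting the new image tag in docker-compose.prod.yml)
docker-compose --env-file .env.production -f docker-compose.prod.yml pull
docker-compose --env-file .env.production -f docker-compose.prod.yml up -d

# 4. Run migrations if needed
docker-compose --env-file .env.production -f docker-compose.prod.yml exec backup-service \
  npx prisma migrate deploy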

# 5. Verify health
curl http://localhost:3002/health
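
Immediately after a restart the health check may fail simply because the service is still starting. A small retry wrapper, sketched below with an illustrative function name, makes the final verification step tolerant of that delay.

```shell
#!/bin/sh
# Hypothetical helper: re-run a command until it succeeds or attempts run out.

# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, retry 10 3 curl -fsS http://localhost:3002/health tries up to ten times with a three-second pause between attempts.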