# Backup Service Architecture
## Overview
- **Service Name:** `backup-service`
- **Version:** 1.0.0
- **Description:** RWA Durian MPC Backup Share Storage Service
- **Primary Purpose:** Securely store and manage MPC backup shares (Party 2 in the 2-of-3 scheme) for account recovery
## Core Responsibilities
- Store encrypted MPC backup share data (Party 2)
- Provide share retrieval for account recovery scenarios
- Support share revocation for key rotation or account closure
- Maintain comprehensive audit logs for all operations
- Implement rate limiting and access controls
## Critical Security Requirement
**Physical server isolation from identity-service is MANDATORY.** The backup-service must be deployed on a physically separate server to maintain MPC security. An attacker who compromises only this service obtains 1 of 3 shares, which is below the 2-share threshold needed to reconstruct the key.
```
┌─────────────────────────────────────────────────────────────────────────┐
│                          MPC Key Distribution                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Party 0 (Server Share)    Party 1 (Client Share)    Party 2 (Backup)   │
│  ┌─────────────────┐       ┌─────────────────┐       ┌──────────────┐   │
│  │ identity-service│       │   User Device   │       │backup-service│   │
│  │   (Server A)    │       │  (Mobile/Web)   │       │  (Server B)  │   │
│  └─────────────────┘       └─────────────────┘       └──────────────┘   │
│                                                                         │
│  2-of-3 Threshold: Any 2 shares can reconstruct the private key         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
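The 2-of-3 threshold property can be illustrated with plain Shamir secret sharing. This is only a sketch of the arithmetic: this document does not say which MPC protocol the service uses, and the field prime `P`, `split2of3`, and `reconstruct` are hypothetical names chosen for the example. A real scheme would also draw the coefficient from a cryptographically secure random source instead of taking it as a parameter.

```typescript
// Illustration only: Shamir secret sharing over a prime field.
// This is NOT the service's MPC protocol, which this document does not specify.
const P = 2n ** 127n - 1n; // Mersenne prime used as the field modulus for the demo

const mod = (a: bigint): bigint => ((a % P) + P) % P;

// Square-and-multiply modular exponentiation.
function modPow(base: bigint, exp: bigint): bigint {
  let result = 1n;
  let b = mod(base);
  let e = exp;
  while (e > 0n) {
    if ((e & 1n) === 1n) result = mod(result * b);
    b = mod(b * b);
    e >>= 1n;
  }
  return result;
}

// Modular inverse via Fermat's little theorem (valid because P is prime).
const modInv = (a: bigint): bigint => modPow(a, P - 2n);

interface Share {
  x: bigint;
  y: bigint;
}

// Threshold 2 means a degree-1 polynomial: f(x) = secret + coeff * x.
// Parties 0, 1, 2 receive the points at x = 1, 2, 3.
function split2of3(secret: bigint, coeff: bigint): [Share, Share, Share] {
  const at = (x: bigint): Share => ({ x, y: mod(secret + coeff * x) });
  return [at(1n), at(2n), at(3n)];
}

// Recover f(0) from any two distinct shares by Lagrange interpolation.
function reconstruct(a: Share, b: Share): bigint {
  const la = mod(b.x * modInv(b.x - a.x)); // Lagrange basis for share a at x = 0
  const lb = mod(a.x * modInv(a.x - b.x)); // Lagrange basis for share b at x = 0
  return mod(a.y * la + b.y * lb);
}
```

Any pair of the three shares passed to `reconstruct` yields the original secret, while a single share is just one point on a line whose slope is unknown.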
---
## DDD + Hexagonal Architecture
The service follows a layered architecture with clear separation of concerns:
```
┌─────────────────────────────────────────────────────────────────────────┐
│                          API Layer (Adapters)                           │
│            Controllers, DTOs, HTTP Request/Response Handling            │
├─────────────────────────────────────────────────────────────────────────┤
│                            Application Layer                            │
│            Use Cases, Commands, Queries, Handlers, Services             │
├─────────────────────────────────────────────────────────────────────────┤
│                              Domain Layer                               │
│   Entities, Value Objects, Repositories (Interfaces), Business Logic    │
├─────────────────────────────────────────────────────────────────────────┤
│                     Infrastructure Layer (Adapters)                     │
│           Persistence (Prisma/PostgreSQL), Encryption, Crypto           │
└─────────────────────────────────────────────────────────────────────────┘
```
### Layer Dependencies (Dependency Rule)
```
API Layer ──────────────▶ Application Layer
Domain Layer ◀─────── Infrastructure Layer
     │                        │
     ▼                        ▼
(Interfaces defined)   (Implementations)
```
**Key Principle:** Dependencies point inward. The Domain layer defines the repository interfaces and has no external dependencies; the Infrastructure layer supplies the implementations.
---
## Design Patterns
| Pattern | Implementation | Files |
|---------|----------------|-------|
| **Command Pattern** | Store, Revoke operations | `store-backup-share.command.ts`, `revoke-share.command.ts` |
| **Query Pattern** | Retrieve operation | `get-backup-share.query.ts` |
| **Repository Pattern** | Data access abstraction | `backup-share.repository.interface.ts`, `backup-share.repository.impl.ts` |
| **Dependency Injection** | NestJS DI Container | `*.module.ts` |
| **Guard Pattern** | Authentication & Authorization | `service-auth.guard.ts` |
| **Filter Pattern** | Global exception handling | `global-exception.filter.ts` |
| **Interceptor Pattern** | Request/response processing | `audit-log.interceptor.ts` |
| **Value Objects** | Immutable domain concepts | `share-id.vo.ts`, `encrypted-data.vo.ts` |
---
## Directory Structure
```
backup-service/
├── prisma/
│   ├── schema.prisma  # Database schema definition
│   └── migrations/  # Database migration history
├── src/
│   ├── api/  # Adapter Layer (External Interface)
│   │   ├── controllers/
│   │   │   ├── backup-share.controller.ts  # Main API endpoints
│   │   │   └── health.controller.ts  # Health check endpoints
│   │   ├── dto/
│   │   │   ├── request/
│   │   │   │   ├── store-share.dto.ts  # Store share request
│   │   │   │   ├── retrieve-share.dto.ts  # Retrieve share request
│   │   │   │   └── revoke-share.dto.ts  # Revoke share request
│   │   │   └── response/
│   │   │       └── share-info.dto.ts  # Response DTOs
│   │   └── api.module.ts
│   │
│   ├── application/  # Use Cases Layer
│   │   ├── commands/
│   │   │   ├── store-backup-share/
│   │   │   │   ├── store-backup-share.command.ts
│   │   │   │   └── store-backup-share.handler.ts
│   │   │   └── revoke-share/
│   │   │       ├── revoke-share.command.ts
│   │   │       └── revoke-share.handler.ts
│   │   ├── queries/
│   │   │   └── get-backup-share/
│   │   │       ├── get-backup-share.query.ts
│   │   │       └── get-backup-share.handler.ts
│   │   ├── services/
│   │   │   └── backup-share-application.service.ts
│   │   ├── errors/
│   │   │   └── application.error.ts
│   │   └── application.module.ts
│   │
│   ├── domain/  # Core Business Logic
│   │   ├── entities/
│   │   │   └── backup-share.entity.ts  # BackupShare aggregate root
│   │   ├── repositories/
│   │   │   └── backup-share.repository.interface.ts
│   │   ├── value-objects/
│   │   │   ├── share-id.vo.ts  # Immutable share identifier
│   │   │   └── encrypted-data.vo.ts  # Encrypted data structure
│   │   ├── errors/
│   │   │   └── domain.error.ts
│   │   └── domain.module.ts
│   │
│   ├── infrastructure/  # Adapter Layer (Services)
│   │   ├── persistence/
│   │   │   ├── prisma/
│   │   │   │   └── prisma.service.ts  # Prisma ORM service
│   │   │   └── repositories/
│   │   │       ├── backup-share.repository.impl.ts  # Repository implementation
│   │   │       └── audit-log.repository.ts  # Audit logging
│   │   ├── crypto/
│   │   │   └── aes-encryption.service.ts  # AES-256-GCM encryption
│   │   └── infrastructure.module.ts
│   │
│   ├── shared/  # Cross-cutting Concerns
│   │   ├── guards/
│   │   │   └── service-auth.guard.ts  # JWT service authentication
│   │   ├── filters/
│   │   │   └── global-exception.filter.ts  # Exception handling
│   │   └── interceptors/
│   │       └── audit-log.interceptor.ts  # Request/response logging
│   │
│   ├── config/
│   │   └── index.ts  # Centralized configuration
│   ├── app.module.ts  # Root NestJS module
│   └── main.ts  # Application entry point
├── test/  # Test files
│   ├── unit/
│   ├── integration/
│   ├── e2e/
│   ├── setup/
│   └── utils/
└── docs/  # Documentation
```
---
## Domain Layer Details
### BackupShare Entity (Aggregate Root)
**File:** `src/domain/entities/backup-share.entity.ts`
```typescript
class BackupShare {
  // Identity
  shareId: bigint | null          // Auto-increment primary key
  userId: bigint                  // From identity-service
  accountSequence: bigint         // Account identifier

  // MPC Configuration
  publicKey: string               // MPC public key (66-130 chars)
  partyIndex: number              // Always 2 for backup share
  threshold: number               // Default 2 (for 2-of-3 scheme)
  totalParties: number            // Default 3

  // Encrypted Data
  encryptedShareData: string      // AES-256-GCM encrypted data
  encryptionKeyId: string         // For key rotation support

  // State Management
  status: BackupShareStatus       // ACTIVE | REVOKED | ROTATED
  accessCount: number             // Track access frequency
  lastAccessedAt: Date | null

  // Timestamps
  createdAt: Date
  updatedAt: Date
  revokedAt: Date | null
}

// Factory Methods
BackupShare.create(params): BackupShare
BackupShare.reconstitute(props): BackupShare

// Domain Methods
recordAccess(): void              // Increment access counter
revoke(reason: string): void      // Mark as revoked
rotate(newData, newKeyId): void   // Key rotation support
isActive(): boolean
```
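The state-management methods listed above might behave roughly as follows. This is a trimmed sketch, not the contents of `backup-share.entity.ts`: identity, MPC, and encrypted-data fields are left out, `BackupShareSketch` and `_revokeReason` are hypothetical names, and the decision to throw when accessing a non-active share or revoking twice is an assumption.

```typescript
type BackupShareStatus = "ACTIVE" | "REVOKED" | "ROTATED";

// Trimmed sketch of the entity's state handling only.
class BackupShareSketch {
  private _status: BackupShareStatus = "ACTIVE";
  private _accessCount = 0;
  private _lastAccessedAt: Date | null = null;
  private _revokedAt: Date | null = null;
  private _revokeReason: string | null = null; // hypothetical: not in the field listing

  get status(): BackupShareStatus {
    return this._status;
  }
  get accessCount(): number {
    return this._accessCount;
  }
  get lastAccessedAt(): Date | null {
    return this._lastAccessedAt;
  }
  get revokedAt(): Date | null {
    return this._revokedAt;
  }
  get revokeReason(): string | null {
    return this._revokeReason;
  }

  isActive(): boolean {
    return this._status === "ACTIVE";
  }

  // Increment the access counter; assumed to be allowed only while ACTIVE.
  recordAccess(now: Date = new Date()): void {
    if (!this.isActive()) {
      throw new Error(`Cannot access a ${this._status} share`);
    }
    this._accessCount += 1;
    this._lastAccessedAt = now;
  }

  // Mark the share as revoked; assumed to be a one-way transition.
  revoke(reason: string, now: Date = new Date()): void {
    if (this._status === "REVOKED") {
      throw new Error("Share is already revoked");
    }
    this._status = "REVOKED";
    this._revokedAt = now;
    this._revokeReason = reason;
  }
}
```

Keeping the transitions inside the entity means handlers cannot set `status` directly, so a revoked share cannot silently become retrievable again.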
### Value Objects
#### ShareId
```typescript
class ShareId {
  static create(value: bigint | string | number): ShareId
  get value(): bigint
  toString(): string
  equals(other: ShareId): boolean
}
```
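One way the `ShareId` signature above could be filled in is shown below. The validation rules (safe integers only, strictly positive values) are assumptions based on the ID being an auto-increment key; the actual `share-id.vo.ts` may differ.

```typescript
class ShareId {
  private readonly _value: bigint;

  private constructor(value: bigint) {
    this._value = value;
  }

  // Normalizes bigint, string, and number inputs to a single bigint value.
  static create(value: bigint | string | number): ShareId {
    if (typeof value === "number" && !Number.isSafeInteger(value)) {
      throw new Error(`Invalid ShareId: ${value}`);
    }
    let parsed: bigint;
    try {
      parsed = BigInt(value);
    } catch {
      throw new Error(`Invalid ShareId: ${String(value)}`);
    }
    // Assumption: auto-increment IDs start at 1, so zero and negatives are rejected.
    if (parsed <= 0n) {
      throw new Error(`Invalid ShareId: ${String(value)}`);
    }
    return new ShareId(parsed);
  }

  get value(): bigint {
    return this._value;
  }

  toString(): string {
    return this._value.toString();
  }

  // Value objects compare by value, not by reference.
  equals(other: ShareId): boolean {
    return this._value === other._value;
  }
}
```

With this shape, `ShareId.create("42")` and `ShareId.create(42)` are distinct objects that compare equal through `equals`.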
#### EncryptedData
```typescript
class EncryptedData {
  ciphertext: string   // Base64 encoded
  iv: string           // Base64 encoded
  authTag: string      // Base64 encoded
  keyId: string

  static create(params): EncryptedData
  static fromSerializedString(serialized, keyId): EncryptedData
  toSerializedString(): string
}
```
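A possible implementation of the serialization round trip, assuming the colon-separated `{ciphertext}:{iv}:{authTag}` format described in the Encryption Service section. The `EncryptedDataParams` type and the base64 validation are additions for the sketch, not confirmed details of `encrypted-data.vo.ts`.

```typescript
// Standard base64 never contains ":", so it is safe as a separator.
const BASE64 = /^[A-Za-z0-9+/]+={0,2}$/;

interface EncryptedDataParams {
  ciphertext: string;
  iv: string;
  authTag: string;
  keyId: string;
}

class EncryptedData {
  readonly ciphertext: string;
  readonly iv: string;
  readonly authTag: string;
  readonly keyId: string;

  private constructor(params: EncryptedDataParams) {
    this.ciphertext = params.ciphertext;
    this.iv = params.iv;
    this.authTag = params.authTag;
    this.keyId = params.keyId;
  }

  static create(params: EncryptedDataParams): EncryptedData {
    for (const field of [params.ciphertext, params.iv, params.authTag]) {
      if (!BASE64.test(field)) {
        throw new Error("ciphertext, iv, and authTag must be non-empty base64");
      }
    }
    if (params.keyId.length === 0) {
      throw new Error("keyId must not be empty");
    }
    return new EncryptedData(params);
  }

  // Parses "ciphertext:iv:authTag"; the key ID is stored separately.
  static fromSerializedString(serialized: string, keyId: string): EncryptedData {
    const parts = serialized.split(":");
    if (parts.length !== 3) {
      throw new Error("Expected format ciphertext:iv:authTag");
    }
    const [ciphertext = "", iv = "", authTag = ""] = parts;
    return EncryptedData.create({ ciphertext, iv, authTag, keyId });
  }

  toSerializedString(): string {
    return `${this.ciphertext}:${this.iv}:${this.authTag}`;
  }
}
```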
### Repository Interface
```typescript
interface BackupShareRepository {
  save(share: BackupShare): Promise<BackupShare>
  findById(shareId: bigint): Promise<BackupShare | null>
  findByUserId(userId: bigint): Promise<BackupShare | null>
  findByPublicKey(publicKey: string): Promise<BackupShare | null>
  findByUserIdAndPublicKey(userId: bigint, publicKey: string): Promise<BackupShare | null>
  findByAccountSequence(accountSequence: bigint): Promise<BackupShare | null>
  delete(shareId: bigint): Promise<void>
}
```
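Because the interface is defined in the domain layer, it can be backed by something other than Prisma, for example in unit tests. The following in-memory sketch mirrors the method list above; `ShareRecord` is a hypothetical stand-in for `BackupShare` with only the lookup fields, and the uniqueness checks are modeled on the UNIQUE constraints in the database schema.

```typescript
// Hypothetical stand-in for BackupShare, reduced to the fields used for lookups.
interface ShareRecord {
  shareId: bigint | null;
  userId: bigint;
  accountSequence: bigint;
  publicKey: string;
}

class InMemoryBackupShareRepository {
  private readonly rows = new Map<bigint, ShareRecord>();
  private nextId = 1n;

  // Returns a copy so callers cannot mutate stored rows by accident.
  private find(match: (row: ShareRecord) => boolean): ShareRecord | null {
    const hit = Array.from(this.rows.values()).find(match);
    return hit ? { ...hit } : null;
  }

  async save(share: ShareRecord): Promise<ShareRecord> {
    // Emulate the UNIQUE constraints on user_id, account_sequence, and public_key.
    const clash = this.find(
      (row) =>
        row.shareId !== share.shareId &&
        (row.userId === share.userId ||
          row.accountSequence === share.accountSequence ||
          row.publicKey === share.publicKey),
    );
    if (clash) throw new Error("Unique constraint violated");
    const shareId = share.shareId ?? this.nextId++;
    const stored: ShareRecord = { ...share, shareId };
    this.rows.set(shareId, stored);
    return { ...stored };
  }

  async findById(shareId: bigint): Promise<ShareRecord | null> {
    return this.find((row) => row.shareId === shareId);
  }

  async findByUserId(userId: bigint): Promise<ShareRecord | null> {
    return this.find((row) => row.userId === userId);
  }

  async findByPublicKey(publicKey: string): Promise<ShareRecord | null> {
    return this.find((row) => row.publicKey === publicKey);
  }

  async findByUserIdAndPublicKey(userId: bigint, publicKey: string): Promise<ShareRecord | null> {
    return this.find((row) => row.userId === userId && row.publicKey === publicKey);
  }

  async findByAccountSequence(accountSequence: bigint): Promise<ShareRecord | null> {
    return this.find((row) => row.accountSequence === accountSequence);
  }

  async delete(shareId: bigint): Promise<void> {
    this.rows.delete(shareId);
  }
}
```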
---
## Application Layer Details
### Command Handlers
#### StoreBackupShareHandler
**Flow:**
1. Check if share already exists for user (uniqueness constraint)
2. Check if share already exists for public key (uniqueness constraint)
3. Apply the second encryption layer (AES-256-GCM) on top of the data already encrypted by identity-service
4. Create BackupShare domain entity
5. Save to repository
6. Log audit event
7. Return shareId
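The store flow above can be sketched as a single function over a set of ports. Everything named here (`StorePorts`, `storeBackupShare`, the error codes, the shape returned by `encrypt`) is hypothetical; the real handler is a NestJS class whose dependencies are injected, and its error types live in `application.error.ts`.

```typescript
interface StoreCommand {
  userId: bigint;
  accountSequence: bigint;
  publicKey: string;
  encryptedShareData: string; // already encrypted once by identity-service
}

interface NewShare extends StoreCommand {
  encryptionKeyId: string;
}

// Hypothetical ports standing in for the repository, crypto, and audit services.
interface StorePorts {
  findByUserId(userId: bigint): Promise<{ shareId: bigint } | null>;
  findByPublicKey(publicKey: string): Promise<{ shareId: bigint } | null>;
  save(share: NewShare): Promise<{ shareId: bigint }>;
  encrypt(plaintext: string): Promise<{ encrypted: string; keyId: string }>;
  audit(entry: { shareId: bigint; userId: bigint; action: "STORE"; success: boolean }): Promise<void>;
}

async function storeBackupShare(ports: StorePorts, cmd: StoreCommand): Promise<bigint> {
  // Steps 1-2: uniqueness checks for user and public key.
  if (await ports.findByUserId(cmd.userId)) {
    throw new Error("SHARE_ALREADY_EXISTS");
  }
  if (await ports.findByPublicKey(cmd.publicKey)) {
    throw new Error("PUBLIC_KEY_ALREADY_EXISTS");
  }

  // Step 3: add this service's own encryption layer.
  const { encrypted, keyId } = await ports.encrypt(cmd.encryptedShareData);

  // Steps 4-5: build the record and persist it.
  const saved = await ports.save({
    ...cmd,
    encryptedShareData: encrypted,
    encryptionKeyId: keyId,
  });

  // Step 6: audit the successful store.
  await ports.audit({ shareId: saved.shareId, userId: cmd.userId, action: "STORE", success: true });

  // Step 7: hand the new identifier back to the caller.
  return saved.shareId;
}
```

The application-level checks in steps 1 and 2 give clear error codes, while the UNIQUE constraints in the database remain the final guard against concurrent duplicates.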
#### RevokeShareHandler
**Flow:**
1. Find share by userId and publicKey
2. Call entity's `revoke()` method
3. Save changes to repository
4. Log audit event
### Query Handlers
#### GetBackupShareHandler
**Flow:**
1. Check rate limit (max 3 retrieves per day per user)
2. Find share by userId and publicKey
3. Verify share is ACTIVE
4. Record access in entity
5. Save entity state
6. Decrypt share data (removes our encryption layer)
7. Log audit event
8. Return decrypted data
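The retrieval flow above, written out as a sketch. `GetDeps`, `getBackupShare`, and the error codes are hypothetical names, and the step order simply follows the list: access is recorded and saved before decryption, so an attempt is counted even if decryption later fails.

```typescript
interface StoredShare {
  shareId: bigint;
  status: "ACTIVE" | "REVOKED" | "ROTATED";
  encryptedShareData: string;
  encryptionKeyId: string;
  accessCount: number;
  lastAccessedAt: Date | null;
}

// Hypothetical dependencies standing in for the repository, crypto, and audit services.
interface GetDeps {
  countRetrievesToday(userId: bigint): Promise<number>;
  findByUserIdAndPublicKey(userId: bigint, publicKey: string): Promise<StoredShare | null>;
  save(share: StoredShare): Promise<void>;
  decrypt(encryptedData: string, keyId: string): Promise<string>;
  audit(entry: { shareId: bigint; userId: bigint; action: "RETRIEVE"; success: boolean }): Promise<void>;
}

async function getBackupShare(
  deps: GetDeps,
  query: { userId: bigint; publicKey: string },
  maxPerDay = 3,
): Promise<string> {
  // Step 1: rate limit before touching the share.
  if ((await deps.countRetrievesToday(query.userId)) >= maxPerDay) {
    throw new Error("RATE_LIMIT_EXCEEDED");
  }

  // Steps 2-3: look up the share and require ACTIVE status.
  const share = await deps.findByUserIdAndPublicKey(query.userId, query.publicKey);
  if (!share) throw new Error("SHARE_NOT_FOUND");
  if (share.status !== "ACTIVE") throw new Error("SHARE_NOT_ACTIVE");

  // Steps 4-5: record the access and persist the updated counters.
  share.accessCount += 1;
  share.lastAccessedAt = new Date();
  await deps.save(share);

  // Step 6: remove only this service's encryption layer.
  const data = await deps.decrypt(share.encryptedShareData, share.encryptionKeyId);

  // Steps 7-8: audit, then return data still encrypted by identity-service.
  await deps.audit({ shareId: share.shareId, userId: query.userId, action: "RETRIEVE", success: true });
  return data;
}
```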
---
## Infrastructure Layer Details
### Encryption Service
- **Algorithm:** AES-256-GCM (authenticated encryption)
- **IV Length:** 12 bytes (96 bits)
- **Key Size:** 32 bytes (256 bits)
- **Output Format:** `{ciphertext}:{iv}:{authTag}` (colon-separated base64)
```typescript
class AesEncryptionService {
  async encrypt(plaintext: string): Promise<EncryptionResult>
  async decrypt(encryptedData: string, keyId: string): Promise<string>
  addKey(keyId: string, keyHex: string): void
  getCurrentKeyId(): string
}
```
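A minimal sketch of how these four methods could be built on Node's `crypto` module, using the parameters listed above (12-byte IV, 32-byte key, colon-separated base64 output). The class name `AesEncryptionSketch`, its constructor, and the `EncryptionResult` shape are assumptions; the real `aes-encryption.service.ts` most likely reads its keys from configuration.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumed shape; the document does not define EncryptionResult.
interface EncryptionResult {
  encrypted: string; // "{ciphertext}:{iv}:{authTag}", each part base64
  keyId: string;
}

class AesEncryptionSketch {
  private readonly keys = new Map<string, Buffer>();
  private currentKeyId: string;

  constructor(keyId: string, keyHex: string) {
    this.addKey(keyId, keyHex);
    this.currentKeyId = keyId;
  }

  // Registers a key so data encrypted under older key IDs stays decryptable.
  addKey(keyId: string, keyHex: string): void {
    const key = Buffer.from(keyHex, "hex");
    if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
    this.keys.set(keyId, key);
  }

  getCurrentKeyId(): string {
    return this.currentKeyId;
  }

  async encrypt(plaintext: string): Promise<EncryptionResult> {
    const key = this.keys.get(this.currentKeyId);
    if (!key) throw new Error(`Unknown key: ${this.currentKeyId}`);
    const iv = randomBytes(12); // fresh 96-bit IV for every message
    const cipher = createCipheriv("aes-256-gcm", key, iv);
    const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
    const authTag = cipher.getAuthTag();
    const encrypted = [ciphertext, iv, authTag].map((b) => b.toString("base64")).join(":");
    return { encrypted, keyId: this.currentKeyId };
  }

  async decrypt(encryptedData: string, keyId: string): Promise<string> {
    const key = this.keys.get(keyId);
    if (!key) throw new Error(`Unknown key: ${keyId}`);
    const parts = encryptedData.split(":");
    if (parts.length !== 3) throw new Error("Expected format ciphertext:iv:authTag");
    const [ciphertext, iv, authTag] = parts.map((p) => Buffer.from(p, "base64"));
    const decipher = createDecipheriv("aes-256-gcm", key, iv);
    decipher.setAuthTag(authTag);
    // final() throws if the ciphertext or tag was modified.
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
  }
}
```

Because GCM is authenticated, a modified ciphertext or tag causes `decrypt` to throw instead of returning corrupted share data.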
### Prisma ORM Service
Uses `@prisma/adapter-pg` for Prisma 7.x compatibility with PostgreSQL.
```typescript
class PrismaService extends PrismaClient implements OnModuleInit, OnModuleDestroy {
  async onModuleInit()      // Connect on startup
  async onModuleDestroy()   // Disconnect on shutdown
  async cleanDatabase()     // Test utility - delete all tables
}
```
---
## Database Schema
### BackupShare Table
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| share_id | BIGSERIAL | PK | Auto-increment ID |
| user_id | BIGINT | UNIQUE, NOT NULL | User identifier |
| account_sequence | BIGINT | UNIQUE, NOT NULL | Account sequence |
| public_key | VARCHAR(130) | UNIQUE, NOT NULL | MPC public key |
| party_index | INT | DEFAULT 2 | Party index (always 2) |
| threshold | INT | DEFAULT 2 | Threshold for reconstruction |
| total_parties | INT | DEFAULT 3 | Total parties |
| encrypted_share_data | TEXT | NOT NULL | Encrypted share data |
| encryption_key_id | VARCHAR(64) | NOT NULL | Encryption key ID |
| status | VARCHAR(20) | DEFAULT 'ACTIVE' | Share status |
| access_count | INT | DEFAULT 0 | Access counter |
| last_accessed_at | TIMESTAMP | NULLABLE | Last access time |
| created_at | TIMESTAMP | DEFAULT NOW() | Creation time |
| updated_at | TIMESTAMP | AUTO | Update time |
| revoked_at | TIMESTAMP | NULLABLE | Revocation time |
### ShareAccessLog Table
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| log_id | BIGSERIAL | PK | Auto-increment ID |
| share_id | BIGINT | NOT NULL | Reference to share |
| user_id | BIGINT | NOT NULL | User identifier |
| action | VARCHAR(20) | NOT NULL | STORE/RETRIEVE/REVOKE/ROTATE |
| source_service | VARCHAR(50) | NOT NULL | Calling service |
| source_ip | VARCHAR(45) | NOT NULL | Client IP |
| success | BOOLEAN | DEFAULT TRUE | Operation success |
| error_message | TEXT | NULLABLE | Error details |
| created_at | TIMESTAMP | DEFAULT NOW() | Log time |
---
## Key Architectural Decisions
### 1. Double Encryption
- Identity-service encrypts data once
- Backup-service encrypts again (AES-256-GCM)
- Defense-in-depth: if either system is compromised on its own, the stored data is still protected by the other system's encryption layer
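The layering can be demonstrated with two independent keys. This sketch assumes, for illustration only, that identity-service also uses AES-256-GCM for the inner layer (this document does not say what it uses); `seal` and `open` are hypothetical helper names.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts with AES-256-GCM and returns "ciphertext:iv:authTag" in base64.
function seal(key: Buffer, plaintext: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return [ciphertext, iv, cipher.getAuthTag()].map((b) => b.toString("base64")).join(":");
}

// Reverses seal(); throws if the key is wrong or the data was modified.
function open(key: Buffer, sealed: string): string {
  const parts = sealed.split(":");
  if (parts.length !== 3) throw new Error("Expected format ciphertext:iv:authTag");
  const [ciphertext, iv, authTag] = parts.map((p) => Buffer.from(p, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(authTag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Two keys held by two different systems (random here for the demo).
const identityServiceKey = randomBytes(32);
const backupServiceKey = randomBytes(32);

// Inner layer: applied by identity-service before sending the share.
const inner = seal(identityServiceKey, "party-2-share");
// Outer layer: applied by backup-service before writing to the database.
const outer = seal(backupServiceKey, inner);

// backup-service can strip only its own layer; the result is still ciphertext.
const afterOuterRemoved = open(backupServiceKey, outer);
// Only the holder of the identity-service key recovers the share itself.
const recovered = open(identityServiceKey, afterOuterRemoved);
```

An attacker holding only `backupServiceKey` ends up with `inner`, which is still encrypted; an attacker holding only `identityServiceKey` cannot open `outer` at all.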
### 2. Physical Server Isolation
- MPC scheme is 2-of-3: requires at least 2 shares to reconstruct key
- Party 0 (Server Share) on identity-service
- Party 2 (Backup Share) on separate backup-service
- Party 1 (Client Share) on user device
- If only one server is compromised, MPC security remains intact
### 3. Audit Logging
- All operations logged with timestamp, user, action, source service, source IP
- Non-blocking writes (errors don't affect main operations)
- Supports compliance and security investigations
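One reading of "non-blocking" is a fire-and-forget write whose failure is reported but never propagated. The sketch below follows that reading; `AuditEntry`, `AuditWriter`, and `logAuditNonBlocking` are hypothetical names, and the real `audit-log.repository.ts` may instead await the write inside a `try`/`catch`.

```typescript
// Fields taken from the ShareAccessLog table; names are camelCased for the sketch.
interface AuditEntry {
  shareId: bigint;
  userId: bigint;
  action: "STORE" | "RETRIEVE" | "REVOKE" | "ROTATE";
  sourceService: string;
  sourceIp: string;
  success: boolean;
  errorMessage?: string;
}

type AuditWriter = (entry: AuditEntry) => Promise<void>;

// Starts the audit write without awaiting it. Any failure, whether thrown
// synchronously by the writer or delivered as a rejected promise, goes to onError.
function logAuditNonBlocking(
  write: AuditWriter,
  entry: AuditEntry,
  onError: (err: unknown) => void = () => {},
): void {
  Promise.resolve()
    .then(() => write(entry))
    .catch(onError);
}
```

The trade-off is that an audit row can be lost if the database is unavailable, so `onError` would normally feed a metric or alert so that gaps in the audit trail are visible.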
### 4. Rate Limiting
- Maximum of 3 retrievals per user per day, which limits brute-force recovery attempts
- Configurable via `MAX_RETRIEVE_PER_DAY`
- Tracked in database, can be monitored for anomalies
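Since the limit is tracked in the database, the check can be expressed as a count over access-log rows. The sketch below makes two assumptions this document does not confirm: "per day" means a rolling 24-hour window, and every `RETRIEVE` row counts regardless of outcome. `AccessLogRow` and `isRetrieveAllowed` are hypothetical names, and `maxPerDay` stands in for the `MAX_RETRIEVE_PER_DAY` setting.

```typescript
// Subset of ShareAccessLog columns needed for the check.
interface AccessLogRow {
  userId: bigint;
  action: "STORE" | "RETRIEVE" | "REVOKE" | "ROTATE";
  createdAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns true if the user is still under the retrieval limit.
function isRetrieveAllowed(
  log: AccessLogRow[],
  userId: bigint,
  now: Date,
  maxPerDay = 3,
): boolean {
  const windowStart = now.getTime() - DAY_MS;
  const recentRetrieves = log.filter(
    (row) =>
      row.userId === userId &&
      row.action === "RETRIEVE" &&
      row.createdAt.getTime() > windowStart,
  );
  return recentRetrieves.length < maxPerDay;
}
```

In the service itself the filter would be a database count over the log table rather than an in-memory array.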
### 5. Service-to-Service Auth
- JWT tokens with service identity
- backup-service performs no end-user authentication; identity-service is responsible for it
- Simplified trust model: only requests from known services are accepted
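The core check behind a service-auth guard might look like the following. This is a hand-rolled HS256 verification for illustration only: the claim names (`service`, `exp`), the allow-list, and both function names are assumptions, and the real `service-auth.guard.ts` would more likely delegate to a maintained JWT library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: identity-service is the only trusted caller.
const ALLOWED_SERVICES = new Set(["identity-service"]);

interface ServiceClaims {
  service: string; // hypothetical claim carrying the caller's identity
  exp: number; // expiry as seconds since the Unix epoch
}

// Issues an HS256 token; included so the sketch can be exercised end to end.
function signServiceToken(claims: ServiceClaims, secret: string): string {
  const header = Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })).toString("base64url");
  const body = Buffer.from(JSON.stringify(claims)).toString("base64url");
  const signature = createHmac("sha256", secret).update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${signature}`;
}

// Verifies signature, algorithm, expiry, and caller; returns the service name.
function verifyServiceToken(token: string, secret: string, nowSec: number): string {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("Malformed token");
  const [header = "", body = "", signature = ""] = parts;

  // Check the signature first, in constant time, before trusting any claim.
  const expected = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const actual = Buffer.from(signature, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    throw new Error("Invalid signature");
  }

  const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
  if (alg !== "HS256") throw new Error("Unexpected algorithm");

  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) {
    throw new Error("Token expired");
  }
  if (typeof claims.service !== "string" || !ALLOWED_SERVICES.has(claims.service)) {
    throw new Error("Unknown service");
  }
  return claims.service;
}
```

A guard built on this would reject the request whenever `verifyServiceToken` throws and could attach the returned service name to the request for the audit log's `source_service` column.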
### 6. Error Handling
- Structured error codes for programmatic handling
- Sensitive data redacted from logs
- Standard error response format
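These three points could be combined along the following lines. The error codes, the response shape, the list of sensitive keys, and the names `AppError`, `redact`, and `toErrorResponse` are all assumptions for the sketch; the actual definitions live in `application.error.ts`, `domain.error.ts`, and `global-exception.filter.ts`.

```typescript
// Hypothetical list of keys that must never appear in logs.
const SENSITIVE_KEYS = new Set(["encryptedShareData", "authorization", "token"]);

// Recursively replaces sensitive values in plain JSON-like data.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(
        ([key, inner]): [string, unknown] => [
          key,
          SENSITIVE_KEYS.has(key) ? "[REDACTED]" : redact(inner),
        ],
      ),
    );
  }
  return value;
}

// Error carrying a stable machine-readable code alongside the message.
class AppError extends Error {
  readonly code: string;
  readonly httpStatus: number;

  constructor(code: string, message: string, httpStatus: number) {
    super(message);
    this.code = code;
    this.httpStatus = httpStatus;
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface ErrorResponse {
  success: false;
  code: string;
  message: string;
}

// Maps any thrown value to the standard response; unknown errors stay generic
// so internal details are not leaked to the caller.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof AppError) {
    return { success: false, code: err.code, message: err.message };
  }
  return { success: false, code: "INTERNAL_ERROR", message: "Internal server error" };
}
```

A global exception filter would call `toErrorResponse` for the HTTP body and pass the request payload through `redact` before writing it to the log.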
---
## Key Files Reference
| File Path | Purpose |
|-----------|---------|
| `src/main.ts` | Entry point, NestFactory bootstrap |
| `src/app.module.ts` | Root module, global filters/interceptors |
| `src/config/index.ts` | Centralized configuration |
| `src/domain/entities/backup-share.entity.ts` | Core domain entity |
| `src/domain/repositories/backup-share.repository.interface.ts` | Repository interface |
| `src/application/commands/store-backup-share/` | Store use case |
| `src/application/queries/get-backup-share/` | Retrieve use case |
| `src/infrastructure/crypto/aes-encryption.service.ts` | Encryption service |
| `src/shared/guards/service-auth.guard.ts` | Authentication guard |
| `prisma/schema.prisma` | Database schema |