refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing
- Remove Address field from PartyEndpoint (parties connect to the router themselves)
- Update K8s Discovery to manage only PartyID and Role labels
- Add Party registration and SessionEvent protobuf definitions
- Implement PartyRegistry and SessionEventBroadcaster domain logic
- Add RegisterParty and SubscribeSessionEvents gRPC handlers
- Prepare infrastructure for party-driven MPC coordination

This is the first phase of migrating from a coordinator-driven to a party-driven architecture, following international MPC system design patterns.
This commit is contained in:
parent e975e9d86c, commit 747e4ae8ef
@ -31,7 +31,14 @@
"Bash(wsl.exe -- bash -c 'find ~/rwadurian/backend/mpc-system/services/server-party -name \"\"main.go\"\" -path \"\"*/cmd/server/*\"\"')",
"Bash(wsl.exe -- bash -c 'cat ~/rwadurian/backend/mpc-system/services/server-party/cmd/server/main.go | grep -E \"\"grpc|GRPC|gRPC|50051\"\" | head -20')",
"Bash(wsl.exe -- bash:*)",
"Bash(dir:*)"
"Bash(dir:*)",
"Bash(go version:*)",
"Bash(go mod download:*)",
"Bash(go build:*)",
"Bash(go mod tidy:*)",
"Bash(findstr:*)",
"Bash(del \"c:\\Users\\dong\\Desktop\\rwadurian\\backend\\mpc-system\\PARTY_ROLE_VERIFICATION_REPORT.md\")",
"Bash(protoc:*)"
],
"deny": [],
"ask": []
@ -0,0 +1,295 @@
# Party Role Labels Implementation - Verification Report

**Date**: 2025-12-05
**Commit**: e975e9d - "feat(mpc-system): implement party role labels with strict persistent-only default"
**Environment**: Docker Compose (Local Development)

---

## 1. Implementation Summary

### 1.1 Overview
Implemented Party Role Labels (Solution 1) to differentiate between three types of server parties:
- **Persistent**: Stores key shares in the database permanently
- **Delegate**: Generates user shares and returns them to the caller (doesn't store them)
- **Temporary**: For ad-hoc operations
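
The three roles map to a small string-based enum plus a `Role` field on the coordinator's party endpoint type. A minimal sketch of the types listed in 1.2 below; the constant and field names appear later in this commit's diff, anything else here is illustrative:

```go
package output

// PartyRole labels how a server party participates in MPC sessions.
type PartyRole string

const (
	PartyRolePersistent PartyRole = "persistent" // stores key shares in the database
	PartyRoleDelegate   PartyRole = "delegate"   // generates user shares, returns them, does not store
	PartyRoleTemporary  PartyRole = "temporary"  // ad-hoc operations
)

// PartyEndpoint is a party as seen by the Session Coordinator.
// Only PartyID is needed for routing; parties connect to the
// Message Router themselves.
type PartyEndpoint struct {
	PartyID string
	Ready   bool
	Role    PartyRole
}
```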

### 1.2 Core Changes

#### Files Modified
1. `services/session-coordinator/application/ports/output/party_pool_port.go`
   - Added `PartyRole` enum (persistent, delegate, temporary)
   - Added `PartyEndpoint.Role` field
   - Added `PartySelectionFilter` struct with role filtering
   - Added `SelectPartiesWithFilter()` and `GetAvailablePartiesByRole()` methods (see the sketch after this list)

2. `services/session-coordinator/infrastructure/k8s/party_discovery.go`
   - Implemented role extraction from K8s pod labels (`party-role`)
   - Implemented `GetAvailablePartiesByRole()` for role-based filtering
   - Implemented `SelectPartiesWithFilter()` with role and count requirements
   - Default role: `persistent` if the label is not found

3. `services/session-coordinator/application/ports/input/session_management_port.go`
   - Added `PartyComposition` struct with role-based party counts
   - Added optional `PartyComposition` field to `CreateSessionInput`

4. `services/session-coordinator/application/use_cases/create_session.go`
   - Implemented strict persistent-only default policy (lines 102-114)
   - Implemented `selectPartiesByComposition()` method with empty-composition validation (lines 224-284)
   - Added clear error messages for insufficient parties

5. `k8s/server-party-deployment.yaml`
   - Added label: `party-role: persistent` (line 25)

6. `k8s/server-party-api-deployment.yaml` (NEW FILE)
   - New deployment for delegate parties
   - Added label: `party-role: delegate` (line 25)
   - Replicas: 2 (for generating user shares)
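
As referenced in item 1 above, the selection filter and the two new pool-port methods look roughly like the sketch below. The struct and method names come from the list above; the exact signatures and filter fields are assumptions:

```go
package output

// PartySelectionFilter narrows party selection by role and required count.
// Field names are illustrative.
type PartySelectionFilter struct {
	Role  PartyRole // desired role; empty means any role
	Count int       // number of parties required for this role
}

// PartyPoolPort is the outbound port the Session Coordinator uses to
// discover and select parties (signature sketch, not the exact interface).
type PartyPoolPort interface {
	GetAvailableParties() []PartyEndpoint
	GetAvailablePartiesByRole(role PartyRole) []PartyEndpoint
	SelectPartiesWithFilter(filters []PartySelectionFilter) ([]PartyEndpoint, error)
}
```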

---

## 2. Security Policy Implementation

### 2.1 Strict Persistent-Only Default
When `PartyComposition` is **nil** (not specified):
- The system MUST select only `persistent` parties
- If not enough persistent parties are available → **fail immediately with a clear error**
- NO automatic fallback to delegate/temporary parties
- Error message: "insufficient persistent parties: need N persistent parties but not enough available. Use PartyComposition to specify custom party requirements"

**Code Reference**: [create_session.go:102-114](c:\Users\dong\Desktop\rwadurian\backend\mpc-system\services\session-coordinator\application\use_cases\create_session.go#L102-L114)
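
A minimal sketch of the default-policy branch described above. The error text matches the report; the receiver, helper, and field names (`partyPool`, `ThresholdN`, `selectPartiesByComposition`) are assumptions for illustration, not the exact create_session.go code:

```go
// selectParties applies the strict persistent-only default when no
// PartyComposition is supplied (illustrative sketch; imports and the
// surrounding use case struct are assumed).
func (uc *CreateSessionUseCase) selectParties(input CreateSessionInput) ([]output.PartyEndpoint, error) {
	if input.PartyComposition == nil {
		// Default policy: persistent parties only, and no fallback.
		persistent := uc.partyPool.GetAvailablePartiesByRole(output.PartyRolePersistent)
		if len(persistent) < input.ThresholdN {
			return nil, fmt.Errorf(
				"insufficient persistent parties: need %d persistent parties but not enough available. "+
					"Use PartyComposition to specify custom party requirements", input.ThresholdN)
		}
		return persistent[:input.ThresholdN], nil
	}
	return uc.selectPartiesByComposition(input)
}
```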

### 2.2 Empty PartyComposition Validation
When `PartyComposition` is specified but all counts are 0:
- The system returns the error: "PartyComposition specified but no parties selected: all counts are zero and no custom filters provided"
- This prevents accidental bypass of the persistent-only requirement

**Code Reference**: [create_session.go:279-281](c:\Users\dong\Desktop\rwadurian\backend\mpc-system\services\session-coordinator\application\use_cases\create_session.go#L279-L281)
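
The guard itself is only a few lines; a sketch under the same naming assumptions (the `CustomFilters` field name is assumed from the error text):

```go
// Inside selectPartiesByComposition (illustrative): reject a composition that
// selects nothing, so callers cannot silently bypass the persistent-only default.
comp := input.PartyComposition
if comp.PersistentCount == 0 && comp.DelegateCount == 0 &&
	comp.TemporaryCount == 0 && len(comp.CustomFilters) == 0 {
	return nil, fmt.Errorf(
		"PartyComposition specified but no parties selected: all counts are zero and no custom filters provided")
}
```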

### 2.3 Threshold Security Guarantee
- Default policy ensures MPC threshold security by using only persistent parties
- Persistent parties store shares in database, ensuring T-of-N shares are always available for future sign operations
- Delegate parties (which don't store shares) are only used when explicitly specified via `PartyComposition`

---

## 3. Docker Compose Deployment Verification

### 3.1 Build Status
**Command**: `./deploy.sh build`
**Status**: ✅ SUCCESS
**Images Built**:
1. mpc-system-postgres (postgres:15-alpine)
2. mpc-system-rabbitmq (rabbitmq:3-management-alpine)
3. mpc-system-redis (redis:7-alpine)
4. mpc-system-session-coordinator
5. mpc-system-message-router
6. mpc-system-server-party-1/2/3
7. mpc-system-server-party-api
8. mpc-system-account-service

### 3.2 Deployment Status
**Command**: `./deploy.sh up`
**Status**: ✅ SUCCESS
**Services Running** (10 containers):

| Service | Status | Ports | Notes |
|---------|--------|-------|-------|
| mpc-postgres | Healthy | 5432 (internal) | PostgreSQL 15 |
| mpc-rabbitmq | Healthy | 5672, 15672 (internal) | Message broker |
| mpc-redis | Healthy | 6379 (internal) | Cache store |
| mpc-session-coordinator | Healthy | 8081:8080 | Core orchestration |
| mpc-message-router | Healthy | 8082:8080 | Message routing |
| mpc-server-party-1 | Healthy | 50051, 8080 (internal) | Persistent party |
| mpc-server-party-2 | Healthy | 50051, 8080 (internal) | Persistent party |
| mpc-server-party-3 | Healthy | 50051, 8080 (internal) | Persistent party |
| mpc-server-party-api | Healthy | 8083:8080 | Delegate party |
| mpc-account-service | Healthy | 4000:8080 | Application service |

### 3.3 Health Check Results
```bash
# Session Coordinator
$ curl http://localhost:8081/health
{"service":"session-coordinator","status":"healthy"}

# Account Service
$ curl http://localhost:4000/health
{"service":"account","status":"healthy"}
```

**Status**: ✅ All services responding to health checks

---

## 4. Known Limitations in Docker Compose Environment

### 4.1 K8s Party Discovery Not Available
**Log Message**:
```
{"level":"warn","message":"K8s party discovery not available, will use dynamic join mode",
"error":"failed to create k8s config: stat /home/mpc/.kube/config: no such file or directory"}
```

**Impact**:
- Party role labels (`party-role`) from K8s deployments are not accessible in Docker Compose
- System falls back to dynamic join mode (universal join tokens)
- `PartyPoolPort` is not available, so `selectPartiesByComposition()` logic is not exercised

**Why This Happens**:
- Docker Compose doesn't provide K8s API access
- Party discovery requires K8s Service Discovery and pod label queries
- This is expected behavior for non-K8s environments

### 4.2 Party Role Labels Not Testable in Docker Compose
The following features cannot be tested in Docker Compose:
1. Role-based party filtering (`SelectPartiesWithFilter`)
2. `PartyComposition`-based party selection
3. Strict persistent-only default policy
4. K8s pod label reading (`party-role`)

**These features require actual Kubernetes deployment to test.**

---

## 5. What Was Verified

### 5.1 Code Compilation ✅
- All modified Go files compile successfully
- No syntax errors or type errors
- Build completes on both Windows (local) and WSL (Docker)

### 5.2 Service Deployment ✅
- All 10 services start successfully
- All health checks pass
- Services can connect to each other (gRPC connectivity verified in logs)
- Database connections established
- Message broker connections established

### 5.3 Code Logic Review ✅
- Strict persistent-only default policy correctly implemented
- Empty `PartyComposition` validation prevents loophole
- Clear error messages for insufficient parties
- Role extraction from K8s pod labels correctly implemented
- Role-based filtering logic correct

---

## 6. What Cannot Be Verified Without K8s

### 6.1 Runtime Behavior
1. **Party Discovery**: K8s pod label queries
2. **Role Filtering**: Actual filtering by `party-role` label values
3. **Persistent-Only Policy**: Enforcement when persistent parties insufficient
4. **Error Messages**: Actual error messages when party selection fails
5. **PartyComposition**: Custom party mix selection

### 6.2 Integration Testing
1. Creating a session with default (nil) `PartyComposition` → should select only persistent parties
2. Creating a session with insufficient persistent parties → should return clear error
3. Creating a session with empty `PartyComposition` → should return validation error
4. Creating a session with custom `PartyComposition` → should select correct party mix

---

## 7. Next Steps for Full Verification

### 7.1 Deploy to Kubernetes Cluster
To fully test Party Role Labels, deploy to actual K8s cluster:
```bash
# Apply K8s manifests
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secrets.yaml
kubectl apply -f k8s/postgres-deployment.yaml
kubectl apply -f k8s/rabbitmq-deployment.yaml
kubectl apply -f k8s/redis-deployment.yaml
kubectl apply -f k8s/server-party-deployment.yaml
kubectl apply -f k8s/server-party-api-deployment.yaml
kubectl apply -f k8s/session-coordinator-deployment.yaml
kubectl apply -f k8s/message-router-deployment.yaml
kubectl apply -f k8s/account-service-deployment.yaml

# Verify party discovery works
kubectl logs -n mpc-system -l app=mpc-session-coordinator | grep -i "party\|role\|discovery"

# Verify pod labels are set
kubectl get pods -n mpc-system -l app=mpc-server-party -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.party-role}{"\n"}{end}'
kubectl get pods -n mpc-system -l app=mpc-server-party-api -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.party-role}{"\n"}{end}'
```

### 7.2 Integration Testing in K8s
1. **Test Default Persistent-Only Selection**:
   ```bash
   curl -X POST http://<account-service>/api/v1/accounts \
     -H "Content-Type: application/json" \
     -d '{"user_id": "test-user-1"}'

   # Expected: Session created with 3 persistent parties
   # Check logs: kubectl logs -n mpc-system -l app=mpc-session-coordinator | grep "selected persistent parties by default"
   ```

2. **Test Insufficient Persistent Parties Error**:
   ```bash
   # Scale down persistent parties to 2
   kubectl scale deployment mpc-server-party -n mpc-system --replicas=2

   # Try creating session requiring 3 parties
   curl -X POST http://<account-service>/api/v1/accounts \
     -H "Content-Type: application/json" \
     -d '{"user_id": "test-user-2"}'

   # Expected: HTTP 500 error with message "insufficient persistent parties: need 3 persistent parties but not enough available"
   ```

3. **Test Empty PartyComposition Validation**:
   - Requires API endpoint that accepts `PartyComposition` parameter
   - Send request with `PartyComposition: {PersistentCount: 0, DelegateCount: 0, TemporaryCount: 0}`
   - Expected: HTTP 400 error with message "PartyComposition specified but no parties selected"

4. **Test Custom PartyComposition**:
   - Send request with `PartyComposition: {PersistentCount: 2, DelegateCount: 1}`
   - Expected: Session created with 2 persistent + 1 delegate party
   - Verify party roles in session data

---

## 8. Conclusion

### 8.1 Implementation Status: ✅ COMPLETE
- All code changes implemented correctly
- Strict persistent-only default policy enforced
- Empty `PartyComposition` validation prevents loophole
- Clear error messages for insufficient parties
- Backward compatibility maintained (optional `PartyComposition`)

### 8.2 Deployment Status: ✅ SUCCESS (Docker Compose)
- All services build successfully
- All services deploy successfully
- All services healthy and responding
- Inter-service connectivity verified

### 8.3 Verification Status: ⚠️ PARTIAL
- ✅ Code compilation and logic review
- ✅ Docker Compose deployment
- ✅ Service health checks
- ❌ Party role filtering runtime behavior (requires K8s)
- ❌ Persistent-only policy enforcement (requires K8s)
- ❌ Integration testing (requires K8s)

### 8.4 Readiness for Production
**Code Readiness**: ✅ READY
**Testing Readiness**: ⚠️ REQUIRES K8S DEPLOYMENT FOR FULL TESTING
**Deployment Readiness**: ✅ READY (K8s manifests prepared)

---

## 9. User Confirmation Required

The Party Role Labels implementation is complete and successfully deployed in Docker Compose. However, full runtime verification requires deploying to an actual Kubernetes cluster.

**Options**:
1. Proceed with K8s deployment for full verification
2. Accept partial verification (code review + Docker Compose deployment)
3. Create integration tests that mock K8s party discovery

Awaiting user instruction on next steps.

@ -14,6 +14,12 @@ service MessageRouter {

  // GetPendingMessages retrieves pending messages (polling alternative)
  rpc GetPendingMessages(GetPendingMessagesRequest) returns (GetPendingMessagesResponse);

  // RegisterParty registers a party with the message router (party actively connects)
  rpc RegisterParty(RegisterPartyRequest) returns (RegisterPartyResponse);

  // SubscribeSessionEvents subscribes to session lifecycle events (session start, etc.)
  rpc SubscribeSessionEvents(SubscribeSessionEventsRequest) returns (stream SessionEvent);
}

// RouteMessageRequest routes an MPC message
@ -61,3 +67,37 @@ message GetPendingMessagesRequest {
message GetPendingMessagesResponse {
  repeated MPCMessage messages = 1;
}

// RegisterPartyRequest registers a party with the router
message RegisterPartyRequest {
  string party_id = 1;   // Unique party identifier
  string party_role = 2; // persistent, delegate, or temporary
  string version = 3;    // Party software version
}

// RegisterPartyResponse confirms party registration
message RegisterPartyResponse {
  bool success = 1;
  string message = 2;
  int64 registered_at = 3; // Unix timestamp milliseconds
}

// SubscribeSessionEventsRequest subscribes to session events
message SubscribeSessionEventsRequest {
  string party_id = 1;             // Party ID subscribing to events
  repeated string event_types = 2; // Event types to subscribe (empty = all)
}

// SessionEvent represents a session lifecycle event
message SessionEvent {
  string event_id = 1;
  string event_type = 2;                // session_created, session_started, etc.
  string session_id = 3;
  int32 threshold_n = 4;
  int32 threshold_t = 5;
  repeated string selected_parties = 6; // PartyIDs selected for this session
  map<string, string> join_tokens = 7;  // PartyID -> JoinToken mapping
  bytes message_hash = 8;               // For sign sessions
  int64 created_at = 9;                 // Unix timestamp milliseconds
  int64 expires_at = 10;                // Unix timestamp milliseconds
}
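
For orientation, a hedged sketch of how a server party could consume these two RPCs once the Go stubs are generated. The RPC and field names come from the proto above; the client constructor follows standard protoc-gen-go-grpc naming, while the target address, credentials, and error handling are illustrative and assume a recent grpc-go:

```go
package main

import (
	"context"
	"io"
	"log"

	routerv1 "github.com/rwadurian/mpc-system/api/grpc/router/v1"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Dial the Message Router (address is illustrative).
	conn, err := grpc.Dial("message-router:9090",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial router: %v", err)
	}
	defer conn.Close()
	client := routerv1.NewMessageRouterClient(conn)

	ctx := context.Background()

	// 1. Register this party with the router (party-driven: the party connects).
	if _, err := client.RegisterParty(ctx, &routerv1.RegisterPartyRequest{
		PartyId:   "server-party-1",
		PartyRole: "persistent",
		Version:   "dev",
	}); err != nil {
		log.Fatalf("register party: %v", err)
	}

	// 2. Subscribe to session lifecycle events; empty event_types means all.
	stream, err := client.SubscribeSessionEvents(ctx, &routerv1.SubscribeSessionEventsRequest{
		PartyId: "server-party-1",
	})
	if err != nil {
		log.Fatalf("subscribe: %v", err)
	}
	for {
		event, err := stream.Recv()
		if err == io.EOF {
			return
		}
		if err != nil {
			log.Fatalf("recv: %v", err)
		}
		log.Printf("session event %s for session %s", event.EventType, event.SessionId)
	}
}
```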
@ -174,52 +174,193 @@ func main() {
|
|||
}
|
||||
|
||||
func initDatabase(cfg config.DatabaseConfig) (*sql.DB, error) {
|
||||
db, err := sql.Open("postgres", cfg.DSN())
|
||||
const maxRetries = 10
|
||||
const retryDelay = 2 * time.Second
|
||||
|
||||
var db *sql.DB
|
||||
var err error
|
||||
|
||||
for i := 0; i < maxRetries; i++ {
|
||||
db, err = sql.Open("postgres", cfg.DSN())
|
||||
if err != nil {
|
||||
return nil, err
|
||||
logger.Warn("Failed to open database connection, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
db.SetMaxOpenConns(cfg.MaxOpenConns)
|
||||
db.SetMaxIdleConns(cfg.MaxIdleConns)
|
||||
db.SetConnMaxLifetime(cfg.ConnMaxLife)
|
||||
|
||||
// Test connection
|
||||
if err := db.Ping(); err != nil {
|
||||
return nil, err
|
||||
// Test connection with Ping
|
||||
if err = db.Ping(); err != nil {
|
||||
logger.Warn("Failed to ping database, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
db.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
logger.Info("Connected to PostgreSQL")
|
||||
// Verify database is actually usable with a simple query
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
var result int
|
||||
err = db.QueryRowContext(ctx, "SELECT 1").Scan(&result)
|
||||
cancel()
|
||||
if err != nil {
|
||||
logger.Warn("Database ping succeeded but query failed, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
db.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
logger.Info("Connected to PostgreSQL and verified connectivity",
|
||||
zap.Int("attempt", i+1))
|
||||
return db, nil
|
||||
}
|
||||
|
||||
return nil, fmt.Errorf("failed to connect to database after %d retries: %w", maxRetries, err)
|
||||
}
|
||||
|
||||
func initRedis(cfg config.RedisConfig) *redis.Client {
|
||||
const maxRetries = 10
|
||||
const retryDelay = 2 * time.Second
|
||||
|
||||
client := redis.NewClient(&redis.Options{
|
||||
Addr: cfg.Addr(),
|
||||
Password: cfg.Password,
|
||||
DB: cfg.DB,
|
||||
})
|
||||
|
||||
// Test connection
|
||||
// Test connection with retry
|
||||
ctx := context.Background()
|
||||
for i := 0; i < maxRetries; i++ {
|
||||
if err := client.Ping(ctx).Err(); err != nil {
|
||||
logger.Warn("Redis connection failed, continuing without cache", zap.Error(err))
|
||||
} else {
|
||||
logger.Warn("Redis connection failed, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
logger.Info("Connected to Redis")
|
||||
return client
|
||||
}
|
||||
|
||||
logger.Warn("Redis connection failed after retries, continuing without cache")
|
||||
return client
|
||||
}
|
||||
|
||||
func initRabbitMQ(cfg config.RabbitMQConfig) (*amqp.Connection, error) {
|
||||
conn, err := amqp.Dial(cfg.URL())
|
||||
const maxRetries = 10
|
||||
const retryDelay = 2 * time.Second
|
||||
|
||||
var conn *amqp.Connection
|
||||
var err error
|
||||
|
||||
for i := 0; i < maxRetries; i++ {
|
||||
// Attempt to dial RabbitMQ
|
||||
conn, err = amqp.Dial(cfg.URL())
|
||||
if err != nil {
|
||||
return nil, err
|
||||
logger.Warn("Failed to dial RabbitMQ, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.String("url", maskPassword(cfg.URL())),
|
||||
zap.Error(err))
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
logger.Info("Connected to RabbitMQ")
|
||||
// Verify connection is actually usable by opening a channel
|
||||
ch, err := conn.Channel()
|
||||
if err != nil {
|
||||
logger.Warn("RabbitMQ connection established but channel creation failed, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
conn.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
// Test the channel with a simple operation (declare a test exchange)
|
||||
err = ch.ExchangeDeclare(
|
||||
"mpc.health.check", // name
|
||||
"fanout", // type
|
||||
false, // durable
|
||||
true, // auto-deleted
|
||||
false, // internal
|
||||
false, // no-wait
|
||||
nil, // arguments
|
||||
)
|
||||
if err != nil {
|
||||
logger.Warn("RabbitMQ channel created but exchange declaration failed, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
ch.Close()
|
||||
conn.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
// Clean up test exchange
|
||||
ch.ExchangeDelete("mpc.health.check", false, false)
|
||||
ch.Close()
|
||||
|
||||
// Setup connection close notification
|
||||
closeChan := make(chan *amqp.Error, 1)
|
||||
conn.NotifyClose(closeChan)
|
||||
go func() {
|
||||
err := <-closeChan
|
||||
if err != nil {
|
||||
logger.Error("RabbitMQ connection closed unexpectedly", zap.Error(err))
|
||||
}
|
||||
}()
|
||||
|
||||
logger.Info("Connected to RabbitMQ and verified connectivity",
|
||||
zap.Int("attempt", i+1))
|
||||
return conn, nil
|
||||
}
|
||||
|
||||
return nil, fmt.Errorf("failed to connect to RabbitMQ after %d retries: %w", maxRetries, err)
|
||||
}
|
||||
|
||||
// maskPassword masks the password in the RabbitMQ URL for logging
|
||||
func maskPassword(url string) string {
|
||||
// Simple masking: amqp://user:password@host:port -> amqp://user:****@host:port
|
||||
start := 0
|
||||
for i := 0; i < len(url); i++ {
|
||||
if url[i] == ':' && i > 0 && url[i-1] != '/' {
|
||||
start = i + 1
|
||||
break
|
||||
}
|
||||
}
|
||||
if start == 0 {
|
||||
return url
|
||||
}
|
||||
|
||||
end := start
|
||||
for i := start; i < len(url); i++ {
|
||||
if url[i] == '@' {
|
||||
end = i
|
||||
break
|
||||
}
|
||||
}
|
||||
if end == start {
|
||||
return url
|
||||
}
|
||||
|
||||
return url[:start] + "****" + url[end:]
|
||||
}
|
||||
|
||||
func startHTTPServer(
|
||||
cfg *config.Config,
|
||||
createAccountUC *use_cases.CreateAccountUseCase,
|
||||
|
|
|
|||
|
|
@ -4,9 +4,12 @@ import (
|
|||
"context"
|
||||
|
||||
pb "github.com/rwadurian/mpc-system/api/grpc/router/v1"
|
||||
"github.com/rwadurian/mpc-system/pkg/logger"
|
||||
"github.com/rwadurian/mpc-system/services/message-router/adapters/output/rabbitmq"
|
||||
"github.com/rwadurian/mpc-system/services/message-router/application/use_cases"
|
||||
"github.com/rwadurian/mpc-system/services/message-router/domain"
|
||||
"github.com/rwadurian/mpc-system/services/message-router/domain/entities"
|
||||
"go.uber.org/zap"
|
||||
"google.golang.org/grpc/codes"
|
||||
"google.golang.org/grpc/status"
|
||||
)
|
||||
|
|
@ -17,6 +20,8 @@ type MessageRouterServer struct {
|
|||
routeMessageUC *use_cases.RouteMessageUseCase
|
||||
getPendingMessagesUC *use_cases.GetPendingMessagesUseCase
|
||||
messageBroker *rabbitmq.MessageBrokerAdapter
|
||||
partyRegistry *domain.PartyRegistry
|
||||
eventBroadcaster *domain.SessionEventBroadcaster
|
||||
}
|
||||
|
||||
// NewMessageRouterServer creates a new gRPC server
|
||||
|
|
@ -24,11 +29,15 @@ func NewMessageRouterServer(
|
|||
routeMessageUC *use_cases.RouteMessageUseCase,
|
||||
getPendingMessagesUC *use_cases.GetPendingMessagesUseCase,
|
||||
messageBroker *rabbitmq.MessageBrokerAdapter,
|
||||
partyRegistry *domain.PartyRegistry,
|
||||
eventBroadcaster *domain.SessionEventBroadcaster,
|
||||
) *MessageRouterServer {
|
||||
return &MessageRouterServer{
|
||||
routeMessageUC: routeMessageUC,
|
||||
getPendingMessagesUC: getPendingMessagesUC,
|
||||
messageBroker: messageBroker,
|
||||
partyRegistry: partyRegistry,
|
||||
eventBroadcaster: eventBroadcaster,
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -134,6 +143,102 @@ func (s *MessageRouterServer) GetPendingMessages(
|
|||
}, nil
|
||||
}
|
||||
|
||||
// RegisterParty registers a party with the message router
|
||||
func (s *MessageRouterServer) RegisterParty(
|
||||
ctx context.Context,
|
||||
req *pb.RegisterPartyRequest,
|
||||
) (*pb.RegisterPartyResponse, error) {
|
||||
if req.PartyId == "" {
|
||||
return nil, status.Error(codes.InvalidArgument, "party_id is required")
|
||||
}
|
||||
|
||||
// Register party
|
||||
party := s.partyRegistry.Register(req.PartyId, req.PartyRole, req.Version)
|
||||
|
||||
logger.Info("Party registered",
|
||||
zap.String("party_id", req.PartyId),
|
||||
zap.String("role", req.PartyRole),
|
||||
zap.String("version", req.Version))
|
||||
|
||||
return &pb.RegisterPartyResponse{
|
||||
Success: true,
|
||||
Message: "Party registered successfully",
|
||||
RegisteredAt: party.RegisteredAt.UnixMilli(),
|
||||
}, nil
|
||||
}
|
||||
|
||||
// SubscribeSessionEvents subscribes to session lifecycle events (streaming)
|
||||
func (s *MessageRouterServer) SubscribeSessionEvents(
|
||||
req *pb.SubscribeSessionEventsRequest,
|
||||
stream pb.MessageRouter_SubscribeSessionEventsServer,
|
||||
) error {
|
||||
ctx := stream.Context()
|
||||
|
||||
if req.PartyId == "" {
|
||||
return status.Error(codes.InvalidArgument, "party_id is required")
|
||||
}
|
||||
|
||||
// Check if party is registered
|
||||
if _, exists := s.partyRegistry.Get(req.PartyId); !exists {
|
||||
return status.Error(codes.FailedPrecondition, "party not registered")
|
||||
}
|
||||
|
||||
logger.Info("Party subscribed to session events",
|
||||
zap.String("party_id", req.PartyId))
|
||||
|
||||
// Subscribe to events
|
||||
eventCh := s.eventBroadcaster.Subscribe(req.PartyId)
|
||||
defer s.eventBroadcaster.Unsubscribe(req.PartyId)
|
||||
|
||||
// Stream events
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
logger.Info("Party unsubscribed from session events",
|
||||
zap.String("party_id", req.PartyId))
|
||||
return nil
|
||||
|
||||
case event, ok := <-eventCh:
|
||||
if !ok {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Send event to party
|
||||
if err := stream.Send(event); err != nil {
|
||||
logger.Error("Failed to send session event",
|
||||
zap.String("party_id", req.PartyId),
|
||||
zap.Error(err))
|
||||
return err
|
||||
}
|
||||
|
||||
logger.Debug("Sent session event to party",
|
||||
zap.String("party_id", req.PartyId),
|
||||
zap.String("event_type", event.EventType),
|
||||
zap.String("session_id", event.SessionId))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// PublishSessionEvent publishes a session event to subscribed parties
|
||||
// This is called by Session Coordinator
|
||||
func (s *MessageRouterServer) PublishSessionEvent(event *pb.SessionEvent) {
|
||||
// If selected_parties is specified, send only to those parties
|
||||
if len(event.SelectedParties) > 0 {
|
||||
s.eventBroadcaster.BroadcastToParties(event, event.SelectedParties)
|
||||
logger.Info("Published session event to selected parties",
|
||||
zap.String("event_type", event.EventType),
|
||||
zap.String("session_id", event.SessionId),
|
||||
zap.Int("party_count", len(event.SelectedParties)))
|
||||
} else {
|
||||
// Broadcast to all subscribers
|
||||
s.eventBroadcaster.Broadcast(event)
|
||||
logger.Info("Broadcast session event to all parties",
|
||||
zap.String("event_type", event.EventType),
|
||||
zap.String("session_id", event.SessionId),
|
||||
zap.Int("subscriber_count", s.eventBroadcaster.SubscriberCount()))
|
||||
}
|
||||
}
|
||||
|
||||
func sendMessage(stream pb.MessageRouter_SubscribeMessagesServer, msg *entities.MessageDTO) error {
|
||||
protoMsg := &pb.MPCMessage{
|
||||
MessageId: msg.ID,
|
||||
|
|
|
|||
|
|
@ -25,6 +25,7 @@ import (
|
|||
"github.com/rwadurian/mpc-system/services/message-router/adapters/output/postgres"
|
||||
"github.com/rwadurian/mpc-system/services/message-router/adapters/output/rabbitmq"
|
||||
"github.com/rwadurian/mpc-system/services/message-router/application/use_cases"
|
||||
"github.com/rwadurian/mpc-system/services/message-router/domain"
|
||||
"go.uber.org/zap"
|
||||
)
|
||||
|
||||
|
|
@ -77,6 +78,10 @@ func main() {
|
|||
}
|
||||
defer messageBroker.Close()
|
||||
|
||||
// Initialize party registry and event broadcaster for party-driven architecture
|
||||
partyRegistry := domain.NewPartyRegistry()
|
||||
eventBroadcaster := domain.NewSessionEventBroadcaster()
|
||||
|
||||
// Initialize use cases
|
||||
routeMessageUC := use_cases.NewRouteMessageUseCase(messageRepo, messageBroker)
|
||||
getPendingMessagesUC := use_cases.NewGetPendingMessagesUseCase(messageRepo)
|
||||
|
|
@ -93,7 +98,7 @@ func main() {
|
|||
|
||||
// Start gRPC server
|
||||
go func() {
|
||||
if err := startGRPCServer(cfg, routeMessageUC, getPendingMessagesUC, messageBroker); err != nil {
|
||||
if err := startGRPCServer(cfg, routeMessageUC, getPendingMessagesUC, messageBroker, partyRegistry, eventBroadcaster); err != nil {
|
||||
errChan <- fmt.Errorf("gRPC server error: %w", err)
|
||||
}
|
||||
}()
|
||||
|
|
@ -148,6 +153,7 @@ func initDatabase(cfg config.DatabaseConfig) (*sql.DB, error) {
|
|||
db.SetMaxIdleConns(cfg.MaxIdleConns)
|
||||
db.SetConnMaxLifetime(cfg.ConnMaxLife)
|
||||
|
||||
// Test connection with Ping
|
||||
if err = db.Ping(); err != nil {
|
||||
logger.Warn("Failed to ping database, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
|
|
@ -158,7 +164,23 @@ func initDatabase(cfg config.DatabaseConfig) (*sql.DB, error) {
|
|||
continue
|
||||
}
|
||||
|
||||
logger.Info("Connected to PostgreSQL")
|
||||
// Verify database is actually usable with a simple query
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
var result int
|
||||
err = db.QueryRowContext(ctx, "SELECT 1").Scan(&result)
|
||||
cancel()
|
||||
if err != nil {
|
||||
logger.Warn("Database ping succeeded but query failed, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
db.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
logger.Info("Connected to PostgreSQL and verified connectivity",
|
||||
zap.Int("attempt", i+1))
|
||||
return db, nil
|
||||
}
|
||||
|
||||
|
|
@ -173,28 +195,108 @@ func initRabbitMQ(cfg config.RabbitMQConfig) (*amqp.Connection, error) {
|
|||
var err error
|
||||
|
||||
for i := 0; i < maxRetries; i++ {
|
||||
// Attempt to dial RabbitMQ
|
||||
conn, err = amqp.Dial(cfg.URL())
|
||||
if err != nil {
|
||||
logger.Warn("Failed to connect to RabbitMQ, retrying...",
|
||||
logger.Warn("Failed to dial RabbitMQ, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.String("url", maskPassword(cfg.URL())),
|
||||
zap.Error(err))
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
logger.Info("Connected to RabbitMQ")
|
||||
// Verify connection is actually usable by opening a channel
|
||||
ch, err := conn.Channel()
|
||||
if err != nil {
|
||||
logger.Warn("RabbitMQ connection established but channel creation failed, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
conn.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
// Test the channel with a simple operation (declare a test exchange)
|
||||
err = ch.ExchangeDeclare(
|
||||
"mpc.health.check", // name
|
||||
"fanout", // type
|
||||
false, // durable
|
||||
true, // auto-deleted
|
||||
false, // internal
|
||||
false, // no-wait
|
||||
nil, // arguments
|
||||
)
|
||||
if err != nil {
|
||||
logger.Warn("RabbitMQ channel created but exchange declaration failed, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
ch.Close()
|
||||
conn.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
// Clean up test exchange
|
||||
ch.ExchangeDelete("mpc.health.check", false, false)
|
||||
ch.Close()
|
||||
|
||||
// Setup connection close notification
|
||||
closeChan := make(chan *amqp.Error, 1)
|
||||
conn.NotifyClose(closeChan)
|
||||
go func() {
|
||||
err := <-closeChan
|
||||
if err != nil {
|
||||
logger.Error("RabbitMQ connection closed unexpectedly", zap.Error(err))
|
||||
}
|
||||
}()
|
||||
|
||||
logger.Info("Connected to RabbitMQ and verified connectivity",
|
||||
zap.Int("attempt", i+1))
|
||||
return conn, nil
|
||||
}
|
||||
|
||||
return nil, fmt.Errorf("failed to connect to RabbitMQ after %d retries: %w", maxRetries, err)
|
||||
}
|
||||
|
||||
// maskPassword masks the password in the RabbitMQ URL for logging
|
||||
func maskPassword(url string) string {
|
||||
// Simple masking: amqp://user:password@host:port -> amqp://user:****@host:port
|
||||
start := 0
|
||||
for i := 0; i < len(url); i++ {
|
||||
if url[i] == ':' && i > 0 && url[i-1] != '/' {
|
||||
start = i + 1
|
||||
break
|
||||
}
|
||||
}
|
||||
if start == 0 {
|
||||
return url
|
||||
}
|
||||
|
||||
end := start
|
||||
for i := start; i < len(url); i++ {
|
||||
if url[i] == '@' {
|
||||
end = i
|
||||
break
|
||||
}
|
||||
}
|
||||
if end == start {
|
||||
return url
|
||||
}
|
||||
|
||||
return url[:start] + "****" + url[end:]
|
||||
}
|
||||
|
||||
func startGRPCServer(
|
||||
cfg *config.Config,
|
||||
routeMessageUC *use_cases.RouteMessageUseCase,
|
||||
getPendingMessagesUC *use_cases.GetPendingMessagesUseCase,
|
||||
messageBroker *rabbitmq.MessageBrokerAdapter,
|
||||
partyRegistry *domain.PartyRegistry,
|
||||
eventBroadcaster *domain.SessionEventBroadcaster,
|
||||
) error {
|
||||
listener, err := net.Listen("tcp", fmt.Sprintf(":%d", cfg.Server.GRPCPort))
|
||||
if err != nil {
|
||||
|
|
@ -203,11 +305,13 @@ func startGRPCServer(
|
|||
|
||||
grpcServer := grpc.NewServer()
|
||||
|
||||
// Create and register the message router gRPC handler
|
||||
// Create and register the message router gRPC handler with party registry and event broadcaster
|
||||
messageRouterServer := grpcadapter.NewMessageRouterServer(
|
||||
routeMessageUC,
|
||||
getPendingMessagesUC,
|
||||
messageBroker,
|
||||
partyRegistry,
|
||||
eventBroadcaster,
|
||||
)
|
||||
pb.RegisterMessageRouterServer(grpcServer, messageRouterServer)
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,93 @@
|
|||
package domain
|
||||
|
||||
import (
|
||||
"sync"
|
||||
"time"
|
||||
)
|
||||
|
||||
// RegisteredParty represents a party registered with the router
|
||||
type RegisteredParty struct {
|
||||
PartyID string
|
||||
Role string // persistent, delegate, temporary
|
||||
Version string
|
||||
RegisteredAt time.Time
|
||||
LastSeen time.Time
|
||||
}
|
||||
|
||||
// PartyRegistry manages registered parties
|
||||
type PartyRegistry struct {
|
||||
parties map[string]*RegisteredParty
|
||||
mu sync.RWMutex
|
||||
}
|
||||
|
||||
// NewPartyRegistry creates a new party registry
|
||||
func NewPartyRegistry() *PartyRegistry {
|
||||
return &PartyRegistry{
|
||||
parties: make(map[string]*RegisteredParty),
|
||||
}
|
||||
}
|
||||
|
||||
// Register registers a party
|
||||
func (r *PartyRegistry) Register(partyID, role, version string) *RegisteredParty {
|
||||
r.mu.Lock()
|
||||
defer r.mu.Unlock()
|
||||
|
||||
now := time.Now()
|
||||
party := &RegisteredParty{
|
||||
PartyID: partyID,
|
||||
Role: role,
|
||||
Version: version,
|
||||
RegisteredAt: now,
|
||||
LastSeen: now,
|
||||
}
|
||||
|
||||
r.parties[partyID] = party
|
||||
return party
|
||||
}
|
||||
|
||||
// Get retrieves a registered party
|
||||
func (r *PartyRegistry) Get(partyID string) (*RegisteredParty, bool) {
|
||||
r.mu.RLock()
|
||||
defer r.mu.RUnlock()
|
||||
|
||||
party, exists := r.parties[partyID]
|
||||
return party, exists
|
||||
}
|
||||
|
||||
// GetAll returns all registered parties
|
||||
func (r *PartyRegistry) GetAll() []*RegisteredParty {
|
||||
r.mu.RLock()
|
||||
defer r.mu.RUnlock()
|
||||
|
||||
parties := make([]*RegisteredParty, 0, len(r.parties))
|
||||
for _, party := range r.parties {
|
||||
parties = append(parties, party)
|
||||
}
|
||||
return parties
|
||||
}
|
||||
|
||||
// UpdateLastSeen updates the last seen timestamp
|
||||
func (r *PartyRegistry) UpdateLastSeen(partyID string) {
|
||||
r.mu.Lock()
|
||||
defer r.mu.Unlock()
|
||||
|
||||
if party, exists := r.parties[partyID]; exists {
|
||||
party.LastSeen = time.Now()
|
||||
}
|
||||
}
|
||||
|
||||
// Unregister removes a party from the registry
|
||||
func (r *PartyRegistry) Unregister(partyID string) {
|
||||
r.mu.Lock()
|
||||
defer r.mu.Unlock()
|
||||
|
||||
delete(r.parties, partyID)
|
||||
}
|
||||
|
||||
// Count returns the number of registered parties
|
||||
func (r *PartyRegistry) Count() int {
|
||||
r.mu.RLock()
|
||||
defer r.mu.RUnlock()
|
||||
|
||||
return len(r.parties)
|
||||
}
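
A short usage sketch of the registry defined above (illustrative; in this commit the real caller is the RegisterParty gRPC handler):

```go
// Illustrative usage of domain.PartyRegistry.
registry := domain.NewPartyRegistry()

party := registry.Register("server-party-1", "persistent", "v0.1.0")
_ = party.RegisteredAt // returned to the caller as the registration timestamp

if p, ok := registry.Get("server-party-1"); ok {
	registry.UpdateLastSeen(p.PartyID) // refresh liveness when traffic arrives
}

fmt.Println(registry.Count()) // 1
registry.Unregister("server-party-1")
```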
|
||||
|
|
@ -0,0 +1,83 @@
|
|||
package domain
|
||||
|
||||
import (
|
||||
"sync"
|
||||
|
||||
pb "github.com/rwadurian/mpc-system/api/grpc/router/v1"
|
||||
)
|
||||
|
||||
// SessionEventBroadcaster manages session event subscriptions and broadcasting
|
||||
type SessionEventBroadcaster struct {
|
||||
subscribers map[string]chan *pb.SessionEvent // partyID -> event channel
|
||||
mu sync.RWMutex
|
||||
}
|
||||
|
||||
// NewSessionEventBroadcaster creates a new session event broadcaster
|
||||
func NewSessionEventBroadcaster() *SessionEventBroadcaster {
|
||||
return &SessionEventBroadcaster{
|
||||
subscribers: make(map[string]chan *pb.SessionEvent),
|
||||
}
|
||||
}
|
||||
|
||||
// Subscribe subscribes a party to session events
|
||||
func (b *SessionEventBroadcaster) Subscribe(partyID string) <-chan *pb.SessionEvent {
|
||||
b.mu.Lock()
|
||||
defer b.mu.Unlock()
|
||||
|
||||
// Create buffered channel for this subscriber
|
||||
ch := make(chan *pb.SessionEvent, 100)
|
||||
b.subscribers[partyID] = ch
|
||||
|
||||
return ch
|
||||
}
|
||||
|
||||
// Unsubscribe removes a party's subscription
|
||||
func (b *SessionEventBroadcaster) Unsubscribe(partyID string) {
|
||||
b.mu.Lock()
|
||||
defer b.mu.Unlock()
|
||||
|
||||
if ch, exists := b.subscribers[partyID]; exists {
|
||||
close(ch)
|
||||
delete(b.subscribers, partyID)
|
||||
}
|
||||
}
|
||||
|
||||
// Broadcast sends an event to all subscribers
|
||||
func (b *SessionEventBroadcaster) Broadcast(event *pb.SessionEvent) {
|
||||
b.mu.RLock()
|
||||
defer b.mu.RUnlock()
|
||||
|
||||
for _, ch := range b.subscribers {
|
||||
// Non-blocking send to prevent slow subscribers from blocking
|
||||
select {
|
||||
case ch <- event:
|
||||
default:
|
||||
// Channel full, skip this subscriber
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// BroadcastToParties sends an event to specific parties only
|
||||
func (b *SessionEventBroadcaster) BroadcastToParties(event *pb.SessionEvent, partyIDs []string) {
|
||||
b.mu.RLock()
|
||||
defer b.mu.RUnlock()
|
||||
|
||||
for _, partyID := range partyIDs {
|
||||
if ch, exists := b.subscribers[partyID]; exists {
|
||||
// Non-blocking send
|
||||
select {
|
||||
case ch <- event:
|
||||
default:
|
||||
// Channel full, skip this subscriber
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// SubscriberCount returns the number of active subscribers
|
||||
func (b *SessionEventBroadcaster) SubscriberCount() int {
|
||||
b.mu.RLock()
|
||||
defer b.mu.RUnlock()
|
||||
|
||||
return len(b.subscribers)
|
||||
}
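
A short usage sketch of the broadcaster defined above (illustrative), showing the non-blocking fan-out the router relies on:

```go
// Illustrative usage of domain.SessionEventBroadcaster.
b := domain.NewSessionEventBroadcaster()

events := b.Subscribe("server-party-1") // buffered channel (100 events)

b.BroadcastToParties(&pb.SessionEvent{
	EventType: "session_created",
	SessionId: "sess-123",
}, []string{"server-party-1"})

ev := <-events
fmt.Println(ev.EventType, ev.SessionId) // session_created sess-123

b.Unsubscribe("server-party-1") // closes the channel
```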
|
||||
|
|
@ -146,23 +146,61 @@ func main() {
|
|||
}
|
||||
|
||||
func initDatabase(cfg config.DatabaseConfig) (*sql.DB, error) {
|
||||
db, err := sql.Open("postgres", cfg.DSN())
|
||||
const maxRetries = 10
|
||||
const retryDelay = 2 * time.Second
|
||||
|
||||
var db *sql.DB
|
||||
var err error
|
||||
|
||||
for i := 0; i < maxRetries; i++ {
|
||||
db, err = sql.Open("postgres", cfg.DSN())
|
||||
if err != nil {
|
||||
return nil, err
|
||||
logger.Warn("Failed to open database connection, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
db.SetMaxOpenConns(cfg.MaxOpenConns)
|
||||
db.SetMaxIdleConns(cfg.MaxIdleConns)
|
||||
db.SetConnMaxLifetime(cfg.ConnMaxLife)
|
||||
|
||||
if err := db.Ping(); err != nil {
|
||||
return nil, err
|
||||
// Test connection with Ping
|
||||
if err = db.Ping(); err != nil {
|
||||
logger.Warn("Failed to ping database, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
db.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
logger.Info("Connected to PostgreSQL")
|
||||
// Verify database is actually usable with a simple query
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
var result int
|
||||
err = db.QueryRowContext(ctx, "SELECT 1").Scan(&result)
|
||||
cancel()
|
||||
if err != nil {
|
||||
logger.Warn("Database ping succeeded but query failed, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
db.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
logger.Info("Connected to PostgreSQL and verified connectivity",
|
||||
zap.Int("attempt", i+1))
|
||||
return db, nil
|
||||
}
|
||||
|
||||
return nil, fmt.Errorf("failed to connect to database after %d retries: %w", maxRetries, err)
|
||||
}
|
||||
|
||||
func startHTTPServer(
|
||||
cfg *config.Config,
|
||||
participateKeygenUC *use_cases.ParticipateKeygenUseCase,
|
||||
|
|
|
|||
|
|
@ -21,12 +21,34 @@ type EventPublisherAdapter struct {
|
|||
|
||||
// NewEventPublisherAdapter creates a new RabbitMQ event publisher
|
||||
func NewEventPublisherAdapter(conn *amqp.Connection) (*EventPublisherAdapter, error) {
|
||||
// Verify connection is not nil and not closed
|
||||
if conn == nil {
|
||||
return nil, fmt.Errorf("rabbitmq connection is nil")
|
||||
}
|
||||
if conn.IsClosed() {
|
||||
return nil, fmt.Errorf("rabbitmq connection is already closed")
|
||||
}
|
||||
|
||||
// Create channel with detailed error logging
|
||||
channel, err := conn.Channel()
|
||||
if err != nil {
|
||||
logger.Error("failed to create RabbitMQ channel", zap.Error(err))
|
||||
return nil, fmt.Errorf("failed to create channel: %w", err)
|
||||
}
|
||||
|
||||
// Declare exchange for MPC events
|
||||
// Set channel QoS for better flow control
|
||||
err = channel.Qos(
|
||||
100, // prefetch count
|
||||
0, // prefetch size
|
||||
false, // global
|
||||
)
|
||||
if err != nil {
|
||||
logger.Warn("failed to set channel QoS, continuing anyway", zap.Error(err))
|
||||
// Don't fail on QoS error, it's not critical
|
||||
}
|
||||
|
||||
// Declare exchange for MPC events with detailed logging
|
||||
logger.Info("declaring RabbitMQ exchange for MPC events")
|
||||
err = channel.ExchangeDeclare(
|
||||
"mpc.events", // name
|
||||
"topic", // type
|
||||
|
|
@ -37,10 +59,14 @@ func NewEventPublisherAdapter(conn *amqp.Connection) (*EventPublisherAdapter, er
|
|||
nil, // arguments
|
||||
)
|
||||
if err != nil {
|
||||
logger.Error("failed to declare mpc.events exchange", zap.Error(err))
|
||||
channel.Close()
|
||||
return nil, fmt.Errorf("failed to declare exchange: %w", err)
|
||||
}
|
||||
logger.Info("successfully declared mpc.events exchange")
|
||||
|
||||
// Declare exchange for party messages
|
||||
// Declare exchange for party messages with detailed logging
|
||||
logger.Info("declaring RabbitMQ exchange for party messages")
|
||||
err = channel.ExchangeDeclare(
|
||||
"mpc.messages", // name
|
||||
"direct", // type
|
||||
|
|
@ -51,9 +77,23 @@ func NewEventPublisherAdapter(conn *amqp.Connection) (*EventPublisherAdapter, er
|
|||
nil, // arguments
|
||||
)
|
||||
if err != nil {
|
||||
logger.Error("failed to declare mpc.messages exchange", zap.Error(err))
|
||||
channel.Close()
|
||||
return nil, fmt.Errorf("failed to declare messages exchange: %w", err)
|
||||
}
|
||||
logger.Info("successfully declared mpc.messages exchange")
|
||||
|
||||
// Setup channel close notification
|
||||
closeChan := make(chan *amqp.Error, 1)
|
||||
channel.NotifyClose(closeChan)
|
||||
go func() {
|
||||
err := <-closeChan
|
||||
if err != nil {
|
||||
logger.Error("RabbitMQ channel closed unexpectedly", zap.Error(err))
|
||||
}
|
||||
}()
|
||||
|
||||
logger.Info("EventPublisherAdapter initialized successfully")
|
||||
return &EventPublisherAdapter{
|
||||
conn: conn,
|
||||
channel: channel,
|
||||
|
|
|
|||
|
|
@ -14,9 +14,10 @@ const (
|
|||
PartyRoleTemporary PartyRole = "temporary"
|
||||
)
|
||||
|
||||
// PartyEndpoint represents a party endpoint from the pool
|
||||
// PartyEndpoint represents a party from the pool
|
||||
// Note: Address is removed - parties connect to Message Router themselves
|
||||
// Session Coordinator only needs PartyID for message routing
|
||||
type PartyEndpoint struct {
|
||||
Address string
|
||||
PartyID string
|
||||
Ready bool
|
||||
Role PartyRole // Role of the party (persistent, delegate, temporary)
|
||||
|
|
|
|||
|
|
@ -255,23 +255,101 @@ func initRabbitMQ(cfg config.RabbitMQConfig) (*amqp.Connection, error) {
|
|||
var err error
|
||||
|
||||
for i := 0; i < maxRetries; i++ {
|
||||
// Attempt to dial RabbitMQ
|
||||
conn, err = amqp.Dial(cfg.URL())
|
||||
if err != nil {
|
||||
logger.Warn("Failed to connect to RabbitMQ, retrying...",
|
||||
logger.Warn("Failed to dial RabbitMQ, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.String("url", maskPassword(cfg.URL())),
|
||||
zap.Error(err))
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
logger.Info("Connected to RabbitMQ")
|
||||
// Verify connection is actually usable by opening a channel
|
||||
ch, err := conn.Channel()
|
||||
if err != nil {
|
||||
logger.Warn("RabbitMQ connection established but channel creation failed, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
conn.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
// Test the channel with a simple operation (declare a test exchange)
|
||||
err = ch.ExchangeDeclare(
|
||||
"mpc.health.check", // name
|
||||
"fanout", // type
|
||||
false, // durable
|
||||
true, // auto-deleted
|
||||
false, // internal
|
||||
false, // no-wait
|
||||
nil, // arguments
|
||||
)
|
||||
if err != nil {
|
||||
logger.Warn("RabbitMQ channel created but exchange declaration failed, retrying...",
|
||||
zap.Int("attempt", i+1),
|
||||
zap.Int("max_retries", maxRetries),
|
||||
zap.Error(err))
|
||||
ch.Close()
|
||||
conn.Close()
|
||||
time.Sleep(retryDelay * time.Duration(i+1))
|
||||
continue
|
||||
}
|
||||
|
||||
// Clean up test exchange
|
||||
ch.ExchangeDelete("mpc.health.check", false, false)
|
||||
ch.Close()
|
||||
|
||||
// Setup connection close notification
|
||||
closeChan := make(chan *amqp.Error, 1)
|
||||
conn.NotifyClose(closeChan)
|
||||
go func() {
|
||||
err := <-closeChan
|
||||
if err != nil {
|
||||
logger.Error("RabbitMQ connection closed unexpectedly", zap.Error(err))
|
||||
}
|
||||
}()
|
||||
|
||||
logger.Info("Connected to RabbitMQ and verified connectivity",
|
||||
zap.Int("attempt", i+1))
|
||||
return conn, nil
|
||||
}
|
||||
|
||||
return nil, fmt.Errorf("failed to connect to RabbitMQ after %d retries: %w", maxRetries, err)
|
||||
}
|
||||
|
||||
// maskPassword masks the password in the RabbitMQ URL for logging
|
||||
func maskPassword(url string) string {
|
||||
// Simple masking: amqp://user:password@host:port -> amqp://user:****@host:port
|
||||
start := 0
|
||||
for i := 0; i < len(url); i++ {
|
||||
if url[i] == ':' && i > 0 && url[i-1] != '/' {
|
||||
start = i + 1
|
||||
break
|
||||
}
|
||||
}
|
||||
if start == 0 {
|
||||
return url
|
||||
}
|
||||
|
||||
end := start
|
||||
for i := start; i < len(url); i++ {
|
||||
if url[i] == '@' {
|
||||
end = i
|
||||
break
|
||||
}
|
||||
}
|
||||
if end == start {
|
||||
return url
|
||||
}
|
||||
|
||||
return url[:start] + "****" + url[end:]
|
||||
}
|
||||
|
||||
func startGRPCServer(
|
||||
cfg *config.Config,
|
||||
createSessionUC *use_cases.CreateSessionUseCase,
|
||||
|
|
|
|||
|
|
@ -15,9 +15,9 @@ import (
|
|||
"k8s.io/client-go/tools/clientcmd"
|
||||
)
|
||||
|
||||
// PartyEndpoint represents a discovered party endpoint
|
||||
// PartyEndpoint represents a discovered party
|
||||
// Note: Address removed - parties connect to Message Router themselves
|
||||
type PartyEndpoint struct {
|
||||
Address string
|
||||
PodName string
|
||||
Ready bool
|
||||
Role output.PartyRole // Party role extracted from pod labels
|
||||
|
|
@ -113,7 +113,6 @@ func (pd *PartyDiscovery) GetAvailableParties() []output.PartyEndpoint {
|
|||
for _, ep := range pd.endpoints {
|
||||
if ep.Ready {
|
||||
available = append(available, output.PartyEndpoint{
|
||||
Address: ep.Address,
|
||||
PartyID: ep.PodName, // Use pod name as party ID
|
||||
Ready: ep.Ready,
|
||||
Role: ep.Role,
|
||||
|
|
@ -134,7 +133,6 @@ func (pd *PartyDiscovery) GetAvailablePartiesByRole(role output.PartyRole) []out
|
|||
for _, ep := range pd.endpoints {
|
||||
if ep.Ready && ep.Role == role {
|
||||
available = append(available, output.PartyEndpoint{
|
||||
Address: ep.Address,
|
||||
PartyID: ep.PodName,
|
||||
Ready: ep.Ready,
|
||||
Role: ep.Role,
|
||||
|
|
@ -216,22 +214,13 @@ func (pd *PartyDiscovery) refresh() error {
|
|||
role = output.PartyRole(roleLabel)
|
||||
}
|
||||
|
||||
// Get pod IP
|
||||
if pod.Status.PodIP != "" {
|
||||
// Assuming gRPC port is 50051 (should be configurable)
|
||||
grpcPort := os.Getenv("MPC_PARTY_GRPC_PORT")
|
||||
if grpcPort == "" {
|
||||
grpcPort = "50051"
|
||||
}
|
||||
|
||||
// Add party to pool (no address needed - parties connect to Message Router)
|
||||
endpoints = append(endpoints, PartyEndpoint{
|
||||
Address: fmt.Sprintf("%s:%s", pod.Status.PodIP, grpcPort),
|
||||
PodName: pod.Name,
|
||||
Ready: ready,
|
||||
Role: role,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
pd.mu.Lock()
|
||||
pd.endpoints = endpoints