refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing
- Remove Address field from PartyEndpoint (parties connect to the router themselves)
- Update K8s Discovery to only manage PartyID and Role labels
- Add Party registration and SessionEvent protobuf definitions
- Implement PartyRegistry and SessionEventBroadcaster domain logic
- Add RegisterParty and SubscribeSessionEvents gRPC handlers
- Prepare infrastructure for party-driven MPC coordination

This is the first phase of migrating from coordinator-driven to party-driven architecture, following international MPC system design patterns.
Parent: e975e9d86c
Commit: 747e4ae8ef
@ -31,7 +31,14 @@
 "Bash(wsl.exe -- bash -c 'find ~/rwadurian/backend/mpc-system/services/server-party -name \"\"main.go\"\" -path \"\"*/cmd/server/*\"\"')",
 "Bash(wsl.exe -- bash -c 'cat ~/rwadurian/backend/mpc-system/services/server-party/cmd/server/main.go | grep -E \"\"grpc|GRPC|gRPC|50051\"\" | head -20')",
 "Bash(wsl.exe -- bash:*)",
-"Bash(dir:*)"
+"Bash(dir:*)",
+"Bash(go version:*)",
+"Bash(go mod download:*)",
+"Bash(go build:*)",
+"Bash(go mod tidy:*)",
+"Bash(findstr:*)",
+"Bash(del \"c:\\Users\\dong\\Desktop\\rwadurian\\backend\\mpc-system\\PARTY_ROLE_VERIFICATION_REPORT.md\")",
+"Bash(protoc:*)"
 ],
 "deny": [],
 "ask": []
@ -0,0 +1,295 @@ (new file: PARTY_ROLE_VERIFICATION_REPORT.md)

# Party Role Labels Implementation - Verification Report

**Date**: 2025-12-05
**Commit**: e975e9d - "feat(mpc-system): implement party role labels with strict persistent-only default"
**Environment**: Docker Compose (Local Development)

---

## 1. Implementation Summary

### 1.1 Overview
Implemented Party Role Labels (Solution 1) to differentiate between three types of server parties:
- **Persistent**: Stores key shares in the database permanently
- **Delegate**: Generates user shares and returns them to the caller (doesn't store)
- **Temporary**: For ad-hoc operations

### 1.2 Core Changes

#### Files Modified
1. `services/session-coordinator/application/ports/output/party_pool_port.go`
   - Added `PartyRole` enum (persistent, delegate, temporary)
   - Added `PartyEndpoint.Role` field
   - Added `PartySelectionFilter` struct with role filtering
   - Added `SelectPartiesWithFilter()` and `GetAvailablePartiesByRole()` methods
   (a sketch of these types follows this list)

2. `services/session-coordinator/infrastructure/k8s/party_discovery.go`
   - Implemented role extraction from K8s pod labels (`party-role`)
   - Implemented `GetAvailablePartiesByRole()` for role-based filtering
   - Implemented `SelectPartiesWithFilter()` with role and count requirements
   - Default role: `persistent` if label not found

3. `services/session-coordinator/application/ports/input/session_management_port.go`
   - Added `PartyComposition` struct with role-based party counts
   - Added optional `PartyComposition` field to `CreateSessionInput`

4. `services/session-coordinator/application/use_cases/create_session.go`
   - Implemented strict persistent-only default policy (lines 102-114)
   - Implemented `selectPartiesByComposition()` method with empty composition validation (lines 224-284)
   - Added clear error messages for insufficient parties

5. `k8s/server-party-deployment.yaml`
   - Added label: `party-role: persistent` (line 25)

6. `k8s/server-party-api-deployment.yaml` (NEW FILE)
   - New deployment for delegate parties
   - Added label: `party-role: delegate` (line 25)
   - Replicas: 2 (for generating user shares)
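To make item 1 concrete, here is a minimal sketch of the described port types, gathered into an illustrative package (named `partyports` here so it is not mistaken for the real `ports/output` package). The `PartySelectionFilter` fields and the interface shape are assumptions inferred from the method names above, not copied from the actual file.

```go
// Package partyports is an illustrative sketch of the party pool port described above.
package partyports

// PartyRole labels how a server party handles key shares.
type PartyRole string

const (
	PartyRolePersistent PartyRole = "persistent" // stores shares in the database
	PartyRoleDelegate   PartyRole = "delegate"   // generates user shares, does not store them
	PartyRoleTemporary  PartyRole = "temporary"  // ad-hoc operations
)

// PartyEndpoint is a party known to the coordinator. There is no Address field:
// parties connect to the Message Router themselves, so PartyID is enough for routing.
type PartyEndpoint struct {
	PartyID string
	Ready   bool
	Role    PartyRole
}

// PartySelectionFilter expresses a role/count requirement (field names are assumptions).
type PartySelectionFilter struct {
	Role  PartyRole // role to select
	Count int       // how many parties of that role are required
}

// PartyPoolPort is the output port the session coordinator depends on.
type PartyPoolPort interface {
	GetAvailableParties() []PartyEndpoint
	GetAvailablePartiesByRole(role PartyRole) []PartyEndpoint
	SelectPartiesWithFilter(filters []PartySelectionFilter) ([]PartyEndpoint, error)
}
```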
---

## 2. Security Policy Implementation

### 2.1 Strict Persistent-Only Default
When `PartyComposition` is **nil** (not specified):
- The system MUST select only `persistent` parties
- If insufficient persistent parties are available → **fail immediately with a clear error**
- NO automatic fallback to delegate/temporary parties
- Error message: "insufficient persistent parties: need N persistent parties but not enough available. Use PartyComposition to specify custom party requirements"

**Code Reference**: [create_session.go:102-114](c:\Users\dong\Desktop\rwadurian\backend\mpc-system\services\session-coordinator\application\use_cases\create_session.go#L102-L114)

### 2.2 Empty PartyComposition Validation
When `PartyComposition` is specified but all counts are 0:
- The system returns the error: "PartyComposition specified but no parties selected: all counts are zero and no custom filters provided"
- This prevents accidental bypass of the persistent-only requirement

**Code Reference**: [create_session.go:279-281](c:\Users\dong\Desktop\rwadurian\backend\mpc-system\services\session-coordinator\application\use_cases\create_session.go#L279-L281)
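The two rules above can be read as a single guard at the start of party selection. Continuing the illustrative `partyports` package from the sketch in section 1.2, the following is a minimal sketch of that guard; the `PartyComposition` field names follow the examples in section 7.2, while the function name and threshold parameter are assumptions, not the actual `create_session.go` code.

```go
package partyports

import (
	"errors"
	"fmt"
)

// PartyComposition mirrors the role counts used in the section 7.2 examples.
type PartyComposition struct {
	PersistentCount int
	DelegateCount   int
	TemporaryCount  int
}

// selectDefaultOrByComposition applies the two policy rules described above.
func selectDefaultOrByComposition(pool PartyPoolPort, comp *PartyComposition, thresholdN int) ([]PartyEndpoint, error) {
	if comp == nil {
		// Rule 2.1: strict persistent-only default, with no fallback to other roles.
		persistent := pool.GetAvailablePartiesByRole(PartyRolePersistent)
		if len(persistent) < thresholdN {
			return nil, fmt.Errorf("insufficient persistent parties: need %d persistent parties but not enough available. Use PartyComposition to specify custom party requirements", thresholdN)
		}
		return persistent[:thresholdN], nil
	}

	// Rule 2.2: reject an all-zero composition instead of silently selecting nothing.
	if comp.PersistentCount == 0 && comp.DelegateCount == 0 && comp.TemporaryCount == 0 {
		return nil, errors.New("PartyComposition specified but no parties selected: all counts are zero and no custom filters provided")
	}

	// Otherwise select the explicitly requested mix of roles.
	filters := []PartySelectionFilter{
		{Role: PartyRolePersistent, Count: comp.PersistentCount},
		{Role: PartyRoleDelegate, Count: comp.DelegateCount},
		{Role: PartyRoleTemporary, Count: comp.TemporaryCount},
	}
	return pool.SelectPartiesWithFilter(filters)
}
```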
### 2.3 Threshold Security Guarantee
- Default policy ensures MPC threshold security by using only persistent parties
- Persistent parties store shares in the database, ensuring T-of-N shares are always available for future sign operations
- Delegate parties (which don't store shares) are only used when explicitly specified via `PartyComposition`
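For example (an illustrative 2-of-3 scenario, not a result from this verification run): if a keygen session uses three persistent parties with threshold T=2, any two stored shares can later produce a signature, so one party outage is tolerable. If one of those three slots were instead filled by a delegate party, which returns its share to the caller rather than storing it, only two durable shares would remain and a single further loss would make future signing impossible. The strict default exists to prevent that composition from arising implicitly.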
---

## 3. Docker Compose Deployment Verification

### 3.1 Build Status
**Command**: `./deploy.sh build`
**Status**: ✅ SUCCESS
**Images Built**:
1. mpc-system-postgres (postgres:15-alpine)
2. mpc-system-rabbitmq (rabbitmq:3-management-alpine)
3. mpc-system-redis (redis:7-alpine)
4. mpc-system-session-coordinator
5. mpc-system-message-router
6. mpc-system-server-party-1/2/3
7. mpc-system-server-party-api
8. mpc-system-account-service

### 3.2 Deployment Status
**Command**: `./deploy.sh up`
**Status**: ✅ SUCCESS
**Services Running** (10 containers):

| Service | Status | Ports | Notes |
|---------|--------|-------|-------|
| mpc-postgres | Healthy | 5432 (internal) | PostgreSQL 15 |
| mpc-rabbitmq | Healthy | 5672, 15672 (internal) | Message broker |
| mpc-redis | Healthy | 6379 (internal) | Cache store |
| mpc-session-coordinator | Healthy | 8081:8080 | Core orchestration |
| mpc-message-router | Healthy | 8082:8080 | Message routing |
| mpc-server-party-1 | Healthy | 50051, 8080 (internal) | Persistent party |
| mpc-server-party-2 | Healthy | 50051, 8080 (internal) | Persistent party |
| mpc-server-party-3 | Healthy | 50051, 8080 (internal) | Persistent party |
| mpc-server-party-api | Healthy | 8083:8080 | Delegate party |
| mpc-account-service | Healthy | 4000:8080 | Application service |

### 3.3 Health Check Results
```bash
# Session Coordinator
$ curl http://localhost:8081/health
{"service":"session-coordinator","status":"healthy"}

# Account Service
$ curl http://localhost:4000/health
{"service":"account","status":"healthy"}
```

**Status**: ✅ All services responding to health checks

---

## 4. Known Limitations in Docker Compose Environment

### 4.1 K8s Party Discovery Not Available
**Log Message**:
```
{"level":"warn","message":"K8s party discovery not available, will use dynamic join mode",
"error":"failed to create k8s config: stat /home/mpc/.kube/config: no such file or directory"}
```

**Impact**:
- Party role labels (`party-role`) from K8s deployments are not accessible in Docker Compose
- System falls back to dynamic join mode (universal join tokens)
- `PartyPoolPort` is not available, so `selectPartiesByComposition()` logic is not exercised

**Why This Happens**:
- Docker Compose doesn't provide K8s API access
- Party discovery requires K8s Service Discovery and pod label queries
- This is expected behavior for non-K8s environments (a sketch of the typical config fallback follows below)
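The warning above comes from the client-go configuration step. A minimal sketch of how such a fallback is typically wired (in-cluster config first, then the local kubeconfig, otherwise dynamic join mode) is shown below; this is illustrative only and is not the project's actual discovery code.

```go
package main

import (
	"log"
	"path/filepath"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

// newK8sClient tries in-cluster config first, then falls back to the local kubeconfig.
// When both fail (as in Docker Compose), the caller continues in dynamic join mode.
func newK8sClient() (*kubernetes.Clientset, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
		cfg, err = clientcmd.BuildConfigFromFlags("", kubeconfig)
		if err != nil {
			return nil, err // e.g. "stat /home/mpc/.kube/config: no such file or directory"
		}
	}
	return kubernetes.NewForConfig(cfg)
}

func main() {
	if _, err := newK8sClient(); err != nil {
		log.Printf("K8s party discovery not available, will use dynamic join mode: %v", err)
		// Continue without role-based selection.
	}
}
```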
### 4.2 Party Role Labels Not Testable in Docker Compose
The following features cannot be tested in Docker Compose:
1. Role-based party filtering (`SelectPartiesWithFilter`)
2. `PartyComposition`-based party selection
3. Strict persistent-only default policy
4. K8s pod label reading (`party-role`)

**These features require an actual Kubernetes deployment to test.**

---

## 5. What Was Verified

### 5.1 Code Compilation ✅
- All modified Go files compile successfully
- No syntax errors or type errors
- Build completes on both Windows (local) and WSL (Docker)

### 5.2 Service Deployment ✅
- All 10 services start successfully
- All health checks pass
- Services can connect to each other (gRPC connectivity verified in logs)
- Database connections established
- Message broker connections established

### 5.3 Code Logic Review ✅
- Strict persistent-only default policy correctly implemented
- Empty `PartyComposition` validation prevents the loophole
- Clear error messages for insufficient parties
- Role extraction from K8s pod labels correctly implemented
- Role-based filtering logic correct

---

## 6. What Cannot Be Verified Without K8s

### 6.1 Runtime Behavior
1. **Party Discovery**: K8s pod label queries
2. **Role Filtering**: Actual filtering by `party-role` label values
3. **Persistent-Only Policy**: Enforcement when persistent parties are insufficient
4. **Error Messages**: Actual error messages when party selection fails
5. **PartyComposition**: Custom party mix selection

### 6.2 Integration Testing
1. Creating a session with a default (nil) `PartyComposition` → should select only persistent parties
2. Creating a session with insufficient persistent parties → should return a clear error
3. Creating a session with an empty `PartyComposition` → should return a validation error
4. Creating a session with a custom `PartyComposition` → should select the correct party mix

---

## 7. Next Steps for Full Verification

### 7.1 Deploy to Kubernetes Cluster
To fully test Party Role Labels, deploy to an actual K8s cluster:

```bash
# Apply K8s manifests
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secrets.yaml
kubectl apply -f k8s/postgres-deployment.yaml
kubectl apply -f k8s/rabbitmq-deployment.yaml
kubectl apply -f k8s/redis-deployment.yaml
kubectl apply -f k8s/server-party-deployment.yaml
kubectl apply -f k8s/server-party-api-deployment.yaml
kubectl apply -f k8s/session-coordinator-deployment.yaml
kubectl apply -f k8s/message-router-deployment.yaml
kubectl apply -f k8s/account-service-deployment.yaml

# Verify party discovery works
kubectl logs -n mpc-system -l app=mpc-session-coordinator | grep -i "party\|role\|discovery"

# Verify pod labels are set
kubectl get pods -n mpc-system -l app=mpc-server-party -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.party-role}{"\n"}{end}'
kubectl get pods -n mpc-system -l app=mpc-server-party-api -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.party-role}{"\n"}{end}'
```

### 7.2 Integration Testing in K8s
1. **Test Default Persistent-Only Selection**:
```bash
curl -X POST http://<account-service>/api/v1/accounts \
  -H "Content-Type: application/json" \
  -d '{"user_id": "test-user-1"}'

# Expected: Session created with 3 persistent parties
# Check logs: kubectl logs -n mpc-system -l app=mpc-session-coordinator | grep "selected persistent parties by default"
```

2. **Test Insufficient Persistent Parties Error**:
```bash
# Scale down persistent parties to 2
kubectl scale deployment mpc-server-party -n mpc-system --replicas=2

# Try creating a session requiring 3 parties
curl -X POST http://<account-service>/api/v1/accounts \
  -H "Content-Type: application/json" \
  -d '{"user_id": "test-user-2"}'

# Expected: HTTP 500 error with message "insufficient persistent parties: need 3 persistent parties but not enough available"
```

3. **Test Empty PartyComposition Validation**:
   - Requires an API endpoint that accepts the `PartyComposition` parameter
   - Send a request with `PartyComposition: {PersistentCount: 0, DelegateCount: 0, TemporaryCount: 0}`
   - Expected: HTTP 400 error with message "PartyComposition specified but no parties selected"

4. **Test Custom PartyComposition**:
   - Send a request with `PartyComposition: {PersistentCount: 2, DelegateCount: 1}`
   - Expected: Session created with 2 persistent + 1 delegate party
   - Verify party roles in the session data

---

## 8. Conclusion

### 8.1 Implementation Status: ✅ COMPLETE
- All code changes implemented correctly
- Strict persistent-only default policy enforced
- Empty `PartyComposition` validation prevents the loophole
- Clear error messages for insufficient parties
- Backward compatibility maintained (optional `PartyComposition`)

### 8.2 Deployment Status: ✅ SUCCESS (Docker Compose)
- All services build successfully
- All services deploy successfully
- All services healthy and responding
- Inter-service connectivity verified

### 8.3 Verification Status: ⚠️ PARTIAL
- ✅ Code compilation and logic review
- ✅ Docker Compose deployment
- ✅ Service health checks
- ❌ Party role filtering runtime behavior (requires K8s)
- ❌ Persistent-only policy enforcement (requires K8s)
- ❌ Integration testing (requires K8s)

### 8.4 Readiness for Production
**Code Readiness**: ✅ READY
**Testing Readiness**: ⚠️ REQUIRES K8S DEPLOYMENT FOR FULL TESTING
**Deployment Readiness**: ✅ READY (K8s manifests prepared)

---

## 9. User Confirmation Required

The Party Role Labels implementation is complete and successfully deployed in Docker Compose. However, full runtime verification requires deploying to an actual Kubernetes cluster.

**Options**:
1. Proceed with K8s deployment for full verification
2. Accept partial verification (code review + Docker Compose deployment)
3. Create integration tests that mock K8s party discovery (see the sketch below)
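For option 3, a table-driven Go test against a fake party pool would let the selection policy be exercised without a cluster. The sketch below stays in the illustrative `partyports` package from the earlier sketches and assumes the `selectDefaultOrByComposition` helper defined there; it is not the project's actual use-case API.

```go
package partyports

import (
	"strings"
	"testing"
)

// fakePool is a hand-rolled stand-in for the K8s-backed party pool.
type fakePool struct{ parties []PartyEndpoint }

func (f *fakePool) GetAvailableParties() []PartyEndpoint { return f.parties }

func (f *fakePool) GetAvailablePartiesByRole(role PartyRole) []PartyEndpoint {
	var out []PartyEndpoint
	for _, p := range f.parties {
		if p.Ready && p.Role == role {
			out = append(out, p)
		}
	}
	return out
}

func (f *fakePool) SelectPartiesWithFilter(filters []PartySelectionFilter) ([]PartyEndpoint, error) {
	// Minimal fake: return everything; a fuller fake would honour the filters.
	return f.parties, nil
}

func TestDefaultSelectionIsPersistentOnly(t *testing.T) {
	pool := &fakePool{parties: []PartyEndpoint{
		{PartyID: "party-1", Ready: true, Role: PartyRolePersistent},
		{PartyID: "party-2", Ready: true, Role: PartyRolePersistent},
		{PartyID: "api-1", Ready: true, Role: PartyRoleDelegate},
	}}

	// Only two persistent parties are available, so a 3-party default request must fail
	// with the documented error instead of falling back to the delegate party.
	_, err := selectDefaultOrByComposition(pool, nil, 3)
	if err == nil || !strings.Contains(err.Error(), "insufficient persistent parties") {
		t.Fatalf("expected insufficient persistent parties error, got %v", err)
	}
}
```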
Awaiting user instruction on next steps.
@ -14,6 +14,12 @@ service MessageRouter {
 	// GetPendingMessages retrieves pending messages (polling alternative)
 	rpc GetPendingMessages(GetPendingMessagesRequest) returns (GetPendingMessagesResponse);
+
+	// RegisterParty registers a party with the message router (party actively connects)
+	rpc RegisterParty(RegisterPartyRequest) returns (RegisterPartyResponse);
+
+	// SubscribeSessionEvents subscribes to session lifecycle events (session start, etc.)
+	rpc SubscribeSessionEvents(SubscribeSessionEventsRequest) returns (stream SessionEvent);
 }
 
 // RouteMessageRequest routes an MPC message

@ -61,3 +67,37 @@ message GetPendingMessagesRequest {
 message GetPendingMessagesResponse {
 	repeated MPCMessage messages = 1;
 }
+
+// RegisterPartyRequest registers a party with the router
+message RegisterPartyRequest {
+	string party_id = 1;   // Unique party identifier
+	string party_role = 2; // persistent, delegate, or temporary
+	string version = 3;    // Party software version
+}
+
+// RegisterPartyResponse confirms party registration
+message RegisterPartyResponse {
+	bool success = 1;
+	string message = 2;
+	int64 registered_at = 3; // Unix timestamp milliseconds
+}
+
+// SubscribeSessionEventsRequest subscribes to session events
+message SubscribeSessionEventsRequest {
+	string party_id = 1;             // Party ID subscribing to events
+	repeated string event_types = 2; // Event types to subscribe (empty = all)
+}
+
+// SessionEvent represents a session lifecycle event
+message SessionEvent {
+	string event_id = 1;
+	string event_type = 2; // session_created, session_started, etc.
+	string session_id = 3;
+	int32 threshold_n = 4;
+	int32 threshold_t = 5;
+	repeated string selected_parties = 6; // PartyIDs selected for this session
+	map<string, string> join_tokens = 7;  // PartyID -> JoinToken mapping
+	bytes message_hash = 8;               // For sign sessions
+	int64 created_at = 9;                 // Unix timestamp milliseconds
+	int64 expires_at = 10;                // Unix timestamp milliseconds
+}
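The two new RPCs are what let a party drive its own participation: it registers with the router once, then consumes the SessionEvent stream. Below is a minimal client-side sketch, assuming the generated package at `github.com/rwadurian/mpc-system/api/grpc/router/v1` (the same path the router code imports) and a router address supplied by the caller; reconnection and real event handling are omitted.

```go
package party

import (
	"context"
	"io"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "github.com/rwadurian/mpc-system/api/grpc/router/v1"
)

// runParty registers with the message router and then waits for session events.
func runParty(ctx context.Context, routerAddr, partyID string) error {
	conn, err := grpc.NewClient(routerAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return err
	}
	defer conn.Close()
	client := pb.NewMessageRouterClient(conn)

	// 1. Register: the party connects to the router, not the other way around.
	if _, err := client.RegisterParty(ctx, &pb.RegisterPartyRequest{
		PartyId:   partyID,
		PartyRole: "persistent",
		Version:   "dev",
	}); err != nil {
		return err
	}

	// 2. Subscribe to session lifecycle events and react to each one.
	stream, err := client.SubscribeSessionEvents(ctx, &pb.SubscribeSessionEventsRequest{PartyId: partyID})
	if err != nil {
		return err
	}
	for {
		event, err := stream.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		log.Printf("session %s: %s (selected parties: %v)",
			event.SessionId, event.EventType, event.SelectedParties)
		// Here the party would join the session using its entry in join_tokens.
	}
}
```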
@ -174,50 +174,191 @@ func main() {
|
||||||
}
|
}
|
||||||
|
|
||||||
func initDatabase(cfg config.DatabaseConfig) (*sql.DB, error) {
|
func initDatabase(cfg config.DatabaseConfig) (*sql.DB, error) {
|
||||||
db, err := sql.Open("postgres", cfg.DSN())
|
const maxRetries = 10
|
||||||
if err != nil {
|
const retryDelay = 2 * time.Second
|
||||||
return nil, err
|
|
||||||
|
var db *sql.DB
|
||||||
|
var err error
|
||||||
|
|
||||||
|
for i := 0; i < maxRetries; i++ {
|
||||||
|
db, err = sql.Open("postgres", cfg.DSN())
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("Failed to open database connection, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
db.SetMaxOpenConns(cfg.MaxOpenConns)
|
||||||
|
db.SetMaxIdleConns(cfg.MaxIdleConns)
|
||||||
|
db.SetConnMaxLifetime(cfg.ConnMaxLife)
|
||||||
|
|
||||||
|
// Test connection with Ping
|
||||||
|
if err = db.Ping(); err != nil {
|
||||||
|
logger.Warn("Failed to ping database, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
db.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Verify database is actually usable with a simple query
|
||||||
|
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||||
|
var result int
|
||||||
|
err = db.QueryRowContext(ctx, "SELECT 1").Scan(&result)
|
||||||
|
cancel()
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("Database ping succeeded but query failed, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
db.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
logger.Info("Connected to PostgreSQL and verified connectivity",
|
||||||
|
zap.Int("attempt", i+1))
|
||||||
|
return db, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
db.SetMaxOpenConns(cfg.MaxOpenConns)
|
return nil, fmt.Errorf("failed to connect to database after %d retries: %w", maxRetries, err)
|
||||||
db.SetMaxIdleConns(cfg.MaxIdleConns)
|
|
||||||
db.SetConnMaxLifetime(cfg.ConnMaxLife)
|
|
||||||
|
|
||||||
// Test connection
|
|
||||||
if err := db.Ping(); err != nil {
|
|
||||||
return nil, err
|
|
||||||
}
|
|
||||||
|
|
||||||
logger.Info("Connected to PostgreSQL")
|
|
||||||
return db, nil
|
|
||||||
}
|
}
|
||||||
|
|
||||||
func initRedis(cfg config.RedisConfig) *redis.Client {
|
func initRedis(cfg config.RedisConfig) *redis.Client {
|
||||||
|
const maxRetries = 10
|
||||||
|
const retryDelay = 2 * time.Second
|
||||||
|
|
||||||
client := redis.NewClient(&redis.Options{
|
client := redis.NewClient(&redis.Options{
|
||||||
Addr: cfg.Addr(),
|
Addr: cfg.Addr(),
|
||||||
Password: cfg.Password,
|
Password: cfg.Password,
|
||||||
DB: cfg.DB,
|
DB: cfg.DB,
|
||||||
})
|
})
|
||||||
|
|
||||||
// Test connection
|
// Test connection with retry
|
||||||
ctx := context.Background()
|
ctx := context.Background()
|
||||||
if err := client.Ping(ctx).Err(); err != nil {
|
for i := 0; i < maxRetries; i++ {
|
||||||
logger.Warn("Redis connection failed, continuing without cache", zap.Error(err))
|
if err := client.Ping(ctx).Err(); err != nil {
|
||||||
} else {
|
logger.Warn("Redis connection failed, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
logger.Info("Connected to Redis")
|
logger.Info("Connected to Redis")
|
||||||
|
return client
|
||||||
}
|
}
|
||||||
|
|
||||||
|
logger.Warn("Redis connection failed after retries, continuing without cache")
|
||||||
return client
|
return client
|
||||||
}
|
}
|
||||||
|
|
||||||
func initRabbitMQ(cfg config.RabbitMQConfig) (*amqp.Connection, error) {
|
func initRabbitMQ(cfg config.RabbitMQConfig) (*amqp.Connection, error) {
|
||||||
conn, err := amqp.Dial(cfg.URL())
|
const maxRetries = 10
|
||||||
if err != nil {
|
const retryDelay = 2 * time.Second
|
||||||
return nil, err
|
|
||||||
|
var conn *amqp.Connection
|
||||||
|
var err error
|
||||||
|
|
||||||
|
for i := 0; i < maxRetries; i++ {
|
||||||
|
// Attempt to dial RabbitMQ
|
||||||
|
conn, err = amqp.Dial(cfg.URL())
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("Failed to dial RabbitMQ, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.String("url", maskPassword(cfg.URL())),
|
||||||
|
zap.Error(err))
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Verify connection is actually usable by opening a channel
|
||||||
|
ch, err := conn.Channel()
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("RabbitMQ connection established but channel creation failed, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
conn.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Test the channel with a simple operation (declare a test exchange)
|
||||||
|
err = ch.ExchangeDeclare(
|
||||||
|
"mpc.health.check", // name
|
||||||
|
"fanout", // type
|
||||||
|
false, // durable
|
||||||
|
true, // auto-deleted
|
||||||
|
false, // internal
|
||||||
|
false, // no-wait
|
||||||
|
nil, // arguments
|
||||||
|
)
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("RabbitMQ channel created but exchange declaration failed, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
ch.Close()
|
||||||
|
conn.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Clean up test exchange
|
||||||
|
ch.ExchangeDelete("mpc.health.check", false, false)
|
||||||
|
ch.Close()
|
||||||
|
|
||||||
|
// Setup connection close notification
|
||||||
|
closeChan := make(chan *amqp.Error, 1)
|
||||||
|
conn.NotifyClose(closeChan)
|
||||||
|
go func() {
|
||||||
|
err := <-closeChan
|
||||||
|
if err != nil {
|
||||||
|
logger.Error("RabbitMQ connection closed unexpectedly", zap.Error(err))
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
|
||||||
|
logger.Info("Connected to RabbitMQ and verified connectivity",
|
||||||
|
zap.Int("attempt", i+1))
|
||||||
|
return conn, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
logger.Info("Connected to RabbitMQ")
|
return nil, fmt.Errorf("failed to connect to RabbitMQ after %d retries: %w", maxRetries, err)
|
||||||
return conn, nil
|
}
|
||||||
|
|
||||||
|
// maskPassword masks the password in the RabbitMQ URL for logging
|
||||||
|
func maskPassword(url string) string {
|
||||||
|
// Simple masking: amqp://user:password@host:port -> amqp://user:****@host:port
|
||||||
|
start := 0
|
||||||
|
for i := 0; i < len(url); i++ {
|
||||||
|
if url[i] == ':' && i > 0 && url[i-1] != '/' {
|
||||||
|
start = i + 1
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if start == 0 {
|
||||||
|
return url
|
||||||
|
}
|
||||||
|
|
||||||
|
end := start
|
||||||
|
for i := start; i < len(url); i++ {
|
||||||
|
if url[i] == '@' {
|
||||||
|
end = i
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if end == start {
|
||||||
|
return url
|
||||||
|
}
|
||||||
|
|
||||||
|
return url[:start] + "****" + url[end:]
|
||||||
}
|
}
|
||||||
|
|
||||||
func startHTTPServer(
|
func startHTTPServer(
|
||||||
|
|
|
||||||
|
|
@ -4,9 +4,12 @@ import (
|
||||||
"context"
|
"context"
|
||||||
|
|
||||||
pb "github.com/rwadurian/mpc-system/api/grpc/router/v1"
|
pb "github.com/rwadurian/mpc-system/api/grpc/router/v1"
|
||||||
|
"github.com/rwadurian/mpc-system/pkg/logger"
|
||||||
"github.com/rwadurian/mpc-system/services/message-router/adapters/output/rabbitmq"
|
"github.com/rwadurian/mpc-system/services/message-router/adapters/output/rabbitmq"
|
||||||
"github.com/rwadurian/mpc-system/services/message-router/application/use_cases"
|
"github.com/rwadurian/mpc-system/services/message-router/application/use_cases"
|
||||||
|
"github.com/rwadurian/mpc-system/services/message-router/domain"
|
||||||
"github.com/rwadurian/mpc-system/services/message-router/domain/entities"
|
"github.com/rwadurian/mpc-system/services/message-router/domain/entities"
|
||||||
|
"go.uber.org/zap"
|
||||||
"google.golang.org/grpc/codes"
|
"google.golang.org/grpc/codes"
|
||||||
"google.golang.org/grpc/status"
|
"google.golang.org/grpc/status"
|
||||||
)
|
)
|
||||||
|
|
@ -17,6 +20,8 @@ type MessageRouterServer struct {
|
||||||
routeMessageUC *use_cases.RouteMessageUseCase
|
routeMessageUC *use_cases.RouteMessageUseCase
|
||||||
getPendingMessagesUC *use_cases.GetPendingMessagesUseCase
|
getPendingMessagesUC *use_cases.GetPendingMessagesUseCase
|
||||||
messageBroker *rabbitmq.MessageBrokerAdapter
|
messageBroker *rabbitmq.MessageBrokerAdapter
|
||||||
|
partyRegistry *domain.PartyRegistry
|
||||||
|
eventBroadcaster *domain.SessionEventBroadcaster
|
||||||
}
|
}
|
||||||
|
|
||||||
// NewMessageRouterServer creates a new gRPC server
|
// NewMessageRouterServer creates a new gRPC server
|
||||||
|
|
@ -24,11 +29,15 @@ func NewMessageRouterServer(
|
||||||
routeMessageUC *use_cases.RouteMessageUseCase,
|
routeMessageUC *use_cases.RouteMessageUseCase,
|
||||||
getPendingMessagesUC *use_cases.GetPendingMessagesUseCase,
|
getPendingMessagesUC *use_cases.GetPendingMessagesUseCase,
|
||||||
messageBroker *rabbitmq.MessageBrokerAdapter,
|
messageBroker *rabbitmq.MessageBrokerAdapter,
|
||||||
|
partyRegistry *domain.PartyRegistry,
|
||||||
|
eventBroadcaster *domain.SessionEventBroadcaster,
|
||||||
) *MessageRouterServer {
|
) *MessageRouterServer {
|
||||||
return &MessageRouterServer{
|
return &MessageRouterServer{
|
||||||
routeMessageUC: routeMessageUC,
|
routeMessageUC: routeMessageUC,
|
||||||
getPendingMessagesUC: getPendingMessagesUC,
|
getPendingMessagesUC: getPendingMessagesUC,
|
||||||
messageBroker: messageBroker,
|
messageBroker: messageBroker,
|
||||||
|
partyRegistry: partyRegistry,
|
||||||
|
eventBroadcaster: eventBroadcaster,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -134,6 +143,102 @@ func (s *MessageRouterServer) GetPendingMessages(
|
||||||
}, nil
|
}, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// RegisterParty registers a party with the message router
|
||||||
|
func (s *MessageRouterServer) RegisterParty(
|
||||||
|
ctx context.Context,
|
||||||
|
req *pb.RegisterPartyRequest,
|
||||||
|
) (*pb.RegisterPartyResponse, error) {
|
||||||
|
if req.PartyId == "" {
|
||||||
|
return nil, status.Error(codes.InvalidArgument, "party_id is required")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Register party
|
||||||
|
party := s.partyRegistry.Register(req.PartyId, req.PartyRole, req.Version)
|
||||||
|
|
||||||
|
logger.Info("Party registered",
|
||||||
|
zap.String("party_id", req.PartyId),
|
||||||
|
zap.String("role", req.PartyRole),
|
||||||
|
zap.String("version", req.Version))
|
||||||
|
|
||||||
|
return &pb.RegisterPartyResponse{
|
||||||
|
Success: true,
|
||||||
|
Message: "Party registered successfully",
|
||||||
|
RegisteredAt: party.RegisteredAt.UnixMilli(),
|
||||||
|
}, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// SubscribeSessionEvents subscribes to session lifecycle events (streaming)
|
||||||
|
func (s *MessageRouterServer) SubscribeSessionEvents(
|
||||||
|
req *pb.SubscribeSessionEventsRequest,
|
||||||
|
stream pb.MessageRouter_SubscribeSessionEventsServer,
|
||||||
|
) error {
|
||||||
|
ctx := stream.Context()
|
||||||
|
|
||||||
|
if req.PartyId == "" {
|
||||||
|
return status.Error(codes.InvalidArgument, "party_id is required")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check if party is registered
|
||||||
|
if _, exists := s.partyRegistry.Get(req.PartyId); !exists {
|
||||||
|
return status.Error(codes.FailedPrecondition, "party not registered")
|
||||||
|
}
|
||||||
|
|
||||||
|
logger.Info("Party subscribed to session events",
|
||||||
|
zap.String("party_id", req.PartyId))
|
||||||
|
|
||||||
|
// Subscribe to events
|
||||||
|
eventCh := s.eventBroadcaster.Subscribe(req.PartyId)
|
||||||
|
defer s.eventBroadcaster.Unsubscribe(req.PartyId)
|
||||||
|
|
||||||
|
// Stream events
|
||||||
|
for {
|
||||||
|
select {
|
||||||
|
case <-ctx.Done():
|
||||||
|
logger.Info("Party unsubscribed from session events",
|
||||||
|
zap.String("party_id", req.PartyId))
|
||||||
|
return nil
|
||||||
|
|
||||||
|
case event, ok := <-eventCh:
|
||||||
|
if !ok {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Send event to party
|
||||||
|
if err := stream.Send(event); err != nil {
|
||||||
|
logger.Error("Failed to send session event",
|
||||||
|
zap.String("party_id", req.PartyId),
|
||||||
|
zap.Error(err))
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
|
||||||
|
logger.Debug("Sent session event to party",
|
||||||
|
zap.String("party_id", req.PartyId),
|
||||||
|
zap.String("event_type", event.EventType),
|
||||||
|
zap.String("session_id", event.SessionId))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// PublishSessionEvent publishes a session event to subscribed parties
|
||||||
|
// This is called by Session Coordinator
|
||||||
|
func (s *MessageRouterServer) PublishSessionEvent(event *pb.SessionEvent) {
|
||||||
|
// If selected_parties is specified, send only to those parties
|
||||||
|
if len(event.SelectedParties) > 0 {
|
||||||
|
s.eventBroadcaster.BroadcastToParties(event, event.SelectedParties)
|
||||||
|
logger.Info("Published session event to selected parties",
|
||||||
|
zap.String("event_type", event.EventType),
|
||||||
|
zap.String("session_id", event.SessionId),
|
||||||
|
zap.Int("party_count", len(event.SelectedParties)))
|
||||||
|
} else {
|
||||||
|
// Broadcast to all subscribers
|
||||||
|
s.eventBroadcaster.Broadcast(event)
|
||||||
|
logger.Info("Broadcast session event to all parties",
|
||||||
|
zap.String("event_type", event.EventType),
|
||||||
|
zap.String("session_id", event.SessionId),
|
||||||
|
zap.Int("subscriber_count", s.eventBroadcaster.SubscriberCount()))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
func sendMessage(stream pb.MessageRouter_SubscribeMessagesServer, msg *entities.MessageDTO) error {
|
func sendMessage(stream pb.MessageRouter_SubscribeMessagesServer, msg *entities.MessageDTO) error {
|
||||||
protoMsg := &pb.MPCMessage{
|
protoMsg := &pb.MPCMessage{
|
||||||
MessageId: msg.ID,
|
MessageId: msg.ID,
|
||||||
|
|
|
||||||
|
|
@ -25,6 +25,7 @@ import (
|
||||||
"github.com/rwadurian/mpc-system/services/message-router/adapters/output/postgres"
|
"github.com/rwadurian/mpc-system/services/message-router/adapters/output/postgres"
|
||||||
"github.com/rwadurian/mpc-system/services/message-router/adapters/output/rabbitmq"
|
"github.com/rwadurian/mpc-system/services/message-router/adapters/output/rabbitmq"
|
||||||
"github.com/rwadurian/mpc-system/services/message-router/application/use_cases"
|
"github.com/rwadurian/mpc-system/services/message-router/application/use_cases"
|
||||||
|
"github.com/rwadurian/mpc-system/services/message-router/domain"
|
||||||
"go.uber.org/zap"
|
"go.uber.org/zap"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
@ -77,6 +78,10 @@ func main() {
|
||||||
}
|
}
|
||||||
defer messageBroker.Close()
|
defer messageBroker.Close()
|
||||||
|
|
||||||
|
// Initialize party registry and event broadcaster for party-driven architecture
|
||||||
|
partyRegistry := domain.NewPartyRegistry()
|
||||||
|
eventBroadcaster := domain.NewSessionEventBroadcaster()
|
||||||
|
|
||||||
// Initialize use cases
|
// Initialize use cases
|
||||||
routeMessageUC := use_cases.NewRouteMessageUseCase(messageRepo, messageBroker)
|
routeMessageUC := use_cases.NewRouteMessageUseCase(messageRepo, messageBroker)
|
||||||
getPendingMessagesUC := use_cases.NewGetPendingMessagesUseCase(messageRepo)
|
getPendingMessagesUC := use_cases.NewGetPendingMessagesUseCase(messageRepo)
|
||||||
|
|
@ -93,7 +98,7 @@ func main() {
|
||||||
|
|
||||||
// Start gRPC server
|
// Start gRPC server
|
||||||
go func() {
|
go func() {
|
||||||
if err := startGRPCServer(cfg, routeMessageUC, getPendingMessagesUC, messageBroker); err != nil {
|
if err := startGRPCServer(cfg, routeMessageUC, getPendingMessagesUC, messageBroker, partyRegistry, eventBroadcaster); err != nil {
|
||||||
errChan <- fmt.Errorf("gRPC server error: %w", err)
|
errChan <- fmt.Errorf("gRPC server error: %w", err)
|
||||||
}
|
}
|
||||||
}()
|
}()
|
||||||
|
|
@ -148,6 +153,7 @@ func initDatabase(cfg config.DatabaseConfig) (*sql.DB, error) {
|
||||||
db.SetMaxIdleConns(cfg.MaxIdleConns)
|
db.SetMaxIdleConns(cfg.MaxIdleConns)
|
||||||
db.SetConnMaxLifetime(cfg.ConnMaxLife)
|
db.SetConnMaxLifetime(cfg.ConnMaxLife)
|
||||||
|
|
||||||
|
// Test connection with Ping
|
||||||
if err = db.Ping(); err != nil {
|
if err = db.Ping(); err != nil {
|
||||||
logger.Warn("Failed to ping database, retrying...",
|
logger.Warn("Failed to ping database, retrying...",
|
||||||
zap.Int("attempt", i+1),
|
zap.Int("attempt", i+1),
|
||||||
|
|
@ -158,7 +164,23 @@ func initDatabase(cfg config.DatabaseConfig) (*sql.DB, error) {
|
||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
|
|
||||||
logger.Info("Connected to PostgreSQL")
|
// Verify database is actually usable with a simple query
|
||||||
|
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||||
|
var result int
|
||||||
|
err = db.QueryRowContext(ctx, "SELECT 1").Scan(&result)
|
||||||
|
cancel()
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("Database ping succeeded but query failed, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
db.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
logger.Info("Connected to PostgreSQL and verified connectivity",
|
||||||
|
zap.Int("attempt", i+1))
|
||||||
return db, nil
|
return db, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -173,28 +195,108 @@ func initRabbitMQ(cfg config.RabbitMQConfig) (*amqp.Connection, error) {
|
||||||
var err error
|
var err error
|
||||||
|
|
||||||
for i := 0; i < maxRetries; i++ {
|
for i := 0; i < maxRetries; i++ {
|
||||||
|
// Attempt to dial RabbitMQ
|
||||||
conn, err = amqp.Dial(cfg.URL())
|
conn, err = amqp.Dial(cfg.URL())
|
||||||
if err != nil {
|
if err != nil {
|
||||||
logger.Warn("Failed to connect to RabbitMQ, retrying...",
|
logger.Warn("Failed to dial RabbitMQ, retrying...",
|
||||||
zap.Int("attempt", i+1),
|
zap.Int("attempt", i+1),
|
||||||
zap.Int("max_retries", maxRetries),
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.String("url", maskPassword(cfg.URL())),
|
||||||
zap.Error(err))
|
zap.Error(err))
|
||||||
time.Sleep(retryDelay * time.Duration(i+1))
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
|
|
||||||
logger.Info("Connected to RabbitMQ")
|
// Verify connection is actually usable by opening a channel
|
||||||
|
ch, err := conn.Channel()
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("RabbitMQ connection established but channel creation failed, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
conn.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Test the channel with a simple operation (declare a test exchange)
|
||||||
|
err = ch.ExchangeDeclare(
|
||||||
|
"mpc.health.check", // name
|
||||||
|
"fanout", // type
|
||||||
|
false, // durable
|
||||||
|
true, // auto-deleted
|
||||||
|
false, // internal
|
||||||
|
false, // no-wait
|
||||||
|
nil, // arguments
|
||||||
|
)
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("RabbitMQ channel created but exchange declaration failed, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
ch.Close()
|
||||||
|
conn.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Clean up test exchange
|
||||||
|
ch.ExchangeDelete("mpc.health.check", false, false)
|
||||||
|
ch.Close()
|
||||||
|
|
||||||
|
// Setup connection close notification
|
||||||
|
closeChan := make(chan *amqp.Error, 1)
|
||||||
|
conn.NotifyClose(closeChan)
|
||||||
|
go func() {
|
||||||
|
err := <-closeChan
|
||||||
|
if err != nil {
|
||||||
|
logger.Error("RabbitMQ connection closed unexpectedly", zap.Error(err))
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
|
||||||
|
logger.Info("Connected to RabbitMQ and verified connectivity",
|
||||||
|
zap.Int("attempt", i+1))
|
||||||
return conn, nil
|
return conn, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
return nil, fmt.Errorf("failed to connect to RabbitMQ after %d retries: %w", maxRetries, err)
|
return nil, fmt.Errorf("failed to connect to RabbitMQ after %d retries: %w", maxRetries, err)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// maskPassword masks the password in the RabbitMQ URL for logging
|
||||||
|
func maskPassword(url string) string {
|
||||||
|
// Simple masking: amqp://user:password@host:port -> amqp://user:****@host:port
|
||||||
|
start := 0
|
||||||
|
for i := 0; i < len(url); i++ {
|
||||||
|
if url[i] == ':' && i > 0 && url[i-1] != '/' {
|
||||||
|
start = i + 1
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if start == 0 {
|
||||||
|
return url
|
||||||
|
}
|
||||||
|
|
||||||
|
end := start
|
||||||
|
for i := start; i < len(url); i++ {
|
||||||
|
if url[i] == '@' {
|
||||||
|
end = i
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if end == start {
|
||||||
|
return url
|
||||||
|
}
|
||||||
|
|
||||||
|
return url[:start] + "****" + url[end:]
|
||||||
|
}
|
||||||
|
|
||||||
func startGRPCServer(
|
func startGRPCServer(
|
||||||
cfg *config.Config,
|
cfg *config.Config,
|
||||||
routeMessageUC *use_cases.RouteMessageUseCase,
|
routeMessageUC *use_cases.RouteMessageUseCase,
|
||||||
getPendingMessagesUC *use_cases.GetPendingMessagesUseCase,
|
getPendingMessagesUC *use_cases.GetPendingMessagesUseCase,
|
||||||
messageBroker *rabbitmq.MessageBrokerAdapter,
|
messageBroker *rabbitmq.MessageBrokerAdapter,
|
||||||
|
partyRegistry *domain.PartyRegistry,
|
||||||
|
eventBroadcaster *domain.SessionEventBroadcaster,
|
||||||
) error {
|
) error {
|
||||||
listener, err := net.Listen("tcp", fmt.Sprintf(":%d", cfg.Server.GRPCPort))
|
listener, err := net.Listen("tcp", fmt.Sprintf(":%d", cfg.Server.GRPCPort))
|
||||||
if err != nil {
|
if err != nil {
|
||||||
|
|
@ -203,11 +305,13 @@ func startGRPCServer(
|
||||||
|
|
||||||
grpcServer := grpc.NewServer()
|
grpcServer := grpc.NewServer()
|
||||||
|
|
||||||
// Create and register the message router gRPC handler
|
// Create and register the message router gRPC handler with party registry and event broadcaster
|
||||||
messageRouterServer := grpcadapter.NewMessageRouterServer(
|
messageRouterServer := grpcadapter.NewMessageRouterServer(
|
||||||
routeMessageUC,
|
routeMessageUC,
|
||||||
getPendingMessagesUC,
|
getPendingMessagesUC,
|
||||||
messageBroker,
|
messageBroker,
|
||||||
|
partyRegistry,
|
||||||
|
eventBroadcaster,
|
||||||
)
|
)
|
||||||
pb.RegisterMessageRouterServer(grpcServer, messageRouterServer)
|
pb.RegisterMessageRouterServer(grpcServer, messageRouterServer)
|
||||||
|
|
||||||
|
|
|
||||||
|
|
services/message-router/domain — new file (PartyRegistry)
@ -0,0 +1,93 @@
package domain

import (
	"sync"
	"time"
)

// RegisteredParty represents a party registered with the router
type RegisteredParty struct {
	PartyID      string
	Role         string // persistent, delegate, temporary
	Version      string
	RegisteredAt time.Time
	LastSeen     time.Time
}

// PartyRegistry manages registered parties
type PartyRegistry struct {
	parties map[string]*RegisteredParty
	mu      sync.RWMutex
}

// NewPartyRegistry creates a new party registry
func NewPartyRegistry() *PartyRegistry {
	return &PartyRegistry{
		parties: make(map[string]*RegisteredParty),
	}
}

// Register registers a party
func (r *PartyRegistry) Register(partyID, role, version string) *RegisteredParty {
	r.mu.Lock()
	defer r.mu.Unlock()

	now := time.Now()
	party := &RegisteredParty{
		PartyID:      partyID,
		Role:         role,
		Version:      version,
		RegisteredAt: now,
		LastSeen:     now,
	}

	r.parties[partyID] = party
	return party
}

// Get retrieves a registered party
func (r *PartyRegistry) Get(partyID string) (*RegisteredParty, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()

	party, exists := r.parties[partyID]
	return party, exists
}

// GetAll returns all registered parties
func (r *PartyRegistry) GetAll() []*RegisteredParty {
	r.mu.RLock()
	defer r.mu.RUnlock()

	parties := make([]*RegisteredParty, 0, len(r.parties))
	for _, party := range r.parties {
		parties = append(parties, party)
	}
	return parties
}

// UpdateLastSeen updates the last seen timestamp
func (r *PartyRegistry) UpdateLastSeen(partyID string) {
	r.mu.Lock()
	defer r.mu.Unlock()

	if party, exists := r.parties[partyID]; exists {
		party.LastSeen = time.Now()
	}
}

// Unregister removes a party from the registry
func (r *PartyRegistry) Unregister(partyID string) {
	r.mu.Lock()
	defer r.mu.Unlock()

	delete(r.parties, partyID)
}

// Count returns the number of registered parties
func (r *PartyRegistry) Count() int {
	r.mu.RLock()
	defer r.mu.RUnlock()

	return len(r.parties)
}
|
|
services/message-router/domain — new file (SessionEventBroadcaster)
@ -0,0 +1,83 @@
package domain

import (
	"sync"

	pb "github.com/rwadurian/mpc-system/api/grpc/router/v1"
)

// SessionEventBroadcaster manages session event subscriptions and broadcasting
type SessionEventBroadcaster struct {
	subscribers map[string]chan *pb.SessionEvent // partyID -> event channel
	mu          sync.RWMutex
}

// NewSessionEventBroadcaster creates a new session event broadcaster
func NewSessionEventBroadcaster() *SessionEventBroadcaster {
	return &SessionEventBroadcaster{
		subscribers: make(map[string]chan *pb.SessionEvent),
	}
}

// Subscribe subscribes a party to session events
func (b *SessionEventBroadcaster) Subscribe(partyID string) <-chan *pb.SessionEvent {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Create buffered channel for this subscriber
	ch := make(chan *pb.SessionEvent, 100)
	b.subscribers[partyID] = ch

	return ch
}

// Unsubscribe removes a party's subscription
func (b *SessionEventBroadcaster) Unsubscribe(partyID string) {
	b.mu.Lock()
	defer b.mu.Unlock()

	if ch, exists := b.subscribers[partyID]; exists {
		close(ch)
		delete(b.subscribers, partyID)
	}
}

// Broadcast sends an event to all subscribers
func (b *SessionEventBroadcaster) Broadcast(event *pb.SessionEvent) {
	b.mu.RLock()
	defer b.mu.RUnlock()

	for _, ch := range b.subscribers {
		// Non-blocking send to prevent slow subscribers from blocking
		select {
		case ch <- event:
		default:
			// Channel full, skip this subscriber
		}
	}
}

// BroadcastToParties sends an event to specific parties only
func (b *SessionEventBroadcaster) BroadcastToParties(event *pb.SessionEvent, partyIDs []string) {
	b.mu.RLock()
	defer b.mu.RUnlock()

	for _, partyID := range partyIDs {
		if ch, exists := b.subscribers[partyID]; exists {
			// Non-blocking send
			select {
			case ch <- event:
			default:
				// Channel full, skip this subscriber
			}
		}
	}
}

// SubscriberCount returns the number of active subscribers
func (b *SessionEventBroadcaster) SubscriberCount() int {
	b.mu.RLock()
	defer b.mu.RUnlock()

	return len(b.subscribers)
}
||||||
|
|
@ -146,21 +146,59 @@ func main() {
|
||||||
}
|
}
|
||||||
|
|
||||||
func initDatabase(cfg config.DatabaseConfig) (*sql.DB, error) {
|
func initDatabase(cfg config.DatabaseConfig) (*sql.DB, error) {
|
||||||
db, err := sql.Open("postgres", cfg.DSN())
|
const maxRetries = 10
|
||||||
if err != nil {
|
const retryDelay = 2 * time.Second
|
||||||
return nil, err
|
|
||||||
|
var db *sql.DB
|
||||||
|
var err error
|
||||||
|
|
||||||
|
for i := 0; i < maxRetries; i++ {
|
||||||
|
db, err = sql.Open("postgres", cfg.DSN())
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("Failed to open database connection, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
db.SetMaxOpenConns(cfg.MaxOpenConns)
|
||||||
|
db.SetMaxIdleConns(cfg.MaxIdleConns)
|
||||||
|
db.SetConnMaxLifetime(cfg.ConnMaxLife)
|
||||||
|
|
||||||
|
// Test connection with Ping
|
||||||
|
if err = db.Ping(); err != nil {
|
||||||
|
logger.Warn("Failed to ping database, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
db.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Verify database is actually usable with a simple query
|
||||||
|
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||||
|
var result int
|
||||||
|
err = db.QueryRowContext(ctx, "SELECT 1").Scan(&result)
|
||||||
|
cancel()
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("Database ping succeeded but query failed, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
db.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
logger.Info("Connected to PostgreSQL and verified connectivity",
|
||||||
|
zap.Int("attempt", i+1))
|
||||||
|
return db, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
db.SetMaxOpenConns(cfg.MaxOpenConns)
|
return nil, fmt.Errorf("failed to connect to database after %d retries: %w", maxRetries, err)
|
||||||
db.SetMaxIdleConns(cfg.MaxIdleConns)
|
|
||||||
db.SetConnMaxLifetime(cfg.ConnMaxLife)
|
|
||||||
|
|
||||||
if err := db.Ping(); err != nil {
|
|
||||||
return nil, err
|
|
||||||
}
|
|
||||||
|
|
||||||
logger.Info("Connected to PostgreSQL")
|
|
||||||
return db, nil
|
|
||||||
}
|
}
|
||||||
|
|
||||||
func startHTTPServer(
|
func startHTTPServer(
|
||||||
|
|
|
||||||
|
|
@ -21,12 +21,34 @@ type EventPublisherAdapter struct {
|
||||||
|
|
||||||
// NewEventPublisherAdapter creates a new RabbitMQ event publisher
|
// NewEventPublisherAdapter creates a new RabbitMQ event publisher
|
||||||
func NewEventPublisherAdapter(conn *amqp.Connection) (*EventPublisherAdapter, error) {
|
func NewEventPublisherAdapter(conn *amqp.Connection) (*EventPublisherAdapter, error) {
|
||||||
|
// Verify connection is not nil and not closed
|
||||||
|
if conn == nil {
|
||||||
|
return nil, fmt.Errorf("rabbitmq connection is nil")
|
||||||
|
}
|
||||||
|
if conn.IsClosed() {
|
||||||
|
return nil, fmt.Errorf("rabbitmq connection is already closed")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create channel with detailed error logging
|
||||||
channel, err := conn.Channel()
|
channel, err := conn.Channel()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
|
logger.Error("failed to create RabbitMQ channel", zap.Error(err))
|
||||||
return nil, fmt.Errorf("failed to create channel: %w", err)
|
return nil, fmt.Errorf("failed to create channel: %w", err)
|
||||||
}
|
}
|
||||||
|
|
||||||
// Declare exchange for MPC events
|
// Set channel QoS for better flow control
|
||||||
|
err = channel.Qos(
|
||||||
|
100, // prefetch count
|
||||||
|
0, // prefetch size
|
||||||
|
false, // global
|
||||||
|
)
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("failed to set channel QoS, continuing anyway", zap.Error(err))
|
||||||
|
// Don't fail on QoS error, it's not critical
|
||||||
|
}
|
||||||
|
|
||||||
|
// Declare exchange for MPC events with detailed logging
|
||||||
|
logger.Info("declaring RabbitMQ exchange for MPC events")
|
||||||
err = channel.ExchangeDeclare(
|
err = channel.ExchangeDeclare(
|
||||||
"mpc.events", // name
|
"mpc.events", // name
|
||||||
"topic", // type
|
"topic", // type
|
||||||
|
|
@ -37,10 +59,14 @@ func NewEventPublisherAdapter(conn *amqp.Connection) (*EventPublisherAdapter, er
|
||||||
nil, // arguments
|
nil, // arguments
|
||||||
)
|
)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
|
logger.Error("failed to declare mpc.events exchange", zap.Error(err))
|
||||||
|
channel.Close()
|
||||||
return nil, fmt.Errorf("failed to declare exchange: %w", err)
|
return nil, fmt.Errorf("failed to declare exchange: %w", err)
|
||||||
}
|
}
|
||||||
|
logger.Info("successfully declared mpc.events exchange")
|
||||||
|
|
||||||
// Declare exchange for party messages
|
// Declare exchange for party messages with detailed logging
|
||||||
|
logger.Info("declaring RabbitMQ exchange for party messages")
|
||||||
err = channel.ExchangeDeclare(
|
err = channel.ExchangeDeclare(
|
||||||
"mpc.messages", // name
|
"mpc.messages", // name
|
||||||
"direct", // type
|
"direct", // type
|
||||||
|
|
@ -51,9 +77,23 @@ func NewEventPublisherAdapter(conn *amqp.Connection) (*EventPublisherAdapter, er
|
||||||
nil, // arguments
|
nil, // arguments
|
||||||
)
|
)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
|
logger.Error("failed to declare mpc.messages exchange", zap.Error(err))
|
||||||
|
channel.Close()
|
||||||
return nil, fmt.Errorf("failed to declare messages exchange: %w", err)
|
return nil, fmt.Errorf("failed to declare messages exchange: %w", err)
|
||||||
}
|
}
|
||||||
|
logger.Info("successfully declared mpc.messages exchange")
|
||||||
|
|
||||||
|
// Setup channel close notification
|
||||||
|
closeChan := make(chan *amqp.Error, 1)
|
||||||
|
channel.NotifyClose(closeChan)
|
||||||
|
go func() {
|
||||||
|
err := <-closeChan
|
||||||
|
if err != nil {
|
||||||
|
logger.Error("RabbitMQ channel closed unexpectedly", zap.Error(err))
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
|
||||||
|
logger.Info("EventPublisherAdapter initialized successfully")
|
||||||
return &EventPublisherAdapter{
|
return &EventPublisherAdapter{
|
||||||
conn: conn,
|
conn: conn,
|
||||||
channel: channel,
|
channel: channel,
|
||||||
|
|
|
||||||
|
|
services/session-coordinator/application/ports/output/party_pool_port.go
@ -14,9 +14,10 @@ const (
 	PartyRoleTemporary PartyRole = "temporary"
 )
 
-// PartyEndpoint represents a party endpoint from the pool
+// PartyEndpoint represents a party from the pool
+// Note: Address is removed - parties connect to Message Router themselves
+// Session Coordinator only needs PartyID for message routing
 type PartyEndpoint struct {
-	Address string
 	PartyID string
 	Ready   bool
 	Role    PartyRole // Role of the party (persistent, delegate, temporary)
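With the Address field gone, the coordinator identifies parties purely by PartyID and hands out per-party join tokens through the SessionEvent defined earlier; each party resolves its own connection to the Message Router. The helper below is an illustrative sketch of that hand-off (the function and its parameters are assumptions, not code from the repository), reusing the real `output` and generated `pb` packages.

```go
package sketch

import (
	pb "github.com/rwadurian/mpc-system/api/grpc/router/v1"

	"github.com/rwadurian/mpc-system/services/session-coordinator/application/ports/output"
)

// buildSessionCreatedEvent shows that, without Address, only PartyIDs and join tokens
// flow from the coordinator to the selected parties.
func buildSessionCreatedEvent(sessionID string, selected []output.PartyEndpoint, tokens map[string]string) *pb.SessionEvent {
	ids := make([]string, 0, len(selected))
	for _, p := range selected {
		ids = append(ids, p.PartyID) // PartyID is the only routing key now
	}
	return &pb.SessionEvent{
		EventType:       "session_created",
		SessionId:       sessionID,
		SelectedParties: ids,
		JoinTokens:      tokens, // PartyID -> JoinToken mapping
	}
}
```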
@ -255,23 +255,101 @@ func initRabbitMQ(cfg config.RabbitMQConfig) (*amqp.Connection, error) {
|
||||||
var err error
|
var err error
|
||||||
|
|
||||||
for i := 0; i < maxRetries; i++ {
|
for i := 0; i < maxRetries; i++ {
|
||||||
|
// Attempt to dial RabbitMQ
|
||||||
conn, err = amqp.Dial(cfg.URL())
|
conn, err = amqp.Dial(cfg.URL())
|
||||||
if err != nil {
|
if err != nil {
|
||||||
logger.Warn("Failed to connect to RabbitMQ, retrying...",
|
logger.Warn("Failed to dial RabbitMQ, retrying...",
|
||||||
zap.Int("attempt", i+1),
|
zap.Int("attempt", i+1),
|
||||||
zap.Int("max_retries", maxRetries),
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.String("url", maskPassword(cfg.URL())),
|
||||||
zap.Error(err))
|
zap.Error(err))
|
||||||
time.Sleep(retryDelay * time.Duration(i+1))
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
|
|
||||||
logger.Info("Connected to RabbitMQ")
|
// Verify connection is actually usable by opening a channel
|
||||||
|
ch, err := conn.Channel()
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("RabbitMQ connection established but channel creation failed, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
conn.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Test the channel with a simple operation (declare a test exchange)
|
||||||
|
err = ch.ExchangeDeclare(
|
||||||
|
"mpc.health.check", // name
|
||||||
|
"fanout", // type
|
||||||
|
false, // durable
|
||||||
|
true, // auto-deleted
|
||||||
|
false, // internal
|
||||||
|
false, // no-wait
|
||||||
|
nil, // arguments
|
||||||
|
)
|
||||||
|
if err != nil {
|
||||||
|
logger.Warn("RabbitMQ channel created but exchange declaration failed, retrying...",
|
||||||
|
zap.Int("attempt", i+1),
|
||||||
|
zap.Int("max_retries", maxRetries),
|
||||||
|
zap.Error(err))
|
||||||
|
ch.Close()
|
||||||
|
conn.Close()
|
||||||
|
time.Sleep(retryDelay * time.Duration(i+1))
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
// Clean up test exchange
|
||||||
|
ch.ExchangeDelete("mpc.health.check", false, false)
|
||||||
|
ch.Close()
|
||||||
|
|
||||||
|
// Setup connection close notification
|
||||||
|
closeChan := make(chan *amqp.Error, 1)
|
||||||
|
conn.NotifyClose(closeChan)
|
||||||
|
go func() {
|
||||||
|
err := <-closeChan
|
||||||
|
if err != nil {
|
||||||
|
logger.Error("RabbitMQ connection closed unexpectedly", zap.Error(err))
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
|
||||||
|
logger.Info("Connected to RabbitMQ and verified connectivity",
|
||||||
|
zap.Int("attempt", i+1))
|
||||||
return conn, nil
|
return conn, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
return nil, fmt.Errorf("failed to connect to RabbitMQ after %d retries: %w", maxRetries, err)
|
return nil, fmt.Errorf("failed to connect to RabbitMQ after %d retries: %w", maxRetries, err)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// maskPassword masks the password in the RabbitMQ URL for logging
|
||||||
|
func maskPassword(url string) string {
|
||||||
|
// Simple masking: amqp://user:password@host:port -> amqp://user:****@host:port
|
||||||
|
start := 0
|
||||||
|
for i := 0; i < len(url); i++ {
|
||||||
|
if url[i] == ':' && i > 0 && url[i-1] != '/' {
|
||||||
|
start = i + 1
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if start == 0 {
|
||||||
|
return url
|
||||||
|
}
|
||||||
|
|
||||||
|
end := start
|
||||||
|
for i := start; i < len(url); i++ {
|
||||||
|
if url[i] == '@' {
|
||||||
|
end = i
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if end == start {
|
||||||
|
return url
|
||||||
|
}
|
||||||
|
|
||||||
|
return url[:start] + "****" + url[end:]
|
||||||
|
}
|
||||||
|
|
||||||
func startGRPCServer(
|
func startGRPCServer(
|
||||||
cfg *config.Config,
|
cfg *config.Config,
|
||||||
createSessionUC *use_cases.CreateSessionUseCase,
|
createSessionUC *use_cases.CreateSessionUseCase,
|
||||||
|
|
|
||||||
|
|
services/session-coordinator/infrastructure/k8s/party_discovery.go
@ -15,9 +15,9 @@ import (
 	"k8s.io/client-go/tools/clientcmd"
 )
 
-// PartyEndpoint represents a discovered party endpoint
+// PartyEndpoint represents a discovered party
+// Note: Address removed - parties connect to Message Router themselves
 type PartyEndpoint struct {
-	Address string
 	PodName string
 	Ready   bool
 	Role    output.PartyRole // Party role extracted from pod labels

@ -113,7 +113,6 @@ func (pd *PartyDiscovery) GetAvailableParties() []output.PartyEndpoint {
 	for _, ep := range pd.endpoints {
 		if ep.Ready {
 			available = append(available, output.PartyEndpoint{
-				Address: ep.Address,
 				PartyID: ep.PodName, // Use pod name as party ID
 				Ready:   ep.Ready,
 				Role:    ep.Role,

@ -134,7+133,6 @@ func (pd *PartyDiscovery) GetAvailablePartiesByRole(role output.PartyRole) []output.PartyEndpoint {
 	for _, ep := range pd.endpoints {
 		if ep.Ready && ep.Role == role {
 			available = append(available, output.PartyEndpoint{
-				Address: ep.Address,
 				PartyID: ep.PodName,
 				Ready:   ep.Ready,
 				Role:    ep.Role,

@ -216,21 +214,12 @@ func (pd *PartyDiscovery) refresh() error {
 			role = output.PartyRole(roleLabel)
 		}
 
-		// Get pod IP
-		if pod.Status.PodIP != "" {
-			// Assuming gRPC port is 50051 (should be configurable)
-			grpcPort := os.Getenv("MPC_PARTY_GRPC_PORT")
-			if grpcPort == "" {
-				grpcPort = "50051"
-			}
-
-			endpoints = append(endpoints, PartyEndpoint{
-				Address: fmt.Sprintf("%s:%s", pod.Status.PodIP, grpcPort),
-				PodName: pod.Name,
-				Ready:   ready,
-				Role:    role,
-			})
-		}
+		// Add party to pool (no address needed - parties connect to Message Router)
+		endpoints = append(endpoints, PartyEndpoint{
+			PodName: pod.Name,
+			Ready:   ready,
+			Role:    role,
+		})
 	}
 
 	pd.mu.Lock()