# Reconnection Event Stream Bug Analysis ## Problem Summary After network disconnection and reconnection, the event stream subscription is NOT restored, causing: - No `session_started` event received - Keygen never starts - Messages pile up forever (539 pending) ## Root Cause ### The Bug Chain ``` 1. Network disconnects ↓ 2. subscribeSessionEvents Flow closes ↓ 3. awaitClose block executes (GrpcClient.kt:925) eventStreamSubscribed.set(false) ← FLAG CLEARED ↓ 4. Network reconnects successfully ↓ 5. reSubscribeStreams() called (GrpcClient.kt:202) ↓ 6. Line 506 checks: val needsResubscribe = eventStreamSubscribed.get() || activeMessageSubscription != null ↓ 7. eventStreamSubscribed.get() returns FALSE ❌ activeMessageSubscription is also NULL ❌ ↓ 8. needsResubscribe = false ↓ 9. Callback NEVER invoked ↓ 10. Event stream NEVER restored ``` ## Code Evidence ### GrpcClient.kt - Where Flag is Set/Cleared **Line 844** - Flag set when subscribing: ```kotlin fun subscribeSessionEvents(partyId: String): Flow = callbackFlow { eventStreamSubscribed.set(true) ← Set to TRUE eventStreamPartyId = partyId ... } ``` **Line 925** - Flag cleared when Flow closes (THE BUG): ```kotlin awaitClose { Log.d(TAG, "subscribeSessionEvents: Flow closed for partyId=$partyId") eventStreamSubscribed.set(false) ← Set to FALSE on disconnect eventStreamPartyId = null } ``` **Line 506** - Reconnection check (FAILS because flag is false): ```kotlin private fun reSubscribeStreams() { val needsResubscribe = eventStreamSubscribed.get() || activeMessageSubscription != null // ↑ Returns FALSE after disconnect if (needsResubscribe) { ← This condition is FALSE Log.d(TAG, "Triggering stream re-subscription callback") ... onReconnectedCallback?.invoke() ← NEVER REACHED } } ``` ## Log Evidence ### Normal Reconnection (16:28:26) - WORKS ✅ ``` 16:28:26.082 D/GrpcClient: Connected successfully 16:28:26.086 D/GrpcClient: Re-registering party: 7c72c28f... 16:28:26.130 D/GrpcClient: Party registered: 7c72c28f... 16:28:26.130 D/GrpcClient: Triggering stream re-subscription callback ← Present! 16:28:26.130 D/GrpcClient: - Event stream: true, partyId: 7c72c28f... 16:28:26.130 D/TssRepository: gRPC reconnected, restoring streams... ← Present! 16:28:26.130 D/TssRepository: Restoring session event subscription ← Present! ``` ### Problem Reconnection (16:29:47) - FAILS ❌ ``` 16:29:47.090 D/GrpcClient: Connected successfully 16:29:47.093 D/GrpcClient: Re-registering party: 7c72c28f... 16:29:47.146 D/GrpcClient: Party registered: 7c72c28f... [MISSING]: "Triggering stream re-subscription callback" ← NOT PRESENT! [MISSING]: "gRPC reconnected, restoring streams..." ← NOT PRESENT! [MISSING]: "Restoring session event subscription" ← NOT PRESENT! Result: 16:30:47.198 W/GrpcClient: Has 539 pending messages - may have missed events 16:31:17.237 W/GrpcClient: Has 539 pending messages - may have missed events ``` ## Why First Reconnection Worked Looking at the timeline: ``` 16:27:53 - App started, event subscription started 16:28:26 - First reconnect (1 minute later) Event subscription was STILL ACTIVE eventStreamSubscribed = true ✅ 16:29:15 - Network disconnect (49 seconds later) Flow closed → eventStreamSubscribed set to FALSE ❌ 16:29:47 - Second reconnect eventStreamSubscribed = false ❌ Callback NOT invoked ❌ ``` **Key Insight**: The first reconnection worked because the event stream Flow hadn't closed yet. The second reconnection failed because the Flow had closed and cleared the flag. ## The Design Flaw The current design has a **state tracking inconsistency**: ```kotlin // When to subscribe? eventStreamSubscribed = true // "I am currently subscribed" // When to unsubscribe? eventStreamSubscribed = false // "I am no longer subscribed" // When to re-subscribe? if (eventStreamSubscribed) { ... } // ❌ WRONG - flag is already false! ``` **Problem**: The flag tracks "am I currently subscribed?" but reconnection logic needs to know "should I re-subscribe?". These are two different concepts. ## Solution Options ### Option 1: Don't Clear Flag in awaitClose (Simple) ```kotlin awaitClose { Log.d(TAG, "subscribeSessionEvents: Flow closed for partyId=$partyId") // DON'T clear the flag - keep it for reconnection // eventStreamSubscribed.set(false) ← REMOVE THIS // eventStreamPartyId = null ← REMOVE THIS } ``` **Pros**: Minimal change, preserves intent to re-subscribe **Cons**: Flag no longer accurately reflects current state ### Option 2: Add Separate "Should Restore" Flag (Better) ```kotlin // Two separate flags: private val eventStreamSubscribed = AtomicBoolean(false) // Current state private val shouldRestoreEventStream = AtomicBoolean(false) // Intent to restore // When subscribing: eventStreamSubscribed.set(true) shouldRestoreEventStream.set(true) // Remember to restore // In awaitClose: eventStreamSubscribed.set(false) // No longer subscribed // Keep shouldRestoreEventStream = true // But should restore on reconnect // In reSubscribeStreams: val needsResubscribe = shouldRestoreEventStream.get() || activeMessageSubscription != null ``` **Pros**: Clear separation of concerns, accurate state tracking **Cons**: More code, requires careful handling of clear conditions ### Option 3: Store Last Subscription State (Most Robust) ```kotlin // Store full subscription state for recovery private data class StreamState( val eventStreamPartyId: String?, val messageSessionId: String?, val messagePartyId: String? ) private val lastStreamState = AtomicReference(null) // On subscribe, save state // On reconnect, restore from saved state ``` **Pros**: Can restore exact previous state, handles complex scenarios **Cons**: Most complex implementation ## Recommended Fix **Use Option 1 (simplest) with Option 2 concept (clearer intent)**: 1. Don't clear `eventStreamSubscribed` in `awaitClose` 2. Only clear it when user explicitly unsubscribes or app shuts down 3. This preserves the "I was subscribed, so re-subscribe on reconnect" behavior **Alternative**: Add explicit unsubscribe call only when intentionally stopping (not on disconnect). ## Files to Modify ### GrpcClient.kt **Line 923-927** - Remove flag clearing in awaitClose: ```kotlin awaitClose { Log.d(TAG, "subscribeSessionEvents: Flow closed for partyId=$partyId") // Keep flags for reconnection - don't clear here // Only clear on explicit unsubscribe or disconnect } ``` **Line 933-936** - Keep explicit unsubscribe as-is: ```kotlin fun unsubscribeSessionEvents() { eventStreamSubscribed.set(false) eventStreamPartyId = null } ``` ## Testing Checklist After fix: - [ ] Start app, subscribe to events - [ ] Simulate network disconnect (airplane mode) - [ ] Verify log shows: "Triggering stream re-subscription callback" - [ ] Verify log shows: "gRPC reconnected, restoring streams..." - [ ] Verify log shows: "Restoring session event subscription" - [ ] Verify pending messages start decreasing - [ ] Test 2-of-3 keygen succeeds after reconnection ## Why This Wasn't Caught Before 1. **Timing-dependent**: Only fails if Flow closes before reconnect 2. **Works in most cases**: Quick reconnects (< 1 minute) often succeed before Flow timeout 3. **No explicit test**: Didn't test scenario of "disconnect → wait for Flow to close → reconnect" 4. **Silent failure**: No error logged, just missing callback invocation ## Conclusion The safeLaunch optimization did NOT cause this bug. The bug exists because: 1. `awaitClose` clears `eventStreamSubscribed` on disconnect 2. Reconnection logic relies on this flag to decide if callback should be invoked 3. After disconnect, flag is false, so callback is never invoked **Fix**: Don't clear the subscription intent flag on temporary disconnection.