it0/packages/services/agent-service
hailin 0d5441f720 fix(wecom): token mutex, leader lease, backoff, watchdog
Four additional robustness fixes:

1. **Token refresh mutex** — tokenRefreshPromise deduplicates concurrent
   refresh calls. All callers share one in-flight HTTP request instead
   of each firing their own, eliminating the race condition.

2. **Distributed leader lease** — service_state table used for a
   TTL-based leader election (LEADER_LEASE_TTL_S=90s). Only one
   agent-service instance polls at a time; others skip until the lease
   expires. Lease auto-released on graceful shutdown.

3. **Exponential backoff** — consecutive poll errors increment a counter;
   next delay = min(10s × 2^(n-1), 5min). Prevents log spam and
   reduces load during sustained WeCom API outages. Counter resets on
   any successful poll.

4. **Watchdog timer** — setInterval every 2min checks lastPollAt.
   If the poll loop has been silent for >5min, clears the timer and
   reschedules immediately, recovering from a silently stalled loop.
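
The four fixes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: only `tokenRefreshPromise`, `LEADER_LEASE_TTL_S`, and the numeric thresholds come from the message, while every function name, the `Lease` shape, and the behaviour for a zero error count are hypothetical.

```typescript
// Illustrative sketch only. Names other than tokenRefreshPromise and
// LEADER_LEASE_TTL_S are hypothetical and not taken from the commit.

// 1. Token refresh mutex: all callers share one in-flight refresh.
let tokenRefreshPromise: Promise<string> | null = null;

function refreshToken(fetchToken: () => Promise<string>): Promise<string> {
  const inFlight = tokenRefreshPromise;
  if (inFlight) return inFlight; // reuse the request already in flight
  const pending = fetchToken().then(
    (token) => { tokenRefreshPromise = null; return token; },
    (err) => { tokenRefreshPromise = null; throw err; },
  );
  tokenRefreshPromise = pending;
  return pending;
}

// 2. Leader lease: decision logic only (assumed semantics). A real
// implementation would need an atomic conditional write to service_state.
const LEADER_LEASE_TTL_S = 90;

interface Lease {
  holderId: string;
  expiresAtMs: number;
}

function mayTakeLease(current: Lease | null, instanceId: string, nowMs: number): boolean {
  return current === null
    || current.holderId === instanceId // renewing our own lease
    || current.expiresAtMs <= nowMs;   // previous holder's lease expired
}

function newLease(instanceId: string, nowMs: number): Lease {
  return { holderId: instanceId, expiresAtMs: nowMs + LEADER_LEASE_TTL_S * 1000 };
}

// 3. Exponential backoff: min(10s * 2^(n-1), 5min) for n consecutive errors.
const BASE_DELAY_MS = 10_000;
const MAX_DELAY_MS = 5 * 60_000;

function nextPollDelayMs(consecutiveErrors: number): number {
  // Assumption: with no recorded errors, fall back to the base delay.
  if (consecutiveErrors < 1) return BASE_DELAY_MS;
  return Math.min(BASE_DELAY_MS * 2 ** (consecutiveErrors - 1), MAX_DELAY_MS);
}

// 4. Watchdog check: has the poll loop been silent for more than 5 minutes?
const WATCHDOG_STALL_MS = 5 * 60_000;

function isPollLoopStalled(lastPollAtMs: number, nowMs: number): boolean {
  return nowMs - lastPollAtMs > WATCHDOG_STALL_MS;
}
```

Keeping the delay, lease, and stall checks as pure functions of their inputs makes them easy to unit test without timers or a database. The surrounding `setInterval` and lease-table wiring is left out here because the message does not describe it in detail.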

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 05:54:42 -07:00
| Name | Last commit | Date |
|---|---|---|
| prisma | chore(agent): add empty prisma dir to fix Docker build COPY step | 2026-03-08 03:10:19 -07:00 |
| src | fix(wecom): token mutex, leader lease, backoff, watchdog | 2026-03-10 05:54:42 -07:00 |
| package.json | fix(feishu): commit missing entity field + SDK dependency | 2026-03-09 03:20:03 -07:00 |
| tsconfig.json | Initial commit: IT0 AI-powered server cluster operations platform | 2026-02-08 22:54:37 -08:00 |