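Background:
- 1.0 production had 6 orphaned replication slots holding back 8.6 GB of WAL (already cleaned up)
- 1.0 runs Debezium 2.4, which is affected by bug DBZ-7316 (unbounded WAL accumulation)
- Neither 1.0 nor 2.0 had the max_slot_wal_keep_size safety valve (now set online to 10GB)
- The 2.0 outbox connectors emit heartbeats via pg_logical_emit_message, which bypasses the publication
- The 2.0 outbox connectors use RegexRouter with regex=".*", so heartbeat records pollute consumers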
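
As a rough illustration of how orphaned slots like the ones in the first item might be identified, the sketch below compares rows read from pg_replication_slots against the slot names that live connectors are expected to own. The query uses standard PostgreSQL catalog columns and functions; the slot names, the `expected` set, and the helper `find_orphan_slots` are hypothetical and not taken from this repository.

```python
# Query an operator might run to list slots and the WAL each one retains (PostgreSQL 10+).
SLOT_QUERY = """
SELECT slot_name, active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
"""


def find_orphan_slots(rows, expected):
    """Return inactive slots that no known connector claims, largest backlog first."""
    orphans = [
        r for r in rows
        if not r["active"] and r["slot_name"] not in expected
    ]
    return sorted(orphans, key=lambda r: r["retained_bytes"], reverse=True)


# Hypothetical rows shaped like the result of SLOT_QUERY.
rows = [
    {"slot_name": "identity_slot", "active": True, "retained_bytes": 1_024},
    {"slot_name": "tmp_slot", "active": False, "retained_bytes": 500_000_000},
    {"slot_name": "old_test_slot", "active": False, "retained_bytes": 2_000_000_000},
]
orphans = find_orphan_slots(rows, expected={"identity_slot"})
print([r["slot_name"] for r in orphans])
```

Once a slot is confirmed to be unused, it can be dropped with pg_drop_replication_slot(slot_name); the sketch deliberately stops short of doing that automatically.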
Changes:
[docker-compose.yml - 1.0 infrastructure]
- Debezium: 2.4 → 2.5.4.Final (fixes DBZ-7316)
- PostgreSQL: add max_slot_wal_keep_size=10GB
- Debezium REST API: bind port to 127.0.0.1 (guards against SSRF)
- PostgreSQL: bind port to 127.0.0.1 (blocks direct connections from the public internet)
- Kafka Connect: add OFFSET_FLUSH_INTERVAL_MS (10 s)
[docker-compose.2.0.yml - 2.0 infrastructure]
- Debezium: 2.5 → 2.5.4.Final (pin the exact version)
- PostgreSQL: add max_slot_wal_keep_size=10GB
- Kafka Connect: add OFFSET_FLUSH_INTERVAL_MS (10 s)
[1.0 connector config - identity/authorization]
- Add heartbeat.action.query (table-based: INSERT INTO debezium_heartbeat)
- Previously only heartbeat.interval.ms was set, with no action.query, so the heartbeat had no effect
[2.0 outbox connector config - all 5 updated]
- heartbeat: pg_logical_emit_message → table-based INSERT INTO debezium_heartbeat
  (table writes go through the publication → Debezium consumes them → confirmed_flush_lsn advances)
- RegexRouter: regex ".*" → ".*outbox_events" (route only outbox events; heartbeats go to the default topic)
- table.include.list: add debezium_heartbeat (ensures heartbeat changes produce Kafka messages)
- publication.autocreate.mode: filtered → disabled (use the pre-created publication)
- auth/contribution: add signal channel config (supports replaying data via incremental snapshots)
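
The property changes listed above could look roughly like the fragment below, written as a Python dict that serializes to connector JSON. The property keys are standard Debezium and Kafka Connect names; the schema, the `ts` column, the publication name, the `route` transform alias, the replacement topic, and the 10-second interval are assumptions made for illustration, not values from this repository.

```python
import json


def outbox_connector_overrides(schema="public"):
    """Build the changed properties for one outbox connector (illustrative values)."""
    return {
        # Heartbeat writes a real row so the change passes through the publication.
        "heartbeat.interval.ms": "10000",  # assumed interval
        "heartbeat.action.query": (
            f"INSERT INTO {schema}.debezium_heartbeat (ts) VALUES (now())"  # column is hypothetical
        ),
        # The heartbeat table must be captured alongside the outbox table.
        "table.include.list": f"{schema}.outbox_events,{schema}.debezium_heartbeat",
        # Rely on a publication created ahead of time instead of letting Debezium create one.
        "publication.autocreate.mode": "disabled",
        "publication.name": "outbox_publication",  # hypothetical name
        # Route only outbox topics; everything else keeps its default topic.
        "transforms": "route",
        "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
        "transforms.route.regex": ".*outbox_events",
        "transforms.route.replacement": "outbox.events",  # hypothetical target topic
    }


config = outbox_connector_overrides()
print(json.dumps(config, indent=2))
```

With publication.autocreate.mode set to disabled, the pre-created publication has to include debezium_heartbeat as well, otherwise the heartbeat rows never reach the connector.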
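Lessons learned:
1. pg_logical_emit_message writes to WAL but bypasses the publication, so it cannot advance confirmed_flush_lsn
2. RegexRouter with regex=".*" routes every change (heartbeats included) to the outbox topic, polluting consumers
3. Deleting a Kafka Connect connector does not automatically clean up its PostgreSQL replication slot
4. max_slot_wal_keep_size is a sighup-level parameter and can be changed online with ALTER SYSTEM + pg_reload_conf()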
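
Lesson 2 can be checked with a small sketch that imitates RegexRouter's behavior: the topic is renamed only when the regex matches the entire topic name. This is a simplification that assumes a replacement without capture groups; the topic names and the `route` helper are hypothetical.

```python
import re


def route(topic, regex, replacement):
    """Imitate RegexRouter: rename only when the regex matches the whole topic name."""
    return replacement if re.fullmatch(regex, topic) else topic


# Hypothetical topic names produced by one connector.
topics = ["app.public.outbox_events", "app.public.debezium_heartbeat"]

# Old config: ".*" matches everything, so heartbeat records land in the outbox topic.
before = [route(t, r".*", "outbox.events") for t in topics]

# New config: only topics ending in "outbox_events" are rerouted.
after = [route(t, r".*outbox_events", "outbox.events") for t in topics]

print(before)
print(after)
```

Under the old pattern both entries collapse into the same target topic, while the narrower pattern leaves the heartbeat topic untouched.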
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>