| 1 | +# Persistence Audit – State Ledger |
| 2 | + |
| 3 | +This document inventories all state that influences user-visible behavior and |
| 4 | +must survive a `docker compose down && docker compose up -d` cycle. |
| 5 | + |
| 6 | +## State Items |
| 7 | + |
| 8 | +| # | State Item | Owner (package) | Must Survive Restart? | Current Storage | Proposed Durable Representation | Idempotency / Dedup Strategy | |
| 9 | +|---|---|---|---|---|---|---| |
| 10 | +| 1 | **In-memory timers** (`SimpleTimer.timers` map) | `flow/timer.go` | Yes | In-memory map of `*timerEntry` keyed by timer ID | `jobs` table (`kind` + `run_at` + `payload_json`) | `dedupe_key` unique constraint; on restart, requeue stale running jobs | |
| 11 | +| 2 | **Recurring schedule timers** (daily prompt schedule) | `flow/timer.go` | Yes | In-memory via `time.AfterFunc` with self-rescheduling | `jobs` table with `kind=recurring_prompt`; after execution, enqueue next occurrence | `dedupe_key` per participant + schedule; only one active job per schedule | |
| 12 | +| 3 | **Daily prompt pending state** (`dailyPromptPendingState`) | `flow/scheduler_tool.go` | Yes | Persisted in `flow_states.state_data` as JSON (key `daily_prompt_pending`) | Keep in `flow_states`; reminder becomes a `jobs` row instead of `time.AfterFunc` | Check pending state before sending reminder; clear after send | |
| 13 | +| 4 | **Pending state transition timer** (delayed `transition_state`) | `flow/state_transition_tool.go` | Yes | Timer ID stored in `flow_states.state_data`, actual timer in memory | `jobs` table with `kind=state_transition` | `dedupe_key` per participant + flow type; cancel = mark job canceled | |
| 14 | +| 5 | **Feedback follow-up timer** | `flow/feedback_module.go` | Yes | In-memory timer via `SimpleTimer` | `jobs` table with `kind=feedback_followup` | `dedupe_key` per participant; skip if participant state changed | |
| 15 | +| 6 | **Feedback initial timeout timer** | `flow/feedback_module.go` | Yes | In-memory timer via `SimpleTimer` | `jobs` table with `kind=feedback_timeout` | `dedupe_key` per participant; skip if already responded | |
| 16 | +| 7 | **LID-by-phone cache** (`lidByPhone` map) | `messaging/whatsapp_service.go` | No | In-memory map | Remains in-memory; repopulated on first message from each contact | N/A – cache miss just means first send uses phone number directly | |
| 17 | +| 8 | **Outgoing WhatsApp messages** (sent via `SendMessage`) | `messaging/whatsapp_service.go` | Yes | Fire-and-forget; receipt stored in `receipts` table | `outbox_messages` table; flow enqueues, sender worker delivers | `dedupe_key` prevents duplicate sends on restart | |
| 18 | +| 9 | **Inbound WhatsApp messages** (received via event handler) | `messaging/whatsapp_service.go` | Yes (dedup) | Processed immediately; response stored in `responses` table | `inbound_dedup` table keyed by WhatsApp message ID | Insert-or-skip on message ID; prevents reprocessing after restart | |
| 19 | +| 10 | **Flow state** (`flow_states` table) | `store/`, `flow/state_manager.go` | Yes | SQLite/Postgres `flow_states` table | Already persisted | N/A – already durable | |
| 20 | +| 11 | **Conversation participants** | `store/` | Yes | SQLite/Postgres `conversation_participants` table | Already persisted | N/A – already durable | |
| 21 | +| 12 | **Registered hooks** | `store/` | Yes | SQLite/Postgres `registered_hooks` table | Already persisted | N/A – already durable | |
| 22 | +| 13 | **Receipts / Responses** | `store/` | Yes | SQLite/Postgres tables | Already persisted | N/A – already durable | |
| 23 | + |
| 24 | +## New Tables |
| 25 | + |
| 26 | +### `jobs` |
| 27 | + |
| 28 | +Replaces all in-memory `time.AfterFunc` timers with durable, restart-safe job records. |
| 29 | + |
| 30 | +| Column | Type | Description | |
| 31 | +|---|---|---| |
| 32 | +| `id` | TEXT (UUID) | Primary key | |
| 33 | +| `kind` | TEXT | Job type (e.g., `recurring_prompt`, `state_transition`, `feedback_followup`, `feedback_timeout`, `daily_prompt_reminder`) | |
| 34 | +| `run_at` | TIMESTAMP | When the job should execute | |
| 35 | +| `payload_json` | TEXT/JSON | Job-specific parameters | |
| 36 | +| `status` | TEXT | `queued`, `running`, `done`, `failed`, `canceled` | |
| 37 | +| `attempt` | INTEGER | Current attempt number | |
| 38 | +| `max_attempts` | INTEGER | Maximum retry attempts | |
| 39 | +| `last_error` | TEXT | Last error message (nullable) | |
| 40 | +| `locked_at` | TIMESTAMP | When a worker claimed this job (nullable) | |
| 41 | +| `dedupe_key` | TEXT | Unique constraint for preventing duplicates (nullable) | |
| 42 | +| `created_at` | TIMESTAMP | Row creation time | |
| 43 | +| `updated_at` | TIMESTAMP | Last update time | |
| 44 | + |
| 45 | +### `outbox_messages` |
| 46 | + |
| 47 | +Ensures outgoing WhatsApp sends are restart-safe and idempotent. |
| 48 | + |
| 49 | +| Column | Type | Description | |
| 50 | +|---|---|---| |
| 51 | +| `id` | TEXT (UUID) | Primary key | |
| 52 | +| `participant_id` | TEXT | Target participant | |
| 53 | +| `kind` | TEXT | Message type (e.g., `prompt`, `reminder`, `feedback_followup`) | |
| 54 | +| `payload_json` | TEXT/JSON | Message content and metadata | |
| 55 | +| `status` | TEXT | `queued`, `sending`, `sent`, `failed` | |
| 56 | +| `attempts` | INTEGER | Send attempt count | |
| 57 | +| `next_attempt_at` | TIMESTAMP | When to retry (nullable) | |
| 58 | +| `dedupe_key` | TEXT | Unique constraint for preventing duplicate sends | |
| 59 | +| `locked_at` | TIMESTAMP | When a worker claimed this message (nullable) | |
| 60 | +| `last_error` | TEXT | Last error message (nullable) | |
| 61 | +| `created_at` | TIMESTAMP | Row creation time | |
| 62 | +| `updated_at` | TIMESTAMP | Last update time | |
| 63 | + |
| 64 | +### `inbound_dedup` |
| 65 | + |
| 66 | +Prevents reprocessing of inbound messages after restart. |
| 67 | + |
| 68 | +| Column | Type | Description | |
| 69 | +|---|---|---| |
| 70 | +| `message_id` | TEXT | Primary key – stable WhatsApp message ID | |
| 71 | +| `participant_id` | TEXT | Sender identifier | |
| 72 | +| `received_at` | TIMESTAMP | When the message was first seen | |
| 73 | +| `processed_at` | TIMESTAMP | When processing completed (nullable) | |
| 74 | + |
| 75 | +## Invariants |
| 76 | + |
| 77 | +1. Any state transition that implies future work must, in ONE DB transaction: |
| 78 | + - Update `flow_states` |
| 79 | + - Enqueue `jobs` and/or `outbox_messages` |
| 80 | +2. Timers are never used for durable behavior – they become jobs. |
| 81 | +3. `dedupe_key` constraints ensure restarts and retries do not duplicate work. |
| 82 | +4. On startup, requeue stale `running` jobs and stale `sending` outbox messages (rows whose `locked_at` is older than a threshold). |