Skip to content

[mirror] fix(backend/notifications): atomic upsert + drop eager include#3

Open
yashwant86 wants to merge 5 commits intomm-base-12919from
mm-pr-12919
Open

[mirror] fix(backend/notifications): atomic upsert + drop eager include#3
yashwant86 wants to merge 5 commits intomm-base-12919from
mm-pr-12919

Conversation

@yashwant86
Copy link
Copy Markdown

@yashwant86 yashwant86 commented Apr 26, 2026

Mirror of upstream Significant-Gravitas#12919 for benchmark. Do not merge.


Summary by MergeMonkey

  • Corrections:
    • Fixes concurrent upsert race condition where two simultaneous calls on missing batch could both attempt INSERT, losing one event. Retries on UniqueViolationError to converge on UPDATE path.
    • Eliminates eager loading of notification rows in upsert operation, preventing statement_timeout on heavy users with thousands of accumulated notifications.
  • Chores:
    • Adds comprehensive async integration tests for upsert atomicity, concurrent invocations, and retry behavior.

majdyz added 5 commits April 25, 2026 10:34
… include

Replace the find→create-or-update pattern in
create_or_add_to_user_notification_batch with Prisma's atomic upsert on
(userId, type). This eliminates two real problems:

1. Slow query / statement_timeout: the prior find_unique used
   include={'Notifications': True}, which loaded every event in the
   batch just to decide whether to create or update. Heavy AGENT_RUN
   users accumulate thousands of events, triggering Postgres
   statement_timeout (seen in Sentry, database-manager pod).

2. TOCTOU race on @@unique([userId, type]): two concurrent invocations
   could both observe 'no batch' and race into create, or split around
   an existing batch and lose writes.

upsert handles both atomically, and we no longer need to read existing
notifications at all — the sole caller (NotificationManager._should_batch)
ignores the returned DTO and separately calls
get_user_notification_oldest_message_in_batch when it needs data.
Three DB-backed tests in backend/data/notifications_test.py:

- test_upsert_creates_empty_batch_then_appends — empty batch goes
  through the create branch, a second call goes through the update
  branch, resulting in exactly one batch row and two event rows.

- test_upsert_does_not_load_existing_notifications — seeds 25 events,
  upserts a 26th, and asserts the returned DTO carries an empty
  notifications list. This is the regression guard for the
  statement_timeout we observed: the eager include is gone, so callers
  no longer pull thousands of rows back on every append.

- test_upsert_concurrent_invocations_no_unique_violation — two
  gather()-ed upserts on an empty (user_id, type) pair both return
  successfully, leaving one batch and two events. The old find→create
  pattern would have raced on @@unique([userId, type]).
Prisma's `upsert` is find→INSERT/UPDATE under the hood, not a true SQL
ON CONFLICT, so two concurrent invocations on a missing row can both
take the INSERT branch and one loses to @@unique([userId, type]). The
previous PR commit relied on Prisma upsert being atomic; CI proved
otherwise via test_upsert_concurrent_invocations_no_unique_violation.

Catch the UniqueViolationError and retry once: the row now exists, so
the second pass takes the UPDATE branch deterministically.

Adds a deterministic unit test that monkeypatches the actions class to
inject the unique violation, since asyncio.gather + a shared Prisma
connection pool can serialize calls in some test setups (CodeRabbit's
concurrency-confidence concern).
@bot-mergemonkey
Copy link
Copy Markdown

bot-mergemonkey Bot commented Apr 26, 2026

Risk AssessmentNEEDS-TESTING · ~20 min review

Focus areas: Upsert retry logic on UniqueViolationError · Removal of eager Notifications include · Concurrent invocation safety · Unreachable RuntimeError at line 512

Assessment: Atomic upsert refactor with retry logic and eager-load removal requires integration testing.

Walkthrough

User calls create_or_add_to_user_notification_batch with a notification event. The function attempts an atomic upsert (create new batch or append to existing) without pre-fetching the batch. If two concurrent calls race on a missing row, both may attempt INSERT; the loser catches UniqueViolationError and retries, now hitting the UPDATE path. The returned DTO contains no eagerly-loaded notifications, avoiding statement_timeout on heavy users.

Changes

Files Summary
Atomic Upsert with Retry Logic
autogpt_platform/backend/backend/data/notifications.py
Replaces find-then-create-or-update pattern with atomic upsert + single retry on UniqueViolationError. Removes eager `Notifications` include to prevent statement_timeout on large batches. Handles Prisma's non-SQL-atomic upsert (find→INSERT/UPDATE) by retrying losers of concurrent INSERT races.
Upsert Integration Tests
autogpt_platform/backend/backend/data/notifications_test.py
Adds four async integration tests covering empty-batch creation, append without eager load, concurrent invocations without unique violations, and deterministic retry path via monkeypatch. Includes test helpers for user/batch setup and cleanup.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant NotifFunc as create_or_add_to_user_notification_batch
    participant Prisma
    participant DB as Database
    Caller->>NotifFunc: call with user_id, type, event
    NotifFunc->>NotifFunc: serialize event data
    loop attempt 0 or 1
        NotifFunc->>Prisma: upsert (no include)
        Prisma->>DB: find unique by (userId, type)
        alt row exists
            DB-->>Prisma: found
            Prisma->>DB: UPDATE + create notification
            DB-->>Prisma: updated batch
        else row missing
            DB-->>Prisma: not found
            Prisma->>DB: INSERT new batch + notification
            alt INSERT succeeds
                DB-->>Prisma: created batch
            else INSERT fails (concurrent winner)
                DB-->>Prisma: UniqueViolationError
                Prisma-->>NotifFunc: UniqueViolationError
                Note over NotifFunc: if attempt==0: continue loop
                Note over NotifFunc: if attempt==1: raise
            end
        end
        alt no exception
            Prisma-->>NotifFunc: batch (notifications=[])
            NotifFunc-->>Caller: UserNotificationBatchDTO
        end
    end
Loading

Dig Deeper With Commands

  • /review <file-path> <function-optional>
  • /chat <file-path> "<question>"
  • /roast <file-path>

Runs only when explicitly triggered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants