Skip to content

fix(core): Prevent Redis connection recovery from being missed#28256

Open
ivov wants to merge 2 commits intomasterfrom
cat-2721-fix-redis-reconnection-flag
Open

fix(core): Prevent Redis connection recovery from being missed#28256
ivov wants to merge 2 commits intomasterfrom
cat-2721-fix-redis-reconnection-flag

Conversation

@ivov
Copy link
Copy Markdown
Member

@ivov ivov commented Apr 9, 2026

Summary

RedisClientService debounces all event emissions by 1s, and the reconnection retry interval is also 1s. When a connection drops, retryStrategy queues a debounced connection-lost event. Meanwhile, ioRedis waits 1s and reconnects. So the debounce timer and the ready event race each other. Typically the debounce wins and sets lostConnection = true before ready checks it, but under load ready can fire first, so it finds lostConnection still as false and skips recovery, then the late connection-lost flips the flag to true, where it stays forever.

This PR sets lostConnection = true synchronously in retryStrategy to bypass the debounce. The debounced listener now only logs.

Related Linear tickets, Github issues, and Community forum posts

https://linear.app/n8n/issue/CAT-2721
Closes #27753

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with Backport to Beta, Backport to Stable, or Backport to v1 (if the PR is an urgent fix that needs to be backported)

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Architecture diagram
sequenceDiagram
    participant Redis as ioRedis Library
    participant RCS as RedisClientService
    participant Emitter as Internal Emitter (Debounced)

    Note over Redis, RCS: Connection Failure
    Redis->>RCS: retryStrategy() callback invoked

    rect rgb(240, 240, 240)
        Note right of RCS: NEW: Atomic State Update
        RCS->>RCS: Set lostConnection = true
    end

    RCS->>Emitter: emit('connection-lost', timeout)
    
    Note over Emitter: Debounce Timer Starts (1s)

    Note over Redis, RCS: Connection Re-established
    Redis->>RCS: Event: 'ready'

    alt is lostConnection == true
        Note right of RCS: Recovery Path Triggered
        RCS->>RCS: internalRecover()
        RCS->>Emitter: emit('connection-recovered')
        RCS->>RCS: Set lostConnection = false
    else CHANGED: Previous Race Condition (if ready fired before debounce)
        Note right of RCS: Recovery skipped (Bug fixed)
    end

    Note over Emitter: Debounce Timer Expires
    Emitter-->>RCS: Listener: 'connection-lost' fires
    
    rect rgb(240, 240, 240)
        Note right of RCS: CHANGED: Only logs warning now
        RCS->>RCS: logger.warn(...)
    end
Loading

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 9, 2026

Performance Comparison

Comparing currentlatest master14-day baseline

docker-stats

Metric Current Latest Master Baseline (avg) vs Master vs Baseline Status
docker-image-size-n8n 1269.76 MB 1269.76 MB 1269.76 MB (σ 0.00) +0.0% +0.0%
docker-image-size-runners 386.00 MB 386.00 MB 388.00 MB (σ 3.46) +0.0% -0.5%

Memory consumption baseline with starter plan resources

Metric Current Latest Master Baseline (avg) vs Master vs Baseline Status
memory-rss-baseline 287.07 MB 287.07 MB 281.78 MB (σ 34.50) +0.0% +1.9%
memory-heap-used-baseline 114.53 MB 114.53 MB 113.09 MB (σ 1.15) +0.0% +1.3% ⚠️

Idle baseline with Instance AI module loaded

Metric Current Latest Master Baseline (avg) vs Master vs Baseline Status
instance-ai-heap-used-baseline 186.41 MB 186.41 MB 186.41 MB (< 3 samples) +0.0% +0.0%
instance-ai-rss-baseline 343.76 MB 343.76 MB 343.76 MB (< 3 samples) +0.0% -0.0%
How to read this table
  • Current: This PR's value (or latest master if PR perf tests haven't run)
  • Latest Master: Most recent nightly master measurement
  • Baseline: Rolling 14-day average from master
  • vs Master: PR impact (current vs latest master)
  • vs Baseline: Drift from baseline (current vs rolling avg)
  • Status: ✅ within 1σ | ⚠️ 1-2σ | 🔴 >2σ regression

@ivov ivov requested a review from despairblue April 9, 2026 14:21
@n8n-assistant n8n-assistant Bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Apr 9, 2026
@ivov ivov requested a review from a team April 23, 2026 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redis connection sometimes not recovered

1 participant