Skip to content

fix(analytics): guard polling start so concurrent callers don't leak executors (SDK-81)#91

Open
tylerjroach wants to merge 2 commits into
masterfrom
fix/sdk-85-thread-safe-polling-start
Open

fix(analytics): guard polling start so concurrent callers don't leak executors (SDK-81)#91
tylerjroach wants to merge 2 commits into
masterfrom
fix/sdk-85-thread-safe-polling-start

Conversation

@tylerjroach

@tylerjroach tylerjroach commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Summary

startPollingForDefinitions had no lock around the create-and-assign of pollingExecutor. Two concurrent callers would each allocate a fresh ScheduledExecutorService, the field would point at the second, and the first executor's worker thread + scheduled task would silently leak (still alive, still polling, no longer reachable for shutdown).

Add a private pollingLock guarding the executor's lifecycle:

  • startPollingForDefinitions: under the lock, bail early if pollingExecutor != null (idempotent), otherwise create + schedule.
  • stopPollingForDefinitions: snapshot the executor under the lock and clear the field, then run shutdown / awaitTermination outside the lock so a long shutdown can't block a concurrent start for 5s.

Context

Linear: SDK-81. Same audit-driven cross-SDK push; Java is one of three affected (Java, Ruby, Node). Python uses a not self._polling_task check that's safe because both async and sync paths share it under the GIL; Go uses CompareAndSwap.

Test plan

  • All 54 LocalFlagsProviderTest tests pass (mvn test)
  • New testConcurrentStartPollingDoesNotLeakExecutors spins up 8 contender threads behind a CountDownLatch start gate. Counts JVM threads named mixpanel-flags-poller before and after; asserts exactly one new poller exists, then asserts stop cleans it up. Before this fix the count would be 8.

🤖 Generated with Claude Code

…utors

startPollingForDefinitions had no lock around the pollingExecutor
create-and-assign. Two concurrent callers would each allocate a fresh
ScheduledExecutorService and the earlier one's worker thread + queue
would leak (still alive, still scheduled to poll, no way to shut it
down because the field had been overwritten).

Guard the executor lifecycle with a private lock and bail early if a
poller is already scheduled. Snapshot the executor under the lock in
stop, then run the (potentially blocking) shutdown / awaitTermination
outside the lock so a long-running shutdown can't block a concurrent
start for 5s.

Linear: SDK-85

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@tylerjroach tylerjroach requested review from a team and rahul-mixpanel June 29, 2026 17:31
@linear-code

linear-code Bot commented Jun 29, 2026

Copy link
Copy Markdown

SDK-85

SDK-81

@tylerjroach tylerjroach changed the title fix(flags): guard polling start so concurrent callers don't leak executors (SDK-85) fix(analytics): guard polling start so concurrent callers don't leak executors (SDK-85) Jun 29, 2026
@tylerjroach tylerjroach changed the title fix(analytics): guard polling start so concurrent callers don't leak executors (SDK-85) fix(analytics): guard polling start so concurrent callers don't leak executors (SDK-81) Jun 29, 2026
@tylerjroach

Copy link
Copy Markdown
Contributor Author

@greptileai

@greptile-apps

greptile-apps Bot commented Jul 2, 2026

Copy link
Copy Markdown

Confidence Score: 5/5

Safe to merge — the lock/snapshot pattern is correct, shutdown happens outside the lock, and the fix is well-scoped to the executor lifecycle.

The production code change is straightforward and mechanically correct: idempotent create under lock, snapshot-clear-then-shutdown pattern for stop. No pre-existing contract is broken and no new race conditions are introduced.

No files require special attention.

Important Files Changed

Filename Overview
src/main/java/com/mixpanel/mixpanelapi/featureflags/provider/LocalFlagsProvider.java Adds pollingLock to guard executor lifecycle; start is now idempotent under the lock, stop snapshots-and-clears under the lock then shuts down outside it — correct pattern.
src/test/java/com/mixpanel/mixpanelapi/featureflags/provider/LocalFlagsProviderTest.java Adds a concurrency regression test using a CountDownLatch start gate; thread-count assertion after stopPollingForDefinitions relies on a 100 ms sleep which is a fragile timing heuristic.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant T1 as Thread 1
    participant T2 as Thread 2
    participant Lock as pollingLock
    participant Exec as ScheduledExecutorService

    T1->>T1: fetchDefinitions()
    T2->>T2: fetchDefinitions()
    T1->>Lock: synchronized(pollingLock)
    Note over Lock: T2 waits for lock
    T1->>T1: "pollingExecutor == null? true"
    T1->>Exec: create + scheduleAtFixedRate
    T1->>Lock: release lock
    T2->>Lock: synchronized(pollingLock)
    T2->>T2: "pollingExecutor != null, return early"
    T2->>Lock: release lock
    Note over T1,Exec: Only one executor created
    T1->>Lock: stopPollingForDefinitions: synchronized(pollingLock)
    T1->>T1: "snapshot toShutdown, pollingExecutor = null"
    T1->>Lock: release lock
    T1->>Exec: toShutdown.shutdown() + awaitTermination
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant T1 as Thread 1
    participant T2 as Thread 2
    participant Lock as pollingLock
    participant Exec as ScheduledExecutorService

    T1->>T1: fetchDefinitions()
    T2->>T2: fetchDefinitions()
    T1->>Lock: synchronized(pollingLock)
    Note over Lock: T2 waits for lock
    T1->>T1: "pollingExecutor == null? true"
    T1->>Exec: create + scheduleAtFixedRate
    T1->>Lock: release lock
    T2->>Lock: synchronized(pollingLock)
    T2->>T2: "pollingExecutor != null, return early"
    T2->>Lock: release lock
    Note over T1,Exec: Only one executor created
    T1->>Lock: stopPollingForDefinitions: synchronized(pollingLock)
    T1->>T1: "snapshot toShutdown, pollingExecutor = null"
    T1->>Lock: release lock
    T1->>Exec: toShutdown.shutdown() + awaitTermination
Loading

Reviews (2): Last reviewed commit: "test(flags): fix region comment issue ke..." | Re-trigger Greptile

Copy-paste typo — the PR, commit, and Linear link all reference
SDK-81; only the test region comment said SDK-85.
@tylerjroach

Copy link
Copy Markdown
Contributor Author

Pushed 6116ece — region comment now reads SDK-81.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant