fix: address concurrency races in stats aggregation#16
Merged
Conversation
- getStat: use cached race winner instead of orphaned instance (fixes silent counter loss); correct the broken if/else block - startup: init timezone + stats map before setStartedService(true) to avoid NPE race; make init() idempotent and synchronized - StatReaderService: use ThreadLocal NumberFormat (not thread-safe across concurrent REST threads) - mark lifecycle flags volatile (startedService, startedPurge, finished)
This was referenced Jun 1, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes several concurrency / correctness issues found during a review of the stat ingest + aggregation path. Scoped to the low-risk, high-value set; compiles cleanly (
./mvnw clean compile).Fixes
1. Lost-update race in
StatBuilderService.getStat(data loss)When two threads created the same
statClass/entity/granularity/tsbucket concurrently, the race loser returned its own orphanedStatinstance (never put in the cache) instead of the winner already in the map. Its increments were silently dropped at flush. Also the} {block was an unconditional block, not anelse, so the "Not CREATED" warning always fired. Now the loser reuses the cachedraceStat.2. Startup ordering NPE race in
StatController.onStartsetStartedService(true)ran synchronously whilestatService.init(tz)(which allocates the staticstatsmap) ran asynchronously insidestartPurgeTask(). A consumer could observestartedService==trueand callgetStat()whilestatswas stillnull→ NPE.onStartnow resolves the timezone and callsinit(tz)beforesetStartedService(true).init()is nowsynchronizedand idempotent (only allocates if null) so the later call from the purge task is harmless.3.
NumberFormatthread-safety inStatReaderServiceNumberFormatis not thread-safe and was a sharedstaticused by concurrent/stats/viewand/stats/comparerequests. Replaced with aThreadLocal<NumberFormat>.4. Missing
volatileon cross-thread lifecycle flagsstartedService(StatBuilderService),startedPurgeandfinished(StatController) are written/read across threads outside synchronized blocks;finishedis spun on during shutdown. Markedvolatileto guarantee visibility (fixes a potential shutdown hang).Not included (tracked for a separate, more careful change)
EntityManagerused across consumer/flush/reader threads.removeLockObj.flushStats(statClass, entityId)not holding the per-stat lock.statNPE swallowed influshStats.StatServerResource.computeGranularitydereferencingfrom/tobefore the null checks (returns 500 instead of 400).Test plan
./mvnw clean compilepasses.