Fix NPE in ChannelPool metrics methods when pool is uninitialized (CUSTOMER-413)#928
Fix NPE in ChannelPool metrics methods when pool is uninitialized (CUSTOMER-413)#928nikagra wants to merge 1 commit into
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
When the initial TCP connection to a node fails, ChannelPool.connectFuture completes (so the pool is added to PoolManager.pools) while `channels` is still null — initialize() is only called on the success path. The four methods size(), getAvailableIds(), getInFlight(), and getOrphanedIds() called Arrays.stream(channels) without guarding against this, throwing NullPointerException whenever a Dropwizard Metrics reporter (JMX, Graphite, etc.) scraped the gauge values during the reconnection window. Fix: mirror the pattern already used by next(), which checks singleThreaded.initialized before touching channels. Using the volatile initialized flag is correct by the JMM: the volatile write at line 365 (initialized = true) happens-after all channels writes, so any thread that reads initialized == true is guaranteed to see channels as fully populated. Fixes: CUSTOMER-413
d6dc64c to
250cb9d
Compare
There was a problem hiding this comment.
Pull request overview
Fixes a race where ChannelPool metric accessor methods could throw NullPointerException when a pool instance is published before its channels array is initialized (e.g., after an initial connection failure while reconnecting). This aligns the metric accessors with the existing safety pattern already used by next().
Changes:
- Guard
size(),getAvailableIds(),getInFlight(), andgetOrphanedIds()withsingleThreaded.initializedto safely return0when uninitialized. - Add a regression test ensuring those accessors return
0(and do not throw) whenchannelsis stillnullafter a failed initial connection.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| core/src/main/java/com/datastax/oss/driver/internal/core/pool/ChannelPool.java | Adds initialized guards to prevent NPEs when channels is still null. |
| core/src/test/java/com/datastax/oss/driver/internal/core/pool/ChannelPoolInitTest.java | Adds a regression test covering metric accessor behavior for an uninitialized pool. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
Fixes a
NullPointerExceptionreported by Accertify (CUSTOMER-413 / Zendesk #67222) when Dropwizard Metrics reporters scrape per-node gauge values while the driver is reconnecting to a node.Affects:
com.scylladb:java-driver-core(confirmed in4.18.0.1, same code present inscylla-4.x).Root Cause
ChannelPoolhas aChannelSet[] channelsfield that is not initialized at construction — it starts asnull:channelsis only assigned insideSingleThreaded.initialize(), which is called only when the first channel connection succeeds:When the initial connection fails,
makeInitialConnection()completesconnectFutureimmediately — withchannelsstillnull— and starts a reconnect in the background:PoolManager.onPoolsInit()unconditionally puts every completed pool into thepoolsmap (PoolManager.java:252). The Javadoc comment there even notes: "pool init always succeeds". This is intentional design — but it means a pool withchannels == nullcan be in the map.Why the existing null-check doesn't help
AbstractMetricUpdaterguards against a missing pool entry, but not against an uninitialized pool:The NPE
The three gauge methods — and
size()— callArrays.stream(channels)with no null guard:Arrays.stream(null)throwsNullPointerException. This is triggered by any Dropwizard reporter (JMX, Graphite, Prometheus bridge, etc.) that callsGauge.getValue()during the window between a failed initial connection and a successful reconnect.Full call chain
Same path applies to
getAvailableIds()andgetOrphanedIds().Fix
Mirror the guard already used by
next(), which correctly handles this case via theinitializedflag:Apply the same guard to all four affected methods:
Why
initializedand notchannels != nullinitializedis declaredvolatile(meaning the JMM guarantees that a thread readinginitialized == truewill also observe all writes that preceded the volatile write, includingchannels = new ChannelSet[...]). Thechannelsfield itself is notvolatile, so checking it directly would be a weaker guarantee. Usinginitializedis both JMM-correct and consistent with the existing pattern innext().Why
size()is includedsize()has the identical unguardedArrays.stream(channels)pattern. While it is not currently called from any metrics path, it is a public method and would throw the same NPE if called before initialization.Changed files
core/src/main/java/com/datastax/oss/driver/internal/core/pool/ChannelPool.javasize()— addinitializedguardgetAvailableIds()— addinitializedguardgetInFlight()— addinitializedguardgetOrphanedIds()— addinitializedguard