
Dynamic partitioning: client side #9732

Open
dnr wants to merge 14 commits into temporalio:main from dnr:dp6

Conversation

Contributor

@dnr dnr commented Mar 28, 2026

What changed?

Add client-side of dynamic partition scaling: partitionCache, PartitionCounts, StalePartitionCounts error, handlePartitionCounts.

The intended behavior is:

For partition-aware calls (add + poll tasks), the client caches the partition count for active task queues and uses those counts for load balancing (no change to the load-balancing algorithm yet). The client sends its cached partition counts to the server in a gRPC header, and receives updated partition counts with the response in a gRPC trailer. If the updated counts differ, it updates its cache for subsequent requests. If the server indicates that the client's view of the partition counts is stale, it returns a special StalePartitionCounts error, and the client makes one immediate retry with the newly received counts.

With just this PR by itself, the server will never send counts, the cache will always be empty, and the client will always fall back to dynamic config for partition counts, so there's no change in behavior yet.
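The cache-then-retry flow described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the names `partitionCache`, `call`, the map-based cache, the fallback default of 4, and the `rpc` callback (standing in for the real gRPC call with its header/trailer plumbing) are all assumptions for the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errStalePartitionCounts stands in for the PR's StalePartitionCounts error.
var errStalePartitionCounts = errors.New("stale partition counts")

// partitionCache maps a task queue name to its last known partition count.
// The real cache key and value types in the PR will differ.
type partitionCache struct {
	mu     sync.Mutex
	counts map[string]int
}

func newPartitionCache() *partitionCache {
	return &partitionCache{counts: make(map[string]int)}
}

// get falls back to a default (standing in for dynamic config) on a miss.
func (c *partitionCache) get(taskQueue string, fallback int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if n, ok := c.counts[taskQueue]; ok {
		return n
	}
	return fallback
}

func (c *partitionCache) put(taskQueue string, n int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[taskQueue] = n
}

// call sends the cached count (in the real PR: a gRPC header) and applies
// the server's updated count from the response (a gRPC trailer). On a
// stale-counts error it retries exactly once with the fresh count.
func call(c *partitionCache, taskQueue string, rpc func(sent int) (updated int, err error)) error {
	sent := c.get(taskQueue, 4) // 4 = assumed dynamic-config default
	updated, err := rpc(sent)
	if updated != 0 && updated != sent {
		c.put(taskQueue, updated)
	}
	if errors.Is(err, errStalePartitionCounts) {
		// one immediate retry with the newly received counts
		_, err = rpc(c.get(taskQueue, 4))
	}
	return err
}

func main() {
	c := newPartitionCache()
	calls := 0
	// Fake server: 8 partitions; flags the client's 4 as stale.
	err := call(c, "tq", func(sent int) (int, error) {
		calls++
		if sent != 8 {
			return 8, errStalePartitionCounts
		}
		return 8, nil
	})
	fmt.Println(calls, c.get("tq", 4), err) // prints: 2 8 <nil>
}
```

With an empty cache and a server that never sends counts, `get` always returns the fallback, matching the "no change in behavior yet" note above.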

Why?

Implement half of dynamic partition scaling.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s) (tests are in future PRs)

@dnr dnr requested a review from rkannan82 March 28, 2026 04:47
@dnr dnr requested review from a team as code owners March 28, 2026 04:47
logger.Info("partition count trailer parse error", tag.Error(err2))
// continue with zero value for pc2
}
if pc2 != pc {
Contributor


Should we overwrite it only if err2 == nil? Otherwise we are unnecessarily clearing the cache.

Contributor Author


I think it makes sense to clear the cache on parse error. That's a very exceptional situation: it's just a simple proto message between internal services. What could go wrong?

@@ -0,0 +1,117 @@
package matching
Contributor

@rkannan82 rkannan82 Apr 3, 2026


What do you think about adding some metrics now?
In this file:

  • cache hit/miss rate
  • cache size

In partition.go:

  • StalePartitionCounts retry count

Contributor Author


StalePartitionCounts errors will show up in service_error_with_type, right?

For the cache, I'm not sure hit/miss makes much sense since it's not a size-limited cache. Size does make sense; I can do that.
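The agreed-on cache-size metric could look roughly like this. This is a hypothetical sketch, not Temporal's metrics API: the `gauge` callback type and the idea of reporting size on every update are assumptions for illustration.

```go
package main

import "fmt"

// gauge stands in for whatever gauge-emission API the real code uses.
type gauge func(float64)

// partitionCache is a minimal stand-in for the PR's cache type.
type partitionCache struct {
	counts    map[string]int
	sizeGauge gauge
}

// put stores a count and reports the current cache size.
func (c *partitionCache) put(taskQueue string, n int) {
	c.counts[taskQueue] = n
	c.sizeGauge(float64(len(c.counts)))
}

func main() {
	var last float64
	c := &partitionCache{
		counts:    map[string]int{},
		sizeGauge: func(v float64) { last = v },
	}
	c.put("a", 4)
	c.put("b", 8)
	fmt.Println(last) // prints: 2
}
```

StalePartitionCounts retries would not need a dedicated counter if, as noted above, the errors already surface via the existing service_error_with_type metric.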

@rkannan82
Contributor

Suggest revising the PR description a bit so we get the gist of the flow too.
Example:
Behavior: a matching RPC sends its cached partition counts to the server and receives updated partition counts as part of the response (gRPC header/trailer). If the updated counts are different, it caches them for subsequent requests. If the server indicates that the client's view of the partition counts is stale, the client makes one more attempt with the newly received counts.

This new behavior does not take effect yet since the server does not send any counts.
