
Add IsEphemeral() property to Partition interface #9900

Open
rkannan82 wants to merge 3 commits into main from kannan/ephemeral-partition-property


@rkannan82
Contributor

What

Add IsEphemeral() method to the Partition interface and replace Kind() == STICKY checks where the behavior applies to all ephemeral partition types, not just sticky queues.

Why

Several Kind() == STICKY checks in the matching service guard behaviors that apply to any ephemeral/transient partition (TTL-based expiry, no fairness, no versioning, no multi-partition fan-out, no enhanced describe). Extracting this into a property method makes it easy to add new ephemeral partition types (e.g. worker-commands) without auditing every check. Sticky-specific checks (stickiness clearing, deployment pinning, versioning in Poll) are left unchanged.
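As a rough illustration of the refactor described above, here is a minimal sketch. It assumes a heavily reduced `Partition` interface; the `Kind` constants, the `partition` struct, and the `usesFairness` call site are hypothetical stand-ins and not the actual matching-service types.

```go
package main

import "fmt"

// Kind is a simplified, hypothetical stand-in for the task queue kind enum.
type Kind int

const (
	KindNormal Kind = iota
	KindSticky
	KindWorkerCommands // hypothetical future ephemeral kind
)

// Partition is a reduced, hypothetical version of the interface in this PR.
type Partition interface {
	Kind() Kind
	IsEphemeral() bool
}

type partition struct{ kind Kind }

func (p partition) Kind() Kind { return p.kind }

// IsEphemeral is the single place that knows which kinds are ephemeral.
func (p partition) IsEphemeral() bool {
	switch p.kind {
	case KindSticky, KindWorkerCommands:
		return true
	default:
		return false
	}
}

// usesFairness is a hypothetical call site. Before the refactor it would
// have compared Kind() against the sticky kind directly.
func usesFairness(p Partition) bool {
	return !p.IsEphemeral()
}

func main() {
	for _, k := range []Kind{KindNormal, KindSticky, KindWorkerCommands} {
		p := partition{kind: k}
		fmt.Printf("kind=%d ephemeral=%t fairness=%t\n", k, p.IsEphemeral(), usesFairness(p))
	}
}
```

With this shape, adding a new ephemeral kind means touching only the `switch` in `IsEphemeral()`, while sticky-specific checks can keep comparing `Kind()` directly.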

How did you test it?

Unit tests (go test ./service/matching/...) and full repo build (go build ./...). No behavioral changes — pure refactor replacing type checks with a semantic property.

🤖 Generated with Claude Code

Replace Kind() == STICKY checks with IsEphemeral() where the behavior
applies to all ephemeral partition types (sticky, and future types like
worker-commands), not just sticky queues specifically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rkannan82 rkannan82 force-pushed the kannan/ephemeral-partition-property branch 3 times, most recently from 9f1f466 to 526bf30 Compare April 10, 2026 04:22
@rkannan82 rkannan82 requested review from ShahabT and dnr April 10, 2026 04:23
@rkannan82 rkannan82 force-pushed the kannan/ephemeral-partition-property branch from 526bf30 to a163283 Compare April 10, 2026 04:25
@rkannan82 rkannan82 force-pushed the kannan/ephemeral-partition-property branch from a163283 to 2b99d6a Compare April 10, 2026 04:54
…ants

These are used in IsEphemeral() code paths that apply to all ephemeral
partition types, not just sticky queues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rkannan82 rkannan82 force-pushed the kannan/ephemeral-partition-property branch from 2b99d6a to 8e2ff4f Compare April 10, 2026 17:22
@dnr
Contributor

dnr commented Apr 13, 2026

Several Kind() == STICKY checks in the matching service guard behaviors that apply to any ephemeral/transient partition

I think I'd like to split this up further:

TTL-based expiry

The goal is to use metadata TTLs for any queue with this property: if the queue is idle (no pollers or tasks) for 48 hours, it's okay to drop all tasks in it. Sticky fits that description, as do worker command queues. I think basically any worker-specific task queue would, right?

no fairness

This one feels a little different. For more "scheduling"/"placement" approaches to matching, where we send a high volume of tasks over a worker-specific channel, we may definitely want fairness even though it's a worker-specific queue. (The problem with fairness + TTLs concerns tasks; it doesn't apply to metadata. Though yeah, maybe it's silly to TTL metadata without TTLs on tasks. We'd eventually want a scavenger.)

no versioning

I'm not sure here. I guess by definition worker-specific doesn't need versioning. But maybe some other kind of "ephemeral" queue could be versioned?

no multi-partition fan-out

Also potentially different. In theory we could have a high-volume worker-specific queue accepting/dispatching enough tasks that we want to spread them over a few partitions. After dynamic partition scaling, maybe we should relax this for all queues, including sticky, and just default to 1 but let them scale?

no enhanced describe

I think this mostly has to do with versioning?

More behavior differences:

Sticky queues get userdata (and thus config) from a normal "parent". Should other ephemeral queues do that too?

"Return error on Add*Task if no poller in last 10s". That could apply to worker commands, maybe with a different timeout though? 5m?

Priority backlog forwarding: this is tied to having a normal "parent" partition(s), should probably match that.

Migration to v2 table: this is tied to fairness. Can be relaxed eventually.

So we could have (hand-waving draft):

SupportsFairness() bool
SupportsPartitions() bool
SupportsVersioning() bool
NormalParent() (string, bool)
RejectTaskIfNoPollerInterval() time.Duration // 0 means never reject
DeleteIfIdleTimeout() time.Duration // 0 means never
OverrideNameInMetrics() string // "" means use name as-is

@dnr
Contributor

dnr commented Apr 13, 2026

Oh, and the original one that started this:

Appears in metric labels
