Skip to content

Commit 0cdc937

Browse files
authored
🤖 feat: support all-namespaces LIST across multiple CoderControlPlane instances (#85)
## Summary Enable all-namespaces LIST to aggregate results across every eligible `CoderControlPlane` instance (one per namespace) for `CoderTemplate` and `CoderWorkspace` resources. Previously, `kubectl get codertemplates -A` failed with: > multiple eligible CoderControlPlane instances across namespaces; multi-instance support is planned ## Background The aggregated API server's storage assumed exactly one eligible `CoderControlPlane` when handling all-namespaces LIST (request namespace is empty). When multiple eligible control planes exist across namespaces, client list-watch loops fail, blocking `kubectl get codertemplates -A`, `kubectl get coderworkspaces -A`, and controllers/informers that use list+watch. ## Implementation **New interface** — `coder.NamespaceLister` with `EligibleNamespaces(ctx) ([]string, error)`: - Opt-in capability that lets storage enumerate namespaces a provider can serve. - Keeps storage decoupled from concrete provider types. **Provider implementations**: - `ControlPlaneClientProvider.EligibleNamespaces` — discovers eligible CPs via `findEligibleControlPlanes`, groups by namespace, rejects duplicates within a namespace, returns sorted namespace list. - `StaticClientProvider.EligibleNamespaces` — returns the pinned namespace. **Storage fan-out** in `TemplateStorage.List` and `WorkspaceStorage.List`: - When request namespace is empty and provider implements `NamespaceLister`: fan out across all eligible namespaces, query each, convert with correct per-namespace metadata, and return aggregated results sorted by `(namespace, name)`. - Namespaced requests and providers without `NamespaceLister` keep existing behavior unchanged. - Error semantics: fail-fast (if any namespace fails, return error immediately). ## Validation - `make verify-vendor` ✅ - `make test` ✅ (8 new tests + no regressions) - `make build` ✅ - `make lint` ✅ ## Risks Low risk — the fan-out path only activates for all-namespaces LIST when the provider supports `NamespaceLister`. Single-namespace requests and the static provider path are unchanged. Multiple eligible CPs within the same namespace are still explicitly rejected. --- <details> <summary>📋 Implementation Plan</summary> # Plan: Support querying multiple CoderControlPlane instances (multi-namespace aggregation) ## Context / Why Today, the aggregated API server’s `CoderTemplate` / `CoderWorkspace` storage assumes **exactly one** eligible `CoderControlPlane` when handling **all-namespaces LIST** (i.e. request namespace is empty). When multiple eligible control planes exist across namespaces, client list-watch loops fail with: - `multiple eligible CoderControlPlane instances across namespaces; multi-instance support is planned` This blocks workflows like: - `kubectl get codertemplates -A` - controllers/informers that use list+watch against `aggregation.coder.com/v1alpha1` ## Goal Enable **multi-instance querying** by making **all-namespaces LIST** aggregate results across every eligible `CoderControlPlane` (one per namespace) for: - `aggregation.coder.com/v1alpha1/CoderTemplate` - `aggregation.coder.com/v1alpha1/CoderWorkspace` Namespaced requests (`-n <namespace>`) must keep current behavior. ## Non-goals (v1) - Supporting **multiple eligible** `CoderControlPlane` objects **within the same Kubernetes namespace**. - Implementing a “true” upstream-backed watch (watch events that reflect changes that occur directly in Coder without going through the aggregated API server). ## Acceptance Criteria - `kubectl get codertemplates -A` returns templates from **all** eligible `CoderControlPlane` namespaces. - `kubectl get coderworkspaces -A` returns workspaces from **all** eligible `CoderControlPlane` namespaces. - `kubectl get codertemplates -A --watch` no longer fails due to the multi-instance discovery error (because the initial LIST succeeds). - Standalone `aggregated-apiserver` mode (static provider pinned to `--coder-namespace`) continues working unchanged. ## Evidence (code references) - **Single-instance guard + error message**: `internal/aggregated/coder/controlplane_provider.go` - `ClientForNamespace` errors on `len(eligible) > 1` - `DefaultNamespace` errors on multiple eligible across namespaces - `multipleEligibleControlPlaneMessage("")` returns the exact string shown in the screenshot - **LIST path uses default namespace + empty-namespace client resolution**: - `internal/aggregated/storage/template.go`: `(*TemplateStorage).List` - `internal/aggregated/storage/workspace.go`: `(*WorkspaceStorage).List` - both call `namespaceForListConversion(...)` and then `clientForNamespace(ctx, requestNamespace)` where `requestNamespace==""` triggers the provider’s “pick exactly one CP” logic - **WATCH path is an in-memory broadcaster**: `internal/aggregated/storage/watch.go`, `template.go`, `workspace.go` - the screenshot’s “Watcher failed …” is consistent with list-watch clients failing during the initial LIST phase --- ## Implementation Details ### 1) Add an optional provider capability: list eligible namespaces **File:** `internal/aggregated/coder/provider.go` Add a small, opt-in interface that lets storage enumerate the set of namespaces it can serve. ```go // NamespaceLister can enumerate namespaces served by a ClientProvider. // Used to implement all-namespaces LIST by fanning out across instances. // // Implementations should only return namespaces that are eligible/ready. // Returned namespaces must be non-empty and should be deterministic (sorted). type NamespaceLister interface { EligibleNamespaces(ctx context.Context) ([]string, error) } ``` Rationale: keeps storage decoupled from concrete provider types (`StaticClientProvider` vs `ControlPlaneClientProvider`). ### 2) Implement `NamespaceLister` #### 2a) Dynamic control-plane provider **File:** `internal/aggregated/coder/controlplane_provider.go` Implement: - `func (p *ControlPlaneClientProvider) EligibleNamespaces(ctx context.Context) ([]string, error)` Algorithm: 1. `eligibleCPs, err := p.findEligibleControlPlanes(ctx, "")` 2. If `len(eligibleCPs) == 0`: return `ServiceUnavailable(noEligibleControlPlaneMessage(""))`. 3. Group eligible CPs by `cp.Namespace`. 4. If any namespace has >1 eligible CP: return `BadRequest(multipleEligibleControlPlaneMessage(namespace))` (still not supported). 5. Return the set of namespaces, **sorted** for determinism. Defensive programming: - Assert/validate non-nil receiver and non-nil context. - Ensure returned namespaces are not empty; crash/assert if an eligible CP has an empty namespace (should be impossible). #### 2b) Static provider **File:** `internal/aggregated/coder/provider.go` Implement: - `func (p *StaticClientProvider) EligibleNamespaces(ctx context.Context) ([]string, error)` Behavior: - If `p.Namespace == ""`: return `ServiceUnavailable("static provider has no default namespace")` (consistent with existing behavior). - Else return `[]string{p.Namespace}`. ### 3) Update storage LIST to fan out when request namespace is empty #### 3a) Templates **File:** `internal/aggregated/storage/template.go` Update `func (s *TemplateStorage) List(ctx context.Context, _ *metainternalversion.ListOptions) ...`: - If request namespace is non-empty: keep existing behavior. - If request namespace is empty: - If `s.provider` implements `coder.NamespaceLister`: - `namespaces := lister.EligibleNamespaces(ctx)` - For each namespace: - `sdk := s.clientForNamespace(ctx, namespace)` - `templates := sdk.Templates(ctx, codersdk.TemplateFilter{})` - Convert each template using `convert.TemplateToK8s(namespace, template)` and append. - Sort `list.Items` by `(namespace, name)` for deterministic output. - Else: fallback to existing `namespaceForListConversion` + `clientForNamespace(ctx, "")` behavior. #### 3b) Workspaces **File:** `internal/aggregated/storage/workspace.go` Update `func (s *WorkspaceStorage) List(ctx context.Context, _ *metainternalversion.ListOptions) ...` with the same fan-out strategy. Concurrency (recommended, not required for correctness): - Use an `errgroup.Group` + a small semaphore/limit (e.g. 4) so large numbers of namespaces don’t produce N sequential slow requests. Error semantics (v1): - Fail-fast: if any namespace fails to resolve a client or list objects, return an error (preserves today’s “LIST is authoritative” behavior). ### 4) Tests #### 4a) Provider unit tests **File:** `internal/aggregated/coder/controlplane_provider_test.go` Add tests for `EligibleNamespaces`: - Returns namespaces for multiple eligible CPs across namespaces. - Returns `BadRequest` when a single namespace has multiple eligible CPs. - Returns `ServiceUnavailable` when no eligible CPs exist. #### 4b) Storage aggregation tests **File:** `internal/aggregated/storage/storage_test.go` Add tests verifying all-namespaces aggregation: - Create two mock Coder servers with distinct seeded templates/workspaces. - Create a test provider implementing: - `coder.ClientProvider` (map namespace → client) - `coder.NamespaceLister` (returns both namespaces) - Assert: - `TemplateStorage.List(context.Background(), nil)` returns items from both namespaces and each item has the correct `metadata.namespace`. - same for `WorkspaceStorage.List`. ### 5) Docs / behavior notes **File:** `internal/aggregated/storage/doc.go` Update the “v1 semantics” comment to note: - all-namespaces LIST aggregates across eligible `CoderControlPlane` namespaces when the provider supports it. ### 6) Validation Run locally after implementation: - `make test` - `make build` - `make lint` - `make verify-vendor` (expected no-op; only if deps are added) --- <details> <summary>Optional follow-ups (not required to fix the observed error)</summary> 1) **Upstream-backed watch**: implement per-control-plane polling/websocket to generate watch events even when changes happen directly in Coder. 2) **Partial failure mode**: consider returning partial results for all-namespaces LIST when one instance is down (would require a clear API/UX decision; Kubernetes APIs generally favor fail-fast). 3) **Performance & caching**: optionally cache per-namespace SDK clients/tokens (with invalidation on secret changes) to reduce per-request secret reads. </details> </details> --- _Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking: `xhigh` • Cost: `$3.28`_ <!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh costs=3.28 -->
1 parent 0e64421 commit 0cdc937

8 files changed

Lines changed: 523 additions & 2 deletions

File tree

internal/aggregated/coder/controlplane_provider.go

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@ import (
55
"fmt"
66
"log"
77
"net/url"
8+
"sort"
89
"strings"
910
"time"
1011

@@ -26,6 +27,7 @@ type ControlPlaneClientProvider struct {
2627
var (
2728
_ ClientProvider = (*ControlPlaneClientProvider)(nil)
2829
_ NamespaceResolver = (*ControlPlaneClientProvider)(nil)
30+
_ NamespaceLister = (*ControlPlaneClientProvider)(nil)
2931
)
3032

3133
// NewControlPlaneClientProvider constructs a dynamic ClientProvider backed by CoderControlPlane resources.
@@ -206,6 +208,45 @@ func (p *ControlPlaneClientProvider) DefaultNamespace(ctx context.Context) (stri
206208
}
207209
}
208210

211+
// EligibleNamespaces returns namespaces served by eligible CoderControlPlane instances.
212+
func (p *ControlPlaneClientProvider) EligibleNamespaces(ctx context.Context) ([]string, error) {
213+
if p == nil {
214+
return nil, fmt.Errorf("assertion failed: control plane client provider must not be nil")
215+
}
216+
if ctx == nil {
217+
return nil, fmt.Errorf("assertion failed: context must not be nil")
218+
}
219+
220+
eligible, err := p.findEligibleControlPlanes(ctx, "")
221+
if err != nil {
222+
return nil, err
223+
}
224+
if len(eligible) == 0 {
225+
return nil, apierrors.NewServiceUnavailable(noEligibleControlPlaneMessage(""))
226+
}
227+
228+
byNamespace := make(map[string]int, len(eligible))
229+
for i := range eligible {
230+
namespace := strings.TrimSpace(eligible[i].Namespace)
231+
if namespace == "" {
232+
return nil, fmt.Errorf("assertion failed: eligible CoderControlPlane namespace must not be empty")
233+
}
234+
byNamespace[namespace]++
235+
}
236+
237+
namespaces := make([]string, 0, len(byNamespace))
238+
for namespace, count := range byNamespace {
239+
if count > 1 {
240+
return nil, apierrors.NewBadRequest(multipleEligibleControlPlaneMessage(namespace))
241+
}
242+
namespaces = append(namespaces, namespace)
243+
}
244+
245+
sort.Strings(namespaces)
246+
247+
return namespaces, nil
248+
}
249+
209250
func (p *ControlPlaneClientProvider) findEligibleControlPlanes(
210251
ctx context.Context,
211252
namespace string,

internal/aggregated/coder/controlplane_provider_test.go

Lines changed: 155 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -266,6 +266,161 @@ func TestControlPlaneClientProviderDefaultNamespaceReturnsBadRequestForMultipleE
266266
}
267267
}
268268

269+
func TestControlPlaneClientProviderEligibleNamespacesMultipleNamespaces(t *testing.T) {
270+
t.Parallel()
271+
272+
ignoredControlPlane := eligibleControlPlane("team-a", "coder-disabled")
273+
ignoredControlPlane.Spec.OperatorAccess.Disabled = true
274+
275+
provider, secretReader := newControlPlaneProviderForTest(
276+
t,
277+
[]coderv1alpha1.CoderControlPlane{
278+
eligibleControlPlane("team-a", "coder-a"),
279+
eligibleControlPlane("team-b", "coder-b"),
280+
ignoredControlPlane,
281+
},
282+
nil,
283+
)
284+
285+
namespaces, err := provider.EligibleNamespaces(context.Background())
286+
if err != nil {
287+
t.Fatalf("resolve eligible namespaces: %v", err)
288+
}
289+
if got, want := secretReader.getCalls, 0; got != want {
290+
t.Fatalf("expected %d secret reads, got %d", want, got)
291+
}
292+
if got, want := len(namespaces), 2; got != want {
293+
t.Fatalf("expected %d namespaces, got %d: %v", want, got, namespaces)
294+
}
295+
if got, want := namespaces[0], "team-a"; got != want {
296+
t.Fatalf("expected first namespace %q, got %q", want, got)
297+
}
298+
if got, want := namespaces[1], "team-b"; got != want {
299+
t.Fatalf("expected second namespace %q, got %q", want, got)
300+
}
301+
}
302+
303+
func TestControlPlaneClientProviderEligibleNamespacesNoEligible(t *testing.T) {
304+
t.Parallel()
305+
306+
provider, secretReader := newControlPlaneProviderForTest(t, nil, nil)
307+
308+
namespaces, err := provider.EligibleNamespaces(context.Background())
309+
if err == nil {
310+
t.Fatal("expected error")
311+
}
312+
if namespaces != nil {
313+
t.Fatalf("expected nil namespaces on error, got %v", namespaces)
314+
}
315+
if got, want := secretReader.getCalls, 0; got != want {
316+
t.Fatalf("expected %d secret reads, got %d", want, got)
317+
}
318+
if !apierrors.IsServiceUnavailable(err) {
319+
t.Fatalf("expected ServiceUnavailable, got %v", err)
320+
}
321+
if !strings.Contains(err.Error(), "no eligible CoderControlPlane") {
322+
t.Fatalf("expected no-eligible message, got %v", err)
323+
}
324+
}
325+
326+
func TestControlPlaneClientProviderEligibleNamespacesDuplicateInNamespace(t *testing.T) {
327+
t.Parallel()
328+
329+
provider, secretReader := newControlPlaneProviderForTest(
330+
t,
331+
[]coderv1alpha1.CoderControlPlane{
332+
eligibleControlPlane("team-a", "coder-a"),
333+
eligibleControlPlane("team-a", "coder-b"),
334+
},
335+
nil,
336+
)
337+
338+
namespaces, err := provider.EligibleNamespaces(context.Background())
339+
if err == nil {
340+
t.Fatal("expected error")
341+
}
342+
if namespaces != nil {
343+
t.Fatalf("expected nil namespaces on error, got %v", namespaces)
344+
}
345+
if got, want := secretReader.getCalls, 0; got != want {
346+
t.Fatalf("expected %d secret reads, got %d", want, got)
347+
}
348+
if !apierrors.IsBadRequest(err) {
349+
t.Fatalf("expected BadRequest, got %v", err)
350+
}
351+
if !strings.Contains(err.Error(), "multi-instance support is planned") {
352+
t.Fatalf("expected multi-instance message, got %v", err)
353+
}
354+
}
355+
356+
func TestControlPlaneClientProviderEligibleNamespacesSingleNamespace(t *testing.T) {
357+
t.Parallel()
358+
359+
provider, secretReader := newControlPlaneProviderForTest(
360+
t,
361+
[]coderv1alpha1.CoderControlPlane{eligibleControlPlane("team-a", "coder-a")},
362+
nil,
363+
)
364+
365+
namespaces, err := provider.EligibleNamespaces(context.Background())
366+
if err != nil {
367+
t.Fatalf("resolve eligible namespaces: %v", err)
368+
}
369+
if got, want := secretReader.getCalls, 0; got != want {
370+
t.Fatalf("expected %d secret reads, got %d", want, got)
371+
}
372+
if got, want := len(namespaces), 1; got != want {
373+
t.Fatalf("expected %d namespace, got %d", want, got)
374+
}
375+
if got, want := namespaces[0], "team-a"; got != want {
376+
t.Fatalf("expected namespace %q, got %q", want, got)
377+
}
378+
}
379+
380+
func TestControlPlaneClientProviderEligibleNamespacesAssertions(t *testing.T) {
381+
t.Parallel()
382+
383+
provider, _ := newControlPlaneProviderForTest(t, nil, nil)
384+
385+
tests := []struct {
386+
name string
387+
provider *ControlPlaneClientProvider
388+
ctx context.Context
389+
wantErrContains string
390+
}{
391+
{
392+
name: "rejects nil provider",
393+
provider: nil,
394+
ctx: context.Background(),
395+
wantErrContains: "assertion failed: control plane client provider must not be nil",
396+
},
397+
{
398+
name: "rejects nil context",
399+
provider: provider,
400+
ctx: nil,
401+
wantErrContains: "assertion failed: context must not be nil",
402+
},
403+
}
404+
405+
for _, testCase := range tests {
406+
testCase := testCase
407+
t.Run(testCase.name, func(t *testing.T) {
408+
t.Parallel()
409+
410+
namespaces, err := testCase.provider.EligibleNamespaces(testCase.ctx)
411+
if err == nil {
412+
t.Fatalf("expected error containing %q, got nil", testCase.wantErrContains)
413+
}
414+
if namespaces != nil {
415+
t.Fatalf("expected nil namespaces on error, got %v", namespaces)
416+
}
417+
if !strings.Contains(err.Error(), testCase.wantErrContains) {
418+
t.Fatalf("expected error containing %q, got %q", testCase.wantErrContains, err.Error())
419+
}
420+
})
421+
}
422+
}
423+
269424
func TestControlPlaneClientProviderClientForNamespaceDefaultsSecretKeyToToken(t *testing.T) {
270425
t.Parallel()
271426

internal/aggregated/coder/provider.go

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,12 @@ type NamespaceResolver interface {
2222
DefaultNamespace(ctx context.Context) (string, error)
2323
}
2424

25+
// NamespaceLister can enumerate namespaces served by a ClientProvider.
26+
// Used to implement all-namespaces LIST by fanning out across instances.
27+
type NamespaceLister interface {
28+
EligibleNamespaces(ctx context.Context) ([]string, error)
29+
}
30+
2531
// StaticClientProvider returns one static client, optionally restricted to one namespace.
2632
type StaticClientProvider struct {
2733
Client *codersdk.Client
@@ -31,6 +37,7 @@ type StaticClientProvider struct {
3137
var (
3238
_ ClientProvider = (*StaticClientProvider)(nil)
3339
_ NamespaceResolver = (*StaticClientProvider)(nil)
40+
_ NamespaceLister = (*StaticClientProvider)(nil)
3441
)
3542

3643
// ClientForNamespace returns the static client.
@@ -77,6 +84,18 @@ func (p *StaticClientProvider) DefaultNamespace(_ context.Context) (string, erro
7784
return p.Namespace, nil
7885
}
7986

87+
// EligibleNamespaces returns namespaces served by the static provider.
88+
func (p *StaticClientProvider) EligibleNamespaces(_ context.Context) ([]string, error) {
89+
if p == nil {
90+
return nil, fmt.Errorf("assertion failed: static client provider must not be nil")
91+
}
92+
if p.Namespace == "" {
93+
return nil, apierrors.NewServiceUnavailable("static provider has no default namespace")
94+
}
95+
96+
return []string{p.Namespace}, nil
97+
}
98+
8099
// NewStaticClientProvider creates a StaticClientProvider from cfg and optional namespace restriction.
81100
func NewStaticClientProvider(cfg Config, namespace string) (*StaticClientProvider, error) {
82101
client, err := NewSDKClient(cfg)

internal/aggregated/coder/provider_test.go

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -209,6 +209,58 @@ func TestStaticClientProviderDefaultNamespaceAssertions(t *testing.T) {
209209
}
210210
}
211211

212+
func TestStaticClientProviderEligibleNamespaces(t *testing.T) {
213+
t.Parallel()
214+
215+
provider := &StaticClientProvider{Namespace: "control-plane"}
216+
namespaces, err := provider.EligibleNamespaces(context.Background())
217+
if err != nil {
218+
t.Fatalf("resolve eligible namespaces: %v", err)
219+
}
220+
if got, want := len(namespaces), 1; got != want {
221+
t.Fatalf("expected %d namespace, got %d", want, got)
222+
}
223+
if got, want := namespaces[0], "control-plane"; got != want {
224+
t.Fatalf("expected namespace %q, got %q", want, got)
225+
}
226+
}
227+
228+
func TestStaticClientProviderEligibleNamespacesNoNamespace(t *testing.T) {
229+
t.Parallel()
230+
231+
provider := &StaticClientProvider{}
232+
namespaces, err := provider.EligibleNamespaces(context.Background())
233+
if err == nil {
234+
t.Fatal("expected error")
235+
}
236+
if namespaces != nil {
237+
t.Fatalf("expected nil namespaces on error, got %v", namespaces)
238+
}
239+
if !apierrors.IsServiceUnavailable(err) {
240+
t.Fatalf("expected ServiceUnavailable, got %v", err)
241+
}
242+
if !strings.Contains(err.Error(), "static provider has no default namespace") {
243+
t.Fatalf("expected missing namespace message, got %v", err)
244+
}
245+
}
246+
247+
func TestStaticClientProviderEligibleNamespacesNilReceiver(t *testing.T) {
248+
t.Parallel()
249+
250+
var provider *StaticClientProvider
251+
namespaces, err := provider.EligibleNamespaces(context.Background())
252+
if err == nil {
253+
t.Fatal("expected error")
254+
}
255+
if namespaces != nil {
256+
t.Fatalf("expected nil namespaces on error, got %v", namespaces)
257+
}
258+
wantErrContains := "assertion failed: static client provider must not be nil"
259+
if !strings.Contains(err.Error(), wantErrContains) {
260+
t.Fatalf("expected error containing %q, got %q", wantErrContains, err.Error())
261+
}
262+
}
263+
212264
func TestNewStaticClientProvider(t *testing.T) {
213265
t.Parallel()
214266

internal/aggregated/storage/doc.go

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,4 +9,6 @@
99
// while Kubernetes object names allow dots (DNS-1123 subdomains).
1010
// - A single admin session token is used for all API calls (no per-request impersonation in v1).
1111
// - Storage resolves the backing codersdk.Client via a ClientProvider interface.
12+
// - All-namespaces LIST aggregates results across eligible CoderControlPlane namespaces
13+
// when the provider implements NamespaceLister.
1214
package storage

0 commit comments

Comments
 (0)