
docs(networking): Cilium Gateway API — architecture, security, migration#509

Merged: Aleksei Sviridkin (lexfrei) merged 1 commit into main from docs/gateway-api-cilium on May 26, 2026.

Contributor

@lexfrei Aleksei Sviridkin (lexfrei) commented Apr 23, 2026

What this PR does

Adds a new networking/gateway-api.md page to the next/ docs trunk describing the Cilium-backed Gateway API feature that landed in cozystack/cozystack#2470. Also extends the platform-package reference with the gateway.* and publishing.certificates.dns01.* value rows the same upstream PR introduces.

The page covers, as the feature shipped:

  • a new platform-level toggle (gateway.enabled) and a per-tenant toggle (tenant.spec.gateway) that opts a tenant into its own dedicated Gateway (own LB IP, own per-tenant Issuer, own Certificate);
  • inheritance through the namespace.cozystack.io/gateway label: tenants without spec.gateway: true attach their routes to the nearest ancestor's Gateway via a label-keyed allowedRoutes.namespaces.selector (same shape as _namespace.ingress inheritance);
  • mechanism-agnostic LoadBalancer IP allocation: the auto-created Service draws its IP from whatever LB allocator the cluster admin has wired up (MetalLB / Cilium LB-IPAM / robotlb / Service.spec.externalIPs). Cozystack ships MetalLB installed but does not auto-render any IPAddressPool / L2Advertisement / BGPAdvertisement / CiliumLoadBalancerIPPool. There is no gatewayIP field on the Tenant CR — to pin an address, the operator pre-creates the Service with loadBalancerIP set or hands the tenant a reference to a named admin-managed pool;
  • a migration away from Ingress for every cozystack-native exposed service (dashboard, keycloak, grafana, alerta via HTTPRoute; kubernetes-api, vm-exportproxy, cdi-uploadproxy via TLSRoute passthrough; harbor and bucket attached to the owning tenant's Gateway);
  • a per-tenant cert-manager Issuer (own ACME account) plus per-listener Certificates (HTTP-01) or a single wildcard Certificate extended with per-child-apex SANs and a *.<child-apex> listener per inheriting tenant (DNS-01). Four DNS providers supported (cloudflare, route53, digitalocean, rfc2136);
  • a five-layer security model grouped by who it defends against: Layer 3 gates tenant-user input on Tenant.spec.host via cozystack-api's admission chain; Layers 1, 2, 4, 5 are defense-in-depth via VAPs and the listener label selector (tenants do not hold gateway.networking.k8s.io/* RBAC by design);
  • foreign-takeover guards on six reconcile paths (Gateway, redirect HTTPRoute, per-tenant Issuer, wildcard Certificate, per-listener Certificate, plus the namespace-label patching path which only writes / strips namespace.cozystack.io/gateway on namespaces it annotates with cozystack.io/gateway-attached-by).
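
To make the inheritance bullet concrete, here is a rough sketch of what a label-keyed listener selector can look like on a tenant Gateway. Only the `namespace.cozystack.io/gateway` label key and the `allowedRoutes.namespaces.selector` shape come from the description above; the Gateway name, namespaces, hostname, Secret name, and label value are placeholders, and the manifest the controller actually renders may differ.

```yaml
# Illustrative sketch only. Names, hostname, and the label value are assumptions.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: tenant-gateway            # hypothetical Gateway name
  namespace: tenant-parent        # hypothetical owning-tenant namespace
spec:
  gatewayClassName: cilium        # assumes the Cilium GatewayClass is named "cilium"
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: "*.parent.example.org"   # hypothetical tenant apex
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: tenant-gateway-tls     # hypothetical Secret filled by cert-manager
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              # Label key taken from the PR description; the value format is assumed.
              namespace.cozystack.io/gateway: tenant-parent
```

Under these assumptions, a child namespace carrying the same label value could attach routes through a cross-namespace `parentRef`, while namespaces without the label would not match the listener.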

Sections

  • Overview — opt-in defaults, inheritance, coexistence with ingress-nginx.
  • Architecture — reconciliation flow, traffic path, listener layout (HTTP-80 / HTTPS / TLS-passthrough), allowedRoutes selector mechanics per listener role.
  • Enabling Gateway API — platform-level Package example with full attachedNamespaces list, per-tenant Tenant example for owning and inheriting tenants, custom-apex case.
  • Inheritance — how the namespace label is set, how cross-namespace ParentRef is gated, how DNS-01 mode extends the parent's wildcard cert.
  • Cert mode — HTTP-01 (default, per-listener) vs DNS-01 (opt-in, wildcard + per-child SANs), provider matrix, listener-cap considerations.
  • Per-service routing — HTTPRoute / TLSRoute tables: service → namespace → route name → backend → listener.
  • Security — mermaid diagram and one section per layer, plus HostnameConflict resolution and the foreign-takeover guards.
  • Certificates — per-tenant Issuer, supported ACME servers, Let's Encrypt rate limits and mitigations.
  • Migration from ingress-nginx — step-by-step for new and existing clusters.
  • Known limitations — multi-tenant shared LB IP (deferred until Cilium ListenerSet, cilium#42756), TLSRoute v1alpha2, DNS-01 multi-apex DNS-provider requirement, supported ACME issuers, upstream application gaps.
  • Troubleshooting — concrete kubectl recipes for the most likely stuck states (TenantGateway ReconcileError, Gateway not Programmed, Certificate not Ready, HostnameConflict, admission denials, inheriting child route not accepted, <pending> LB IP).

Target branch

next/ — the version-agnostic trunk. When the page ships with the upstream release that contains cozystack/cozystack#2470, it materialises automatically under the corresponding vX.Y/ via make release-next.

Not included

The legacy v1/networking/gateway-api.md page on the abandoned docs/gateway-api branch (from the Envoy Gateway proposal in cozystack/cozystack#2213) is unrelated to this PR. That proposal targeted a different architecture that has since been superseded.

Release note

NONE


netlify Bot commented Apr 23, 2026

Deploy Preview for cozystack ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 475a505 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cozystack/deploys/6a158c145653100008f978f3 |
| 😎 Deploy Preview | https://deploy-preview-509--cozystack.netlify.app |

Contributor

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds documentation and platform configuration for an opt-in Cilium-backed Gateway API: Helm-rendered per-tenant Gateway resources, controller materialization of Gateways/Issuers/Certificates, HTTPRoute/TLSRoute handling (redirects, ACME, passthrough), LoadBalancer IP pool usage, admission policies, migration notes, and troubleshooting.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Gateway API documentation: `content/en/docs/next/networking/gateway-api.md` | New doc describing Cozystack’s opt-in Gateway API with Cilium: per-tenant TenantGateway rendering, controller materialization of Gateways/Issuers/Certificates, HTTPRoute/TLSRoute listener patterns (HTTP→HTTPS redirect, HTTP-01 vs DNS-01, optional TLS passthrough), cert issuance modes (prod/stage), LB IP allocation via CiliumLoadBalancerIPPool, security/validation policies, coexistence and migration guidance, and troubleshooting steps. |
| Platform package docs & schema: `content/en/docs/next/operations/configuration/platform-package.md` | Adds publishing.exposure documentation and publishing.certificates.dns01.* fields for DNS-01 providers; introduces gateway.enabled and gateway.attachedNamespaces platform values (with defaults) and documents how publishing modes map to Service types and validation/fail-fast behavior. |

Sequence Diagram(s)

sequenceDiagram
    participant Admin as Platform Helm/Values
    participant K8s as Kubernetes API
    participant Controller as cozystack-controller
    participant TenantNS as Tenant Namespace
    participant CertManager
    participant Envoy as Envoy DaemonSet
    participant LBPool as CiliumLoadBalancerIPPool

    Admin->>K8s: enable platform Gateway (gateway.enabled, attachedNamespaces)
    Admin->>K8s: install GatewayClass, ValidatingAdmissionPolicies
    TenantNS->>K8s: tenant with spec.gateway: true
    K8s->>Controller: render TenantGateway CRs
    Controller->>K8s: materialize Gateway, Issuer, Certificate, HTTPRoute/TLSRoute
    K8s->>CertManager: ACME certificate request (HTTP-01 or DNS-01)
    CertManager-->>K8s: certificate issued
    Controller->>LBPool: allocate tenant LoadBalancer IP
    Envoy->>K8s: program listeners (HTTPS, redirects, optional passthrough)
    Client->>Envoy: TLS or HTTP request
    Envoy->>TenantNS: route to backend per HTTPRoute/TLSRoute

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hopped through Gateways, tidy and bright,

Tenants got doors, certs shining in light,
Envoy listens, LB IPs in a row,
Admission checks guard where requests may go,
I left a carrot-shaped doc to show.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: comprehensive documentation of Cilium-backed Gateway API covering architecture, security model, and migration strategy. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces comprehensive documentation for the Gateway API support in Cozystack, detailing its architecture, security model, and migration path from ingress-nginx. The review feedback identifies opportunities to improve technical accuracy and consistency, specifically by clarifying that namespace whitelisting applies to both HTTPRoute and TLSRoute resources and resolving a naming inconsistency for the Kubernetes API route.

- The exposed-service templates (dashboard, keycloak) stop rendering their `Ingress` and start rendering their `HTTPRoute`.
- TLS-passthrough services (cozystack-api, vm-exportproxy, cdi-uploadproxy) stop rendering their `Ingress` and start rendering a `TLSRoute` attached to a dedicated Passthrough listener.

The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.
Contributor


medium

The documentation mentions that attachedNamespaces restricts HTTPRoute attachments. However, the architecture also utilizes TLSRoute for services like the Kubernetes API and KubeVirt proxies (as shown in the routing tables). It would be more accurate to state that this list applies to both HTTPRoute and TLSRoute (or Gateway API routes in general).

Suggested change
- The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.
+ The `attachedNamespaces` list restricts which namespaces may attach `HTTPRoute`s and `TLSRoute`s to tenant Gateways through the listener `allowedRoutes` whitelist (see [Security](#security)). It is also guarded by a runtime `ValidatingAdmissionPolicy` that rejects any `tenant-*` entry.

Contributor Author


Fixed in 1ea0093: the paragraph now says HTTPRoute or TLSRoute. The allowedRoutes whitelist on the listener is route-kind-agnostic, so in practice it restricts every route type that attaches to the Gateway — including the TLSRoutes used for the Kubernetes API, vm-exportproxy, and cdi-uploadproxy.


| Service | Namespace | `TLSRoute` name | Backend | Listener |
|---|---|---|---|---|
| Kubernetes API | `default` | `kubernetes-api` | `kubernetes:443` | `tls-api` |
Contributor


medium

There is an inconsistency in naming the Kubernetes API route. It is referred to as cozystack-api in the Mermaid diagram (line 27) and the migration section (line 265), but as kubernetes-api in this table. Using a consistent name throughout the document would improve clarity.

Suggested change
- | Kubernetes API | `default` | `kubernetes-api` | `kubernetes:443` | `tls-api` |
+ | Kubernetes API | `default` | `cozystack-api` | `kubernetes:443` | `tls-api` |

Contributor Author


Fixed in dcb805c by aligning the Mermaid diagram to the real resource name. The TLSRoute is literally named kubernetes-api (see packages/system/cozystack-api/templates/api-tlsroute.yaml), so the table at line 144 is the source of truth. cozystack-api in the diagram referred to the cozystack package that ships this route, which was misleading. The diagram now says kubernetes-api and the migration prose clarifies the relationship (cozystack-api (Kubernetes API)).
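
For readers following the naming discussion, a passthrough route matching the table row might look roughly like the sketch below. The route name, namespace, backend, and listener name (`tls-api`) are taken from the table; the parent Gateway name and namespace and the hostname are placeholders, and the real template in `packages/system/cozystack-api/templates/api-tlsroute.yaml` may set additional fields.

```yaml
# Illustrative sketch only. parentRefs target and hostname are assumptions.
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: kubernetes-api            # resource name from the routing table
  namespace: default
spec:
  parentRefs:
    - name: tenant-gateway        # hypothetical Gateway name
      namespace: tenant-root      # hypothetical Gateway namespace
      sectionName: tls-api        # Passthrough listener from the routing table
  hostnames:
    - api.example.org             # hypothetical API hostname
  rules:
    - backendRefs:
        - name: kubernetes        # the built-in Service in the default namespace
          port: 443
```

Because the listener runs in Passthrough mode, TLS would be terminated by the backend rather than at the Gateway in this arrangement.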


Every listener on a tenant Gateway pins `allowedRoutes.namespaces.from: Selector` to a `matchExpressions` whitelist against the built-in `kubernetes.io/metadata.name` label. That label is written by kube-apiserver on every namespace and cannot be spoofed.

The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` to the Gateway.
Contributor


medium

Similar to the comment on line 101, this section should clarify that the whitelist applies to both HTTPRoute and TLSRoute, as both are used in the described architecture.

Suggested change
- The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` to the Gateway.
+ The whitelist is the publishing tenant's namespace (always, implicit) plus `publishing.gateway.attachedNamespaces`. A namespace outside the list literally cannot attach any `HTTPRoute` or `TLSRoute` to the Gateway.

Contributor Author


Fixed in e15d865: the Layer 1 description now explicitly says HTTPRoute or TLSRoute. Same root cause as the line 101 comment — the listener-level whitelist applies to every route kind attaching to that listener.
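
As a rough illustration of the Layer 1 mechanism as it was described at this point in the thread, a listener whitelist keyed on the built-in namespace-name label could be written as follows. This is a listener fragment, not a complete Gateway; the tenant namespace and the attached namespaces are placeholders rather than the shipped defaults.

```yaml
# Fragment of a Gateway spec; other listener fields (hostname, tls, ...) are omitted.
listeners:
  - name: https
    protocol: HTTPS
    port: 443
    allowedRoutes:
      namespaces:
        from: Selector
        selector:
          matchExpressions:
            # kubernetes.io/metadata.name is set by kube-apiserver on every namespace.
            - key: kubernetes.io/metadata.name
              operator: In
              values:
                - tenant-example    # publishing tenant's own namespace (hypothetical)
                - default           # illustrative attachedNamespaces entries
                - cozy-dashboard
```

Since `allowedRoutes.namespaces` does not filter by route kind, the same selector would apply to `HTTPRoute` and `TLSRoute` objects alike, which is the point both review comments raise.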

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@content/en/docs/next/networking/gateway-api.md`:
- Line 56: The in-page anchor "#tls-passthrough" in the sentence "Plus one extra
listener per TLS-passthrough service (see [TLS passthrough](`#tls-passthrough`)
below)" doesn't match the actual heading ID; locate the "TLS passthrough"
section heading in this document and either rename that heading (or add an
explicit HTML anchor/id) to produce the ID tls-passthrough, or update the link
fragment to the existing heading ID (for example whatever the generated slug
is); ensure the link target and the heading ID for the TLS passthrough section
are identical so the anchor works.

📥 Commits

Reviewing files that changed from the base of the PR and between 5415111 and 2a68b49.

📒 Files selected for processing (1)
  • content/en/docs/next/networking/gateway-api.md

Comment thread content/en/docs/next/networking/gateway-api.md Outdated
Contributor

@myasnikovdaniil myasnikovdaniil left a comment


There is no explanation of the publishing.exposure flag in the platform package; it needs to be added.

Contributor Author

Aleksei Sviridkin (lexfrei) commented Apr 27, 2026

@myasnikovdaniil Added a publishing.exposure subsection to the Migration section in 927ca5f.

It covers what the flag does (ingress-nginx Service shape: ClusterIP+externalIPs vs LoadBalancer), why a Gateway API rollout is the natural moment to flip it (so ingress-nginx and the per-tenant Gateway draw from the same Cilium-managed pool), the KEP-5707 deprecation timeline that forces the move before Kubernetes v1.40, and the loadBalancer-mode caveats lifted from the platform values.yaml: non-empty publishing.externalIPs, externalTrafficPolicy: Local, no built-in Cilium announcement, brief ingress interruption when switching, and the scope limit to ingress-nginx (vpn and similar still need separate migration).
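
A minimal values sketch of the switch described above, assuming the keys sit directly under the platform values (the thread elsewhere references them via `spec.components.platform.values` on the platform Package). The addresses are documentation-range placeholders.

```yaml
# Illustrative platform values fragment. Addresses are placeholders.
publishing:
  exposure: loadBalancer     # default is "externalIPs"
  externalIPs:               # must be non-empty, otherwise the chart fails fast
    - 192.0.2.10
    - 192.0.2.11
```

Per the caveats listed above, switching modes can briefly interrupt ingress traffic, and announcing the addresses on the network remains the cluster admin's responsibility.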

@myasnikovdaniil
Contributor

Platform parameters must also land in the platform package reference.

@lexfrei
Contributor Author

@myasnikovdaniil Done: added the platform parameters to the Platform Package Reference in 6888f84:

  • publishing.exposure row in the Publishing table.
  • New Gateway section between Authentication and Scheduling with gateway.enabled and gateway.attachedNamespaces, including the default whitelist verbatim.

Schema was verified against packages/core/platform/values.yaml on chore/gateway-api-crds-v1.5.1 (the parent PR is not yet merged). Side note: that verification surfaced one inconsistency in the Gateway API page itself — Layer 1 of the Security section called the whitelist publishing.gateway.attachedNamespaces, while the actual key is the root-level gateway.attachedNamespaces (consumed at packages/core/platform/templates/apps.yaml as .Values.gateway.attachedNamespaces). Fixed in 7db740b. The publishing.gateway path also appears in the upstream PR description and packages/extra/gateway/README.md; might be worth a one-line nit upstream.
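
To illustrate the key path in question, the values are expected at the root-level `gateway` key rather than under `publishing`. The sketch below assumes that layout; the namespace entries are examples and not the default whitelist.

```yaml
# Illustrative platform values fragment (read by the chart as .Values.gateway.*).
gateway:
  enabled: true
  attachedNamespaces:
    - default            # needed for routes that live in the default namespace
    - cozy-dashboard     # example entries only, not the shipped default list
    - cozy-keycloak
    # tenant-* entries are rejected by a ValidatingAdmissionPolicy
```

A `publishing.gateway.attachedNamespaces` block would presumably be ignored by the template, which is why the path mismatch was worth fixing in the page.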

The branch was rebased onto main to pick up the latest telemetry fixes, so SHAs from earlier comments shifted (10b1e7c1 → 927ca5f).

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
content/en/docs/next/operations/configuration/platform-package.md (2)

108-108: Minor: Consider consistent spelling variant.

Line 108 uses "Materialising" (British English). While both variants are correct, using consistent spelling throughout the documentation improves polish. Consider "Materializing" if the project prefers American English, or keep the current form if British English is the standard.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@content/en/docs/next/operations/configuration/platform-package.md` at line
108, The documentation uses the British English spelling "Materialising" in the
description for `gateway.enabled`; update that word to the project's chosen
variant (e.g., change "Materialising" to "Materializing") for consistency with
the rest of the docs—edit the text in the `gateway.enabled` description to the
preferred spelling.

66-66: Consider breaking up the dense table description for better scannability.

The publishing.exposure description packs mode definitions, deprecation timeline, validation behavior, and a caveat link into a single paragraph. Users scanning the table may miss the critical deprecation warning or the fail-fast validation note.

♻️ Suggested restructure for improved readability
-| `publishing.exposure` | `"externalIPs"` | Mode for the ingress-nginx Service. `externalIPs` creates a `ClusterIP` Service with `Service.spec.externalIPs` populated from `publishing.externalIPs`. `loadBalancer` creates a `type: LoadBalancer` Service backed by a `CiliumLoadBalancerIPPool` populated with the same addresses. `Service.spec.externalIPs` is deprecated upstream in Kubernetes v1.36 ([KEP-5707][kep-5707]) — switch to `loadBalancer` before upgrading past v1.40. The chart fails fast if `loadBalancer` is set with an empty `publishing.externalIPs`. See [Gateway API → ingress-nginx Service mode]({{% ref "/docs/next/networking/gateway-api#publishingexposure--ingress-nginx-service-mode" %}}) for the full caveat list. |
+| `publishing.exposure` | `"externalIPs"` | Mode for the ingress-nginx Service.<br/><br/>`externalIPs`: Creates a `ClusterIP` Service with `Service.spec.externalIPs` populated from `publishing.externalIPs`.<br/><br/>`loadBalancer`: Creates a `type: LoadBalancer` Service backed by a `CiliumLoadBalancerIPPool` using the same addresses.<br/><br/>**Deprecation notice:** `Service.spec.externalIPs` is deprecated in Kubernetes v1.36 ([KEP-5707][kep-5707]). Switch to `loadBalancer` before upgrading to v1.40.<br/><br/>**Validation:** The chart returns an error if `loadBalancer` is set with an empty `publishing.externalIPs`.<br/><br/>See [Gateway API → ingress-nginx Service mode]({{% ref "/docs/next/networking/gateway-api#publishingexposure--ingress-nginx-service-mode" %}}) for additional caveats. |

This uses <br/> tags (permitted by unsafe: true Goldmark config) to create visual breaks within the table cell, making each concept easier to locate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@content/en/docs/next/operations/configuration/platform-package.md` at line
66, The table cell for publishing.exposure is too dense; split its single
paragraph into separate sentences or lines (using permitted <br/> tags) that
each cover: the two modes and what they do (externalIPs vs loadBalancer and that
loadBalancer uses CiliumLoadBalancerIPPool), the deprecation of
Service.spec.externalIPs (KEP-5707) with the upgrade advice to switch before
v1.40, the validation/fail-fast behavior when loadBalancer is set but
publishing.externalIPs is empty, and the link to Gateway API → ingress-nginx
Service mode, so readers can scan and find the deprecation and validation notes
quickly.

📥 Commits

Reviewing files that changed from the base of the PR and between 10b1e7c and 6888f84.

📒 Files selected for processing (2)
  • content/en/docs/next/networking/gateway-api.md
  • content/en/docs/next/operations/configuration/platform-package.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • content/en/docs/next/networking/gateway-api.md

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
content/en/docs/next/operations/configuration/platform-package.md (1)

59-67: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Address markdownlint MD052 for kep-5707 by switching to an inline link in the table cell.

markdownlint-cli2 reports Missing link or image reference definition: "kep-5707" on the publishing.exposure row, even though the definition appears later in the file. This is typically a parsing limitation around table cells / reference-style links. Converting that specific mention to an inline link avoids the ambiguity and should clear the warning.

🔧 Proposed fix
-| `publishing.exposure` | `"externalIPs"` | Mode for the ingress-nginx Service. `externalIPs` creates a `ClusterIP` Service with `Service.spec.externalIPs` populated from `publishing.externalIPs`. `loadBalancer` creates a `type: LoadBalancer` Service backed by a `CiliumLoadBalancerIPPool` populated with the same addresses. `Service.spec.externalIPs` is deprecated upstream in Kubernetes v1.36 ([KEP-5707][kep-5707]) — switch to `loadBalancer` before upgrading past v1.40. The chart fails fast if `loadBalancer` is set with an empty `publishing.externalIPs`. See [Gateway API → ingress-nginx Service mode]({{% ref "/docs/next/networking/gateway-api#publishingexposure--ingress-nginx-service-mode" %}}) for the full caveat list. |
+| `publishing.exposure` | `"externalIPs"` | Mode for the ingress-nginx Service. `externalIPs` creates a `ClusterIP` Service with `Service.spec.externalIPs` populated from `publishing.externalIPs`. `loadBalancer` creates a `type: LoadBalancer` Service backed by a `CiliumLoadBalancerIPPool` populated with the same addresses. `Service.spec.externalIPs` is deprecated upstream in Kubernetes v1.36 ([KEP-5707](https://github.com/kubernetes/enhancements/issues/5707)) — switch to `loadBalancer` before upgrading past v1.40. The chart fails fast if `loadBalancer` is set with an empty `publishing.externalIPs`. See [Gateway API → ingress-nginx Service mode]({{% ref "/docs/next/networking/gateway-api#publishingexposure--ingress-nginx-service-mode" %}}) for the full caveat list. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@content/en/docs/next/operations/configuration/platform-package.md` around
lines 59 - 67, The markdownlint MD052 warning is caused by the reference-style
link `[kep-5707]` inside the `publishing.exposure` table cell; update that table
row (the `publishing.exposure` entry in the diff) to use an inline link with the
full URL for KEP-5707 instead of the reference-style link so the linter can
resolve it (leave the later definition in place or remove it if you prefer, but
the table must use the inline URL).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@content/en/docs/next/networking/gateway-api.md`:
- Around line 100-127: Update the platform Package example for
cozystack.cozystack-platform so gateway.attachedNamespaces includes the default
namespace; modify the values under
spec.components.platform.values.gateway.attachedNamespaces to add "default"
alongside the listed cozy-* namespaces (ensure you edit the snippet showing
gateway.enabled: true for cozystack.cozystack-platform and update
gateway.attachedNamespaces accordingly).


📥 Commits

Reviewing files that changed from the base of the PR and between 6888f84 and b4d413c.

📒 Files selected for processing (2)
  • content/en/docs/next/networking/gateway-api.md
  • content/en/docs/next/operations/configuration/platform-package.md

Comment thread content/en/docs/next/networking/gateway-api.md Outdated
Member

@kvaps Andrei Kvapil (kvaps) left a comment


Reviewed alongside cozystack/cozystack#2470 — see parent review for the full context. Three asks specific to this docs PR; they all stem from the same restructure of the security framing that the PR review requests in the implementation repo.

1. Restructure the ## Security section in content/en/docs/next/networking/gateway-api.md

The current "guarded at seven independent layers" framing overstates the role of layers that aren't protecting against tenant users — under Cozystack's API surface model, tenants only write apps.cozystack.io/* resources through cozystack-api and don't hold RBAC on gateway.networking.k8s.io/*, core Namespaces, or cozystack.io/Package. So most of the seven layers are defense-in-depth (against chart bugs, controller bugs, supply-chain compromise, admin mistakes), not first-line tenant defenses.

Asks:

  • Replace the section intro with a split-by-purpose framing: tenant-user-input gates (Layer 4 + the cozystack-api admission-chain fix), defense-in-depth (Layers 1, 2, 5, 6, 7), and admin-against-themselves (Layer 3).
  • Update the mermaid diagram (lines 266–292) so the attacker arrow lands on Layer 4 / cozystack-api admission as the user-input boundary, with the other layers branching off as covering chart / controller / supply-chain failure modes rather than all converging from the same ATK node.
  • In the Layer 7 description (lines 328–334) replace a tenant user with HTTPRoute RBAC could otherwise exploit — that RBAC isn't granted in Cozystack by design. Reframe Layer 7 as defense-in-depth against an app-chart bug or supply-chain compromise emitting HTTPRoutes outside the apex.

2. Document tenant-user API surface explicitly somewhere in the page

A short paragraph (probably in ## Overview or just before ## Security) stating that tenants interact with the platform exclusively through apps.cozystack.io/* resources (Tenant, Bucket, Kubernetes, etc.) and that the security model is built around that constraint. This makes the rest of the Security section read correctly — without that anchor, readers can mistakenly assume tenants write Gateways or HTTPRoutes directly.

3. Rewrite the migration / pinning sections after the implementation PR drops tenant.spec.gatewayIP

The parent PR review asks for tenant.spec.gatewayIP and the CiliumLoadBalancerIPPool branch to be removed (they don't fit Cozystack's MetalLB-default LB stack and the node-public-IP semantics of publishing.externalIPs). When that lands:

  • ### 3. Pinning a tenant Gateway to a specific external IP (lines 177–197) needs to be removed or replaced with a shorter note that per-tenant IP pinning, when needed, is a cluster-admin-side metallb.universe.tf/loadBalancerIPs annotation, not a tenant API field.
  • The ### Gateway Service <pending> LoadBalancer IP troubleshooting entry (line 509) should be updated to point at MetalLB pool configuration as the resolution, since that's where IPs come from in default Cozystack.

Hold this third point until the implementation PR settles — the docs change is mechanical once gatewayIP is gone.
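
For illustration, the admin-side pin described in the first bullet could look like the fragment below. This is only a sketch under assumptions: the Service name and namespace are hypothetical, the address comes from the RFC 5737 documentation range, and it presumes MetalLB is the allocator serving the Gateway's auto-created Service.

```yaml
# Sketch only: metadata fragment of the LoadBalancer Service behind a tenant Gateway.
# The name, namespace, and address are placeholders, not values from the page.
apiVersion: v1
kind: Service
metadata:
  name: cilium-gateway-example      # hypothetical name of the auto-created Service
  namespace: tenant-example         # hypothetical tenant namespace
  annotations:
    # MetalLB annotation requesting a specific address; set by the cluster admin,
    # not exposed as a field on the Tenant CR.
    metallb.universe.tf/loadBalancerIPs: "192.0.2.10"
# spec omitted: the Service spec is left to whatever created the Service
```

The point of the fragment is that the pin lives on an admin-owned object, so nothing in the tenant-facing API needs to carry an IP.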


Happy to re-review once these are in.

@lexfrei
Contributor Author

Updated in lockstep with the implementation PR cozystack/cozystack#2470 — see my reply on the parent review for the full reasoning behind the three-group security framing and the design declines on inheritance and on dropping tenant.spec.gatewayIP.

Three docs-side asks from the previous review are addressed:

1. ## Security section restructured. It opens with the three-group framing (tenant-user-input gates / defense-in-depth / admin-against-themselves) and lists the seven layer descriptions under that framing for completeness. The mermaid diagram is redrawn so the attacker arrow lands on Layer 4 + cozystack-api admission as the user-input boundary; the defense-in-depth and admin-against-themselves layers branch off as separate sources rather than all converging from a single ATK node. The Layer 7 description is reworded: the "tenant user with HTTPRoute RBAC" framing is dropped, because tenants in Cozystack don't hold gateway.networking.k8s.io/* RBAC by design (the new "Tenant API surface" subsection in ## Overview anchors that constraint).

2. Pinning section rewritten under MetalLB. tenant.spec.gatewayIP now translates to a per-tenant IPAddressPool in cozy-metallb labeled cozystack.io/per-tenant-gateway=true; controller writes metallb.universe.tf/address-pool on the Gateway's spec.infrastructure.annotations. The section explains the canonicalisation rules for the cross-Tenant uniqueness check (bare-vs-CIDR, IPv6 alternates, whitespace, unparseable input, too-broad CIDR rejected at admission) and includes the TOCTOU caveat under strict concurrency.

3. Troubleshooting <pending> rewritten to point at MetalLB diagnostics: kubectl get ipaddresspool / l2advertisement / bgpadvertisement, plus a one-liner for finding every Tenant currently using a gatewayIP.

Plus a fourth piece: new "LB allocator prerequisites" subsection in the migration block with a worked L2Advertisement example using the cozystack.io/per-tenant-gateway=true label selector — operators can copy-paste it as the minimum config the chart expects.
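
As a rough idea of what such a selector-based L2Advertisement looks like (a sketch, not the example from the page): the object name is hypothetical, the cozy-metallb namespace and the cozystack.io/per-tenant-gateway=true label are the ones named above, and the fields are the standard MetalLB `metallb.io/v1beta1` ones.

```yaml
# Sketch only: advertise every IPAddressPool carrying the per-tenant-gateway label.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: per-tenant-gateways         # hypothetical name
  namespace: cozy-metallb
spec:
  ipAddressPoolSelectors:
    - matchLabels:
        cozystack.io/per-tenant-gateway: "true"
```

Selecting pools by label rather than by name means a newly created labeled pool would be advertised without editing this object.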

@lexfrei
Contributor Author

Mirrored the revert in cozystack/cozystack#2470 — see the parent reply for the full reasoning on why the gatewayIP design was wrong.

Net change here:

  • The ## External IP allocation (MetalLB-rendering) prose is gone — chart renders no allocator-specific manifest, allocator stays admin-configured at the platform layer.
  • The ### 3. Pinning a tenant Gateway to a specific external IP section reverted with the implementation: there is no tenant.spec.gatewayIP field anymore.
  • Troubleshooting <pending> block reverted to the pre-MetalLB-pool shape — the diagnostic surface is "did the admin's allocator actually allocate?", not "is the per-tenant pool present in cozy-metallb?".
  • The "LB allocator prerequisites" subsection in the migration block is gone for the same reason — the prerequisite is "configure your allocator", not "configure MetalLB pools the chart will reference".

What stays from the previous revision:

  • Security section in the three-group framing (tenant-user-input gates / defense-in-depth / admin-against-themselves), with the new mermaid diagram routing the attacker arrow at Layer 4 / cozystack-api admission as the user-input boundary. That part of the rewrite was not contested — it's anchored in the apps.cozystack.io/* tenant API surface, which the new "Tenant API surface" subsection in ## Overview makes explicit.
  • Layer 7 wording fix (drop the "tenant user with HTTPRoute RBAC" framing — that RBAC isn't granted in Cozystack — reframed as defense-in-depth against chart bugs / supply-chain compromise).

@lexfrei Aleksei Sviridkin (lexfrei) force-pushed the docs/gateway-api-cilium branch 3 times, most recently from 7f83582 to 1e73f98 Compare May 26, 2026 10:00
Member

@kvaps Andrei Kvapil (kvaps) left a comment

LGTM — the three asks from the May 6 review (mirroring cozystack/cozystack#2470) are all addressed:

  1. Security section restructured. The "seven independent layers" framing is gone, replaced by the three-group split (tenant-user-input gate / defense-in-depth / admin-against-themselves), and the mermaid now separates the USER source (Tenant spec.host → L3) from the CHART source (app-chart bug / supply-chain → L1, L2, L4, L5) instead of converging everything from one attacker node. Layer 5 (cozystack-route-hostname-policy, formerly Layer 7) is reframed as defense-in-depth against a chart bug / supply-chain compromise, and the "tenant with HTTPRoute RBAC" wording is gone.

  2. Tenant API surface documented. The new ### Tenant API surface paragraph in Overview (and the Security intro) state explicitly that tenants write only apps.cozystack.io/* through cozystack-api and hold no RBAC on gateway.networking.k8s.io/* / Namespaces / Package — which anchors the rest of the Security section correctly.

  3. gatewayIP / IP-pinning reworked. No gatewayIP field anywhere; the traffic-path section and troubleshooting now describe IP allocation as mechanism-agnostic (MetalLB / Cilium LB-IPAM / robotlb / externalIPs), with pinning done admin-side via loadBalancerIP on the Service. platform-package.md is consistent too (tenant-* entries in attachedNamespaces documented as allowed).

Nice touch: the Inheritance section correctly states there is no per-child ReferenceGrant — the label selector is the cross-namespace gate.
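
For readers less familiar with the mechanism, a minimal sketch of a label-keyed listener follows. The label key namespace.cozystack.io/gateway is the one the page uses; everything else (object names, namespace, hostname, Secret name, the label value format, and the `cilium` GatewayClass name) is an assumption made for illustration and may differ from what the controller actually renders.

```yaml
# Sketch only: a Gateway listener that admits routes from any namespace
# carrying the inheritance label. Names and the label value are hypothetical.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway             # hypothetical
  namespace: tenant-example         # hypothetical owning tenant namespace
spec:
  gatewayClassName: cilium          # assumed GatewayClass name
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: "*.example.org"     # placeholder apex
      tls:
        mode: Terminate
        certificateRefs:
          - name: example-gateway-tls   # hypothetical Secret
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              # Route attachment across namespaces is gated here,
              # not by a ReferenceGrant.
              namespace.cozystack.io/gateway: tenant-example   # value format assumed
```

In Gateway API, route-to-Gateway attachment across namespaces is controlled by the listener's allowedRoutes, which is why no per-child ReferenceGrant is involved.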

One cross-PR consistency note (against the implementation repo, not this page): this page renumbered the model to 5 layers, but packages/extra/gateway/README.md in cozystack/cozystack#2470 still uses the 7-layer numbering and still documents cozystack-gateway-attached-namespaces-policy (Layer 3) and the render-time tenant-* ban (Layer 6) — both removed by the inheritance rework (the VAP no longer exists in the tree). The controller comments there also reference "Layer 7" for what this page calls Layer 5. Worth syncing the README/comments down to this page's 5-layer model so the two don't drift. Flagging that on the #2470 side too.

Approving.

@lexfrei Aleksei Sviridkin (lexfrei) force-pushed the docs/gateway-api-cilium branch 3 times, most recently from b7ba196 to 0a4c992 Compare May 26, 2026 11:43
Documents the per-tenant Gateway API ingress that landed in
cozystack/cozystack#2470:

- TenantGateway CRD reconciled by cozystack-controller: chart renders
  one CR per opted-in tenant, controller materialises the Gateway,
  per-tenant Issuer, redirect HTTPRoute, and Certificate(s).
- Inheritance through the namespace.cozystack.io/gateway label: a
  tenant only owns a dedicated Gateway when it explicitly sets
  tenant.spec.gateway=true; every other tenant in the tree attaches
  through the nearest ancestor that owns one. The apps/tenant chart
  writes the label; the controller patches the same label onto
  cozy-* namespaces listed in attachedNamespaces.
- HTTP-01 (default) renders per-listener Certificates; DNS-01
  (opt-in) renders one wildcard cert per owning Gateway and extends
  it with per-child-apex SANs + one *.<child-apex> listener per
  inheriting tenant. Four DNS providers supported (cloudflare,
  route53, digitalocean, rfc2136).
- Mechanism-agnostic LoadBalancer IP allocation: the auto-created
  Service draws its IP from whatever LB allocator the cluster admin
  has configured (MetalLB / Cilium LB-IPAM / robotlb / externalIPs).
  Cozystack ships MetalLB installed but does not auto-render any
  IPAddressPool / L2Advertisement / CiliumLoadBalancerIPPool.
- Per-service routing tables: HTTPRoute for dashboard, keycloak,
  harbor, bucket; TLSRoute for kubernetes-api, vm-exportproxy,
  cdi-uploadproxy.
- Five-layer security model grouped by who it defends against:
  Layer 4 gates tenant-user input on Tenant.spec.host; Layers 1, 2,
  5, 7 are defense-in-depth against chart bugs / controller bugs /
  supply-chain compromise / admin mistakes; tenants do not hold
  gateway.networking.k8s.io/* RBAC by design. Layer 1 selector
  mechanics differ by listener role (HTTPS: inheritance label;
  HTTP-80: built-in metadata.name whitelist).
- Foreign-takeover guards on six paths (Gateway, redirect HTTPRoute,
  Issuer, wildcard Certificate, per-listener Certificate, and the
  namespace label patching via cozystack.io/gateway-attached-by
  annotation) so hand-pinned operator state is never silently
  overwritten.
- Migration story from ingress-nginx (per-cluster and per-tenant),
  troubleshooting recipes for the most likely stuck states.

Also adds the Gateway and DNS-01 provider value rows to the
platform-package reference.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei Aleksei Sviridkin (lexfrei) merged commit 3720cbf into main May 26, 2026
6 checks passed
@lexfrei Aleksei Sviridkin (lexfrei) deleted the docs/gateway-api-cilium branch May 26, 2026 12:23