Skip to content

OCPBUGS-77344: enable dcm on route changes#778

Open
jcmoraisjr wants to merge 2 commits into
openshift:masterfrom
jcmoraisjr:OCPBUGS-77344-dcm-update-route
Open

OCPBUGS-77344: enable dcm on route changes#778
jcmoraisjr wants to merge 2 commits into
openshift:masterfrom
jcmoraisjr:OCPBUGS-77344-dcm-update-route

Conversation

@jcmoraisjr

@jcmoraisjr jcmoraisjr commented May 25, 2026

Copy link
Copy Markdown
Member

DCM was disabled on route changes due to the way the resource changes are being handled: without DCM, the template plugin applies a route remove, followed by a route add, which works well in the fork-and-reload approach since the removal is not applied until adding the route again succeed. On DCM, each of these operations are done separately as well, which means that the route is completely disabled during some time.

This update changes this behavior by saving previous and updated states of a route, identifying the differences, and dynamically applying them if possible. A fork-and-reload is made in case the dynamic update is not possible, as before.

Part of the changes also include extending the Endpoint struct so that it stores the weight and the VerifyHostname properties of each endpoint in order to make comparisons easier, as well as decrease the number of parameters sent to inner methods. These fields cannot be updated during the Endpoint creation, since they depend on the backend the endpoint is assigned. The router's updateEndpointTable() method centralizes all changes to these context based fields.

Also, reverting #739 as a separate commit since it fixes the root cause of the problem.

https://redhat.atlassian.net/browse/OCPBUGS-77344

Summary by CodeRabbit

  • New Features

    • HAProxy error-detection helpers for common server conditions.
    • Updates now report when an endpoint was replaced vs updated in-place.
  • Bug Fixes

    • Disabling health checks explicitly marks servers healthy and recovers when checks are missing.
    • Improved handling of protocol/port/weight changes to avoid misconfiguration and failures.
  • Refactor

    • Centralized endpoint table management and streamlined add/update/delete flows for less disruptive dynamic route updates.
  • Tests

    • Expanded tests covering update/replace scenarios and HTTP/2 cleartext edge cases.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 25, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-77344, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

DCM was disabled on route changes due to the way the resource changes are being handled: without DCM, the template plugin applies a route remove, followed by a route add, which works well in the fork-and-reload approach since the removal is not applied until adding the route again succeed. On DCM, each of these operations are done separately as well, which means that the route is completely disabled during some time.

This update changes this behavior by saving previous and updated states of a route, identifying the differences, and dynamically applying them if possible. A fork-and-reload is made in case the dynamic update is not possible, as before.

Part of the changes also include extending the Endpoint struct so that it stores the weight and the VerifyHostname properties of each endpoint in order to make comparisons easier, as well as decrease the number of parameters sent to inner methods. These fields cannot be updated during the Endpoint creation, since they depend on the backend the endpoint is assigned. The router's updateEndpointTable() method centralizes all changes to these context based fields.

https://redhat.atlassian.net/browse/OCPBUGS-77344

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label May 25, 2026
@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Per-endpoint Weight and VerifyHostname fields are added and persisted in a route EndpointTable. Router uses the table for dynamic add/update/replace/remove. Config-manager reconciles endpoints by ID (add→update→delete) with updated health-check enable/disable logic. Backend APIs now accept endpoint pointers, support replace flows, and expose explicit health-setting commands; tests updated accordingly.

Changes

Endpoint-Driven Server Management Refactor

Layer / File(s) Summary
Type system and ConfigManager interface updates
pkg/router/template/types.go
Endpoint gains Weight and VerifyHostname. ConfigManager.ReplaceRouteEndpoints final parameter changes from weight int32 to activeEndpoints int.
HAProxy backend: add/update/replace
pkg/router/template/configmanager/haproxy/backend.go
Add error classifiers; AddServer/DeleteServer/UpdateServer/ReplaceServer and internals now use *Endpoint; UpdateServer diffs endpoints to choose in-place update vs ReplaceServer; derive drain/maintenance from Endpoint.Weight.
Backend health and exec validation
pkg/router/template/configmanager/haproxy/backend.go
Add innerSetServerHealth and apiSetServerHealth; DisableHealthCheck disables checks then sets health up; execCommand treats apiSetServerHealth like other set ops for response validation.
Backend dynamic-update tests
pkg/router/template/configmanager/haproxy/backend_test.go
Rewrite tests to use old/new Endpoint pairs, add replace and h2c variant cases, update expected command sequences, and wire new backend signatures.
Config manager reconciliation and deletion
pkg/router/template/configmanager/haproxy/manager.go
ReplaceRouteEndpoints diffs endpoints by ID and DeepEqual, applies add→update→delete with aggregated errors, updates health-check enable/disable behavior, and passes endpoints by pointer for deletes. findMatchingBlueprint now returns nil.
Router endpoint table and dynamic route logic
pkg/router/template/router.go
Add updateEndpointTable; persist per-backend EndpointTable entries; introduce dynamicallyUpdateRoute; refactor dynamicallyReplaceEndpoints/dynamicallyRemoveEndpoints to use stored EndpointTable and updated callsites.
Blueprint test shim
pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
Fake ReplaceRouteEndpoints signature changed to accept activeEndpoints int.

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • openshift/router#763: Related prior work on HAProxy dynamic backend add/delete and backend update/replace flows.
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title references a Jira ticket (OCPBUGS-77344) and indicates the main change is enabling DCM (dynamic configuration management) on route changes, which directly matches the PR's core objective.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names are stable and deterministic. PR uses standard Go testing (not Ginkgo) with static test names containing no dynamic identifiers, timestamps, or generated values.
Test Structure And Quality ✅ Passed Tests use standard Go testing framework, not Ginkgo, so custom check criteria don't apply. Code follows Go test best practices with table-driven pattern and single-responsibility cases.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. All test modifications are unit tests using Go's standard testing package. The check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR contains only standard Go unit tests (using testing.T), not Ginkgo e2e tests. The custom check targets Ginkgo e2e tests which are not present in this PR.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only Go router code and HAProxy configuration logic; no deployment manifests, scheduling constraints, affinity rules, node selectors, or topology-aware changes are introduced.
Ote Binary Stdout Contract ✅ Passed No process-level stdout writes found. Logging uses klogr which defaults to stderr. No fmt.Print*, klog, or direct print calls in process-level code.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests were added in this PR. The modified test files (backend_test.go, blueprint_plugin_test.go) are standard Go unit tests using the testing package, not Ginkgo e2e tests.
No-Weak-Crypto ✅ Passed No weak cryptographic patterns (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB) are introduced in this PR; pre-existing MD5 usage in router.go remains unchanged.
Container-Privileges ✅ Passed No container/K8s manifest files are modified in this PR; all changes are Go source code files (backend.go, manager.go, router.go, types.go and their tests). The custom check is not applicable.
No-Sensitive-Data-In-Logs ✅ Passed No sensitive data exposed in logs. HAProxy operational responses and endpoint IDs logged are non-sensitive identifiers. No passwords, tokens, API keys, or PII are logged.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from bentito and ironcladlou May 25, 2026 18:00
@jcmoraisjr

Copy link
Copy Markdown
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 25, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-77344, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-77344, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request.

Details

In response to this:

DCM was disabled on route changes due to the way the resource changes are being handled: without DCM, the template plugin applies a route remove, followed by a route add, which works well in the fork-and-reload approach since the removal is not applied until adding the route again succeed. On DCM, each of these operations are done separately as well, which means that the route is completely disabled during some time.

This update changes this behavior by saving previous and updated states of a route, identifying the differences, and dynamically applying them if possible. A fork-and-reload is made in case the dynamic update is not possible, as before.

Part of the changes also include extending the Endpoint struct so that it stores the weight and the VerifyHostname properties of each endpoint in order to make comparisons easier, as well as decrease the number of parameters sent to inner methods. These fields cannot be updated during the Endpoint creation, since they depend on the backend the endpoint is assigned. The router's updateEndpointTable() method centralizes all changes to these context based fields.

https://redhat.atlassian.net/browse/OCPBUGS-77344

Summary by CodeRabbit

Release Notes

  • New Features

  • Added error detection helpers for HAProxy server management operations.

  • Introduced improved server endpoint replacement logic with better fallback handling.

  • Bug Fixes

  • Enhanced health check state management with explicit server health updates.

  • Improved dynamic endpoint updates with more robust comparison and fallback behavior.

  • Refactor

  • Refined server addition, update, and deletion workflows for greater reliability.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jcmoraisjr

Copy link
Copy Markdown
Member Author

/assign @gcs278

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/router/template/router.go (2)

886-896: 💤 Low value

Consider using oldConfig.EndpointTable[serviceKey] for removed service endpoints.

When removing a service from a route, using service.EndpointTable relies on the invariant that the ServiceUnit's endpoint table matches what was previously synced to this backend. Using oldConfig.EndpointTable[serviceKey] would be more explicit about removing exactly what was added to this specific backend.

While the state should normally be consistent due to locking, using the old config's endpoint table would be more defensive.

♻️ Suggested change
 	for serviceKey := range oldConfig.ServiceUnits {
-		if service, ok := r.findMatchingServiceUnit(serviceKey); ok {
-			if _, newFound := newConfig.ServiceUnits[serviceKey]; !newFound {
-				// Service was removed, remove the endpoints.
-				if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, service.EndpointTable); err != nil {
+		if _, newFound := newConfig.ServiceUnits[serviceKey]; !newFound {
+			if service, ok := r.findMatchingServiceUnit(serviceKey); ok {
+				// Service was removed from route, remove the endpoints that were in the old config.
+				if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, oldConfig.EndpointTable[serviceKey]); err != nil {
 					log.Info("router will reload due to error removing endpoints", "backendKey", backendKey, "serviceKey", serviceKey, "error", err)
 					failed = true
 				}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/router.go` around lines 886 - 896, The code currently
passes service.EndpointTable when removing endpoints for services present in
oldConfig.ServiceUnits but no longer in newConfig; instead, use the exact
endpoint table that was part of the previous route config to be defensive.
Update the removal call in the loop that iterates oldConfig.ServiceUnits (around
findMatchingServiceUnit and the RemoveRouteEndpoints call) to use
oldConfig.EndpointTable[serviceKey] (or the appropriate map/key in oldConfig)
rather than service.EndpointTable, preserving backendKey and serviceKey when
calling r.dynamicConfigManager.RemoveRouteEndpoints.

957-972: 💤 Low value

Misleading weight comparison in log message.

The log at line 966 compares oldWeight (from cfg.ServiceUnitNames, the calculated per-endpoint weight) with newWeight (from cfg.ServiceUnits, the route-level service weight). These values have different meanings and scales, making the log confusing for debugging.

The actual replacement logic is correct since it compares the full endpoint structs including their Weight fields.

♻️ Suggested change for clearer logging
-			log.V(4).Info("updating weight for sibling service due to endpoint change", "service", otherServiceKey, "backendKey", backendKey, "oldWeight", oldWeight, "newWeight", newWeight)
+			log.V(4).Info("updating sibling service endpoints due to weight redistribution", "service", otherServiceKey, "backendKey", backendKey)

Alternatively, compare the actual endpoint weights:

oldEpWeight := int32(0)
newEpWeight := int32(0)
if len(oldOtherEndpoints) > 0 {
    oldEpWeight = oldOtherEndpoints[0].Weight
}
if len(newOtherEndpoints) > 0 {
    newEpWeight = newOtherEndpoints[0].Weight
}
log.V(4).Info("updating sibling service endpoints", "service", otherServiceKey, "backendKey", backendKey, "oldEpWeight", oldEpWeight, "newEpWeight", newEpWeight)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/router.go` around lines 957 - 972, The log message in the
loop that updates sibling services is misleading because it compares oldWeight
(from oldWeights/cfg.ServiceUnitNames per-endpoint weight) to newWeight
(cfg.ServiceUnits route-level weight); change the log to report endpoint-level
weights instead: read the first endpoint Weight from oldOtherEndpoints and
newOtherEndpoints (guarding for empty slices) and log those as oldEpWeight and
newEpWeight in the log.V(4).Info call inside the loop that references
otherServiceKey, backendKey, oldOtherEndpoints, newOtherEndpoints and the
ReplaceRouteEndpoints error path so the message reflects actual endpoint weight
changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/router/template/router.go`:
- Around line 886-896: The code currently passes service.EndpointTable when
removing endpoints for services present in oldConfig.ServiceUnits but no longer
in newConfig; instead, use the exact endpoint table that was part of the
previous route config to be defensive. Update the removal call in the loop that
iterates oldConfig.ServiceUnits (around findMatchingServiceUnit and the
RemoveRouteEndpoints call) to use oldConfig.EndpointTable[serviceKey] (or the
appropriate map/key in oldConfig) rather than service.EndpointTable, preserving
backendKey and serviceKey when calling
r.dynamicConfigManager.RemoveRouteEndpoints.
- Around line 957-972: The log message in the loop that updates sibling services
is misleading because it compares oldWeight (from
oldWeights/cfg.ServiceUnitNames per-endpoint weight) to newWeight
(cfg.ServiceUnits route-level weight); change the log to report endpoint-level
weights instead: read the first endpoint Weight from oldOtherEndpoints and
newOtherEndpoints (guarding for empty slices) and log those as oldEpWeight and
newEpWeight in the log.V(4).Info call inside the loop that references
otherServiceKey, backendKey, oldOtherEndpoints, newOtherEndpoints and the
ReplaceRouteEndpoints error path so the message reflects actual endpoint weight
changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1a3caf78-e491-44da-b467-71d20f596926

📥 Commits

Reviewing files that changed from the base of the PR and between 6761134 and a5ac103.

📒 Files selected for processing (6)
  • pkg/router/template/configmanager/haproxy/backend.go
  • pkg/router/template/configmanager/haproxy/backend_test.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/router.go
  • pkg/router/template/types.go

@jcmoraisjr jcmoraisjr force-pushed the OCPBUGS-77344-dcm-update-route branch from a5ac103 to 5dbcbdf Compare May 26, 2026 19:44
@openshift-ci

openshift-ci Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from gcs278. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/router/template/router.go (1)

982-996: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove endpoints per route, not from the full ServiceUnit table.

This helper now sends service.EndpointTable to every associated backend. If two routes select different target ports on the same service, Line 995 will attempt to delete endpoints that were never configured for some of those backends, which turns a normal endpoint deletion into a forced reload.

Proposed fix
-func (r *templateRouter) dynamicallyRemoveEndpoints(service ServiceUnit) bool {
+func (r *templateRouter) dynamicallyRemoveEndpoints(id ServiceUnitKey, service ServiceUnit) bool {
 	if r.dynamicConfigManager == nil || !r.synced {
 		return false
 	}
@@
 	for backendKey := range service.ServiceAliasAssociations {
-		if _, ok := r.state[backendKey]; !ok {
+		cfg, ok := r.state[backendKey]
+		if !ok {
 			continue
 		}
@@
-		if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, service.EndpointTable); err != nil {
+		if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, cfg.EndpointTable[id]); err != nil {
 			// Error dynamically modifying the config, so return false to cause a reload to happen.
 			log.Info("router will reload as the ConfigManager could not dynamically remove endpoints for backend", "backendKey", backendKey, "error", err)
 			return false
 		}
 	}

You'll also need the matching call-site update at Line 1014.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/router.go` around lines 982 - 996, The
dynamicallyRemoveEndpoints helper is passing the entire
ServiceUnit.EndpointTable to every associated backend, causing
RemoveRouteEndpoints(backendKey, service.EndpointTable) to delete endpoints that
don't belong to specific backends; change dynamicallyRemoveEndpoints (and its
call-site) to pass only the route-specific endpoints for each backend (i.e., the
endpoints that correspond to that backendKey) instead of service.EndpointTable,
updating the use of ServiceAliasAssociations and the call-site that invokes
dynamicallyRemoveEndpoints so each backend removal receives the correct
per-route endpoint slice/map.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/router/template/configmanager/haproxy/backend.go`:
- Around line 370-375: When innerDeleteServer(oldEP) fails after you already
called innerSetServerState(oldEP, false), restore the original server state to
avoid leaving the fallback endpoint drained; change the code around the
innerDeleteServer call to capture the delete error, call
b.innerSetServerState(oldEP, true) to revert the change, and return a combined
or wrapped error if the restore also fails (reference innerSetServerState and
innerDeleteServer to locate the logic).
- Around line 361-363: The code only updates HAProxy weight when oldEP.Weight !=
newEP.Weight via b.innerUpdateServerWeight, but fails to toggle the server's
state between "ready" and "drain" when the weight crosses the zero boundary (<=0
vs >0), causing drained endpoints to still serve or recovered ones to remain
drained (and passthrough always sends 100%). Modify the branch handling in the
function that compares oldEP and newEP so that when weight changes you detect a
crossing of the drain threshold (oldEP.Weight > 0 && newEP.Weight <= 0 =>
transition to "drain"; oldEP.Weight <= 0 && newEP.Weight > 0 => transition to
"ready") and invoke the appropriate HAProxy state-change routine in addition to
calling b.innerUpdateServerWeight(newEP, isPassthrough); ensure isPassthrough
still results in correct weight payload but do the state flip via the same
server-state API used elsewhere.

In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 524-545: The code must only attempt to re-enable the singleton
health check if that exact old endpoint still exists; before calling
backend.EnableHealthCheck on oldEndpoints[0] (and before calling
backend.ReplaceServer/adding to newEPs), verify that the endpoint is still
present in the current service/backend endpoints (e.g. check svc.Endpoints or
the current backend server list) and skip this whole branch if the endpoint no
longer exists so a prior ReplaceServer/delete isn't turned into a forced reload.

In `@pkg/router/template/router.go`:
- Around line 886-890: When removing a service from a route, the code currently
passes the raw service list (service.EndpointTable) to RemoveRouteEndpoints;
instead it should pass the route-scoped endpoint slice that was actually
attached to this backend (the EndpointTable stored on the service unit from the
old route config). Update the call in the loop over oldConfig.ServiceUnits so
that, after locating the removed service via r.findMatchingServiceUnit, you
retrieve the service unit from oldConfig.ServiceUnits[serviceKey] (or the local
variable representing that unit) and pass that unit's EndpointTable (the
PreferPort-filtered slice) to
r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, ... ) instead of
service.EndpointTable, keeping the same backendKey and error handling.

---

Outside diff comments:
In `@pkg/router/template/router.go`:
- Around line 982-996: The dynamicallyRemoveEndpoints helper is passing the
entire ServiceUnit.EndpointTable to every associated backend, causing
RemoveRouteEndpoints(backendKey, service.EndpointTable) to delete endpoints that
don't belong to specific backends; change dynamicallyRemoveEndpoints (and its
call-site) to pass only the route-specific endpoints for each backend (i.e., the
endpoints that correspond to that backendKey) instead of service.EndpointTable,
updating the use of ServiceAliasAssociations and the call-site that invokes
dynamicallyRemoveEndpoints so each backend removal receives the correct
per-route endpoint slice/map.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: aa126d80-4b4b-44bb-841c-1121e049641b

📥 Commits

Reviewing files that changed from the base of the PR and between a5ac103 and 5dbcbdf.

📒 Files selected for processing (6)
  • pkg/router/template/configmanager/haproxy/backend.go
  • pkg/router/template/configmanager/haproxy/backend_test.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/router.go
  • pkg/router/template/types.go

Comment thread pkg/router/template/configmanager/haproxy/backend.go
Comment thread pkg/router/template/configmanager/haproxy/backend.go
Comment thread pkg/router/template/configmanager/haproxy/manager.go
Comment thread pkg/router/template/router.go Outdated
@Miciah

Miciah commented May 27, 2026

Copy link
Copy Markdown
Contributor

It would be helpful cross-reference #739 in the PR description and commit 0e47c83 in the commit message—or even have a separate revert commit if it reverts cleanly.

@gcs278

gcs278 commented May 27, 2026

Copy link
Copy Markdown
Contributor

/assign

@jcmoraisjr jcmoraisjr force-pushed the OCPBUGS-77344-dcm-update-route branch from 5dbcbdf to b6c9a42 Compare May 27, 2026 18:18
@jcmoraisjr

Copy link
Copy Markdown
Member Author

or even have a separate revert commit if it reverts cleanly.

Added a separate commit reverting 0e47c83, the new implementation is on top of it.

@jcmoraisjr jcmoraisjr force-pushed the OCPBUGS-77344-dcm-update-route branch from b6c9a42 to 09d388f Compare May 28, 2026 13:20

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/router/template/router.go (1)

982-996: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the route-scoped endpoint slice when deleting service endpoints.

service.EndpointTable is still the raw service list here. For a PreferPort route, Line 995 can try to delete servers that were never attached to that backend and force an avoidable reload. The stored per-route snapshot in r.state[backendKey].EndpointTable[...] is the right source, just like the earlier route-removal fix.

Suggested fix
-func (r *templateRouter) dynamicallyRemoveEndpoints(service ServiceUnit) bool {
+func (r *templateRouter) dynamicallyRemoveEndpoints(id ServiceUnitKey, service ServiceUnit) bool {
 	if r.dynamicConfigManager == nil || !r.synced {
 		return false
 	}
@@
 	for backendKey := range service.ServiceAliasAssociations {
-		if _, ok := r.state[backendKey]; !ok {
+		cfg, ok := r.state[backendKey]
+		if !ok {
 			continue
 		}
 
 		log.V(4).Info("dynamically removing endpoints for associated backend", "backendKey", backendKey)
-		if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, service.EndpointTable); err != nil {
+		if err := r.dynamicConfigManager.RemoveRouteEndpoints(backendKey, cfg.EndpointTable[id]); err != nil {
 			// Error dynamically modifying the config, so return false to cause a reload to happen.
 			log.Info("router will reload as the ConfigManager could not dynamically remove endpoints for backend", "backendKey", backendKey, "error", err)
 			return false
@@
-	configChanged := r.dynamicallyRemoveEndpoints(service)
+	configChanged := r.dynamicallyRemoveEndpoints(id, service)

Also applies to: 1014-1014

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/router/template/router.go` around lines 982 - 996, The code is calling
RemoveRouteEndpoints with the raw service.EndpointTable which can differ from
the per-route snapshot for a given backend (causing wrong deletions for
PreferPort routes); in dynamicallyRemoveEndpoints replace service.EndpointTable
with the stored route-scoped endpoints from
r.state[backendKey].EndpointTable[...] (use the entry keyed by the service/route
identifier used in r.state, e.g.
r.state[backendKey].EndpointTable[service.Name]) when invoking
r.dynamicConfigManager.RemoveRouteEndpoints; make the same change at the other
occurrence around the later block (near line 1014) so deletions use the
route-specific snapshot rather than the raw service list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/router/template/router.go`:
- Around line 855-899: The dynamicallyUpdateRoute function must bail out when
dynamic config management is unavailable or before the initial sync to avoid
panics; add the same guard used by other dynamically* helpers at the start of
dynamicallyUpdateRoute by checking r.dynamicConfigManager for nil and the
manager's "synced/initialized" state (e.g., r.dynamicConfigManager == nil ||
!r.dynamicConfigManager.IsSynced()/IsInitialized() or whatever method your
DynamicConfigManager exposes) and return false if not ready so
ReplaceRouteEndpoints and RemoveRouteEndpoints are never called when the DCM is
disabled or not yet synced.

---

Duplicate comments:
In `@pkg/router/template/router.go`:
- Around line 982-996: The code is calling RemoveRouteEndpoints with the raw
service.EndpointTable which can differ from the per-route snapshot for a given
backend (causing wrong deletions for PreferPort routes); in
dynamicallyRemoveEndpoints replace service.EndpointTable with the stored
route-scoped endpoints from r.state[backendKey].EndpointTable[...] (use the
entry keyed by the service/route identifier used in r.state, e.g.
r.state[backendKey].EndpointTable[service.Name]) when invoking
r.dynamicConfigManager.RemoveRouteEndpoints; make the same change at the other
occurrence around the later block (near line 1014) so deletions use the
route-specific snapshot rather than the raw service list.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1ef3f03f-4a2b-450e-b3bb-4c12ef9191ab

📥 Commits

Reviewing files that changed from the base of the PR and between b6c9a42 and 09d388f.

📒 Files selected for processing (6)
  • pkg/router/template/configmanager/haproxy/backend.go
  • pkg/router/template/configmanager/haproxy/backend_test.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/router.go
  • pkg/router/template/types.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/types.go
  • pkg/router/template/configmanager/haproxy/backend_test.go
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/configmanager/haproxy/backend.go

Comment thread pkg/router/template/router.go
@jcmoraisjr jcmoraisjr force-pushed the OCPBUGS-77344-dcm-update-route branch from 09d388f to 5052289 Compare May 28, 2026 13:55

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/router/template/configmanager/haproxy/manager.go`:
- Around line 460-461: The code currently skips updates when only the
health-check policy flips because reflect.DeepEqual detects changes but the
UpdateServer flow (UpdateServer and the later health pass) doesn’t apply
health-only changes; modify the update flow so health-only differences in
modifiedEndpoints (epPair entries in modifiedEndpoints map) are handled: when
iterating modifiedEndpoints (where modifiedEndpoints[newEP.ID] = epPair{...}),
detect if the only diff is health-related and call the same logic that updates
health checks (the UpdateServer/replace health-check path or the health pass
that walks added/replaced endpoints) for that endpoint ID so HAProxy gets
reconfigured; ensure you reference the epPair fields (oldEP/newEP), the
modifiedEndpoints map, and the UpdateServer health-check update code path so
health-only changes trigger the appropriate reconfigure instead of being treated
as a no-op.
- Around line 766-770: The unconditional "return nil" disables dynamic route
adds; change it so we only skip using blueprints when the HAProxy 2.8 bug
actually applies and a matching blueprint exists. In practice, restore the
normal flow in AddRoute by removing the unconditional return and replace it with
a conditional that checks the HAProxy version and the result of
findMatchingBlueprint(...) (e.g., only return nil when haproxy version is 2.8
and findMatchingBlueprint(...) returns a non-nil blueprint); otherwise proceed
so the rest of the matcher runs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d0103cac-5801-43d7-be47-9c6bd41df0ae

📥 Commits

Reviewing files that changed from the base of the PR and between 09d388f and 5052289.

📒 Files selected for processing (6)
  • pkg/router/template/configmanager/haproxy/backend.go
  • pkg/router/template/configmanager/haproxy/backend_test.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/manager.go
  • pkg/router/template/router.go
  • pkg/router/template/types.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/router/template/types.go
  • pkg/router/template/configmanager/haproxy/blueprint_plugin_test.go
  • pkg/router/template/configmanager/haproxy/backend_test.go
  • pkg/router/template/configmanager/haproxy/backend.go
  • pkg/router/template/router.go

Comment thread pkg/router/template/configmanager/haproxy/manager.go
Comment thread pkg/router/template/configmanager/haproxy/manager.go
DCM was disabled on route changes due to the way the resource changes
are being handled: without DCM, the template plugin applies a route
remove, followed by a route add, which works well in the fork-and-reload
approach since the removal is not applied until adding the route again
succeed. On DCM, each of these operations are done separately as well,
which means that the route is completely disabled during some time.

This update changes this behavior by saving previous and updated states
of a route, identifying the differences, and dynamically applying them
if possible. A fork-and-reload is made in case the dynamic update is
not possible, as before.

Part of the changes also include extending the Endpoint struct so that
it stores the weight and the VerifyHostname properties of each endpoint
in order to make comparisons easier, as well as decrease the number of
parameters sent to inner methods. These fields cannot be updated during
the Endpoint creation, since they depend on the backend the endpoint is
assigned. The router's updateEndpointTable() method centralizes all
changes to these context based fields.

https://redhat.atlassian.net/browse/OCPBUGS-77344
@jcmoraisjr jcmoraisjr force-pushed the OCPBUGS-77344-dcm-update-route branch from 5052289 to b69f565 Compare May 28, 2026 16:26
@openshift-ci

openshift-ci Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

@jcmoraisjr: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants