Make applyClusterTLSProfile() non-fatal to prevent crash-loops in HyperShift hosted clusters by redhat-chai-bot · Pull Request #3859 · operator-framework/operator-lifecycle-manager

redhat-chai-bot · 2026-06-29T02:23:51Z

Problem

applyClusterTLSProfile() in packageserver startup (pkg/package-server/server/server.go) fetches the cluster APIServer CR with a 30-second hard timeout. If the API server isn't reachable — which happens during HyperShift hosted cluster creation when the API server may not be ready yet — the error is fatal: the entire container crashes and restarts in a loop.

This is a probabilistic race condition. In OpenShift CI, this caused 5 consecutive payload rejections over 36 hours in the 5.0 CI stream, blocking hypershift-e2e-aws (3 consecutive failures) and hypershift-e2e-aks (2 consecutive failures).

Fix

Make the error non-fatal: Log a warning and continue with defaults instead of returning a fatal error. The Package Server Manager (PSM) will inject the correct TLS flags (--tls-min-version, --tls-cipher-suites) on its next reconciliation anyway, so crashing here is unnecessary.
Reduce timeout from 30s to 10s: Best-effort lookup that doesn't block startup.

Changes

pkg/package-server/server/server.go | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

Validation

All 5 existing TLS-related tests pass
Downstream proof-of-concept (openshift/operator-framework-olm#1333) was validated with /payload-job — the primary failing HyperShift e2e job (e2e-aws-ovn) passed after a 3-payload failure streak
Tracking: TRT-2761

…erShift hosted clusters applyClusterTLSProfile() fetches the cluster APIServer CR on startup with a hard timeout. In HyperShift hosted clusters the API server may not be ready when packageserver starts, causing the fatal error path to crash-loop the container indefinitely. Make the error non-fatal: log a warning and continue with defaults instead of returning a fatal error. The Package Server Manager (PSM) will inject the correct TLS flags on the next reconciliation anyway, so crashing is unnecessary. Also reduce the lookup timeout from 30s to 10s so we don't stall startup when the API is genuinely unreachable. Validated in downstream openshift/operator-framework-olm PR operator-framework#1333, where the payload job e2e-aws-ovn passed with this fix. Downstream tracking: https://redhat.atlassian.net/browse/TRT-2761 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci · 2026-06-29T02:24:03Z

Hi @redhat-chai-bot. Thanks for your PR.

I'm waiting for a operator-framework member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci · 2026-06-29T02:24:09Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kevinrizza for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot

Pull request overview

This PR prevents package-server crash-loops during startup by making the “best-effort” OpenShift APIServer TLS profile lookup non-fatal when the API server/config API isn’t reachable yet (notably during HyperShift hosted cluster bring-up), and by shortening the lookup timeout to reduce startup blocking.

Changes:

Convert applyClusterTLSProfile() failure during startup from a hard error to a warning, continuing with defaults.
Reduce the APIServer TLS profile lookup timeout from 30s to 10s.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

 		if err := applyClusterTLSProfile(ctx, clientConfig, o.SecureServing); err != nil {
-			return fmt.Errorf("failed to apply cluster TLS profile to serving options: %w", err)
+			log.WithError(err).Warn("Failed to apply cluster TLS profile to serving options, continuing with defaults. " +
+				"PSM will inject the correct TLS flags on next reconciliation.")


tmshort · 2026-06-29T13:55:39Z

Closing in favor of #3860

Copilot AI review requested due to automatic review settings June 29, 2026 02:23

openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 29, 2026

openshift-ci Bot requested review from joelanford and rashmigottipati June 29, 2026 02:24

Copilot started reviewing on behalf of redhat-chai-bot June 29, 2026 02:24 View session

Copilot AI reviewed Jun 29, 2026

View reviewed changes

tmshort closed this Jun 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make applyClusterTLSProfile() non-fatal to prevent crash-loops in HyperShift hosted clusters#3859

Make applyClusterTLSProfile() non-fatal to prevent crash-loops in HyperShift hosted clusters#3859
redhat-chai-bot wants to merge 1 commit into
operator-framework:masterfrom
redhat-chai-bot:fix-packageserver-tls-crash-hypershift

redhat-chai-bot commented Jun 29, 2026

Uh oh!

openshift-ci Bot commented Jun 29, 2026

Uh oh!

openshift-ci Bot commented Jun 29, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

tmshort commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

redhat-chai-bot commented Jun 29, 2026

Problem

Fix

Changes

Validation

Uh oh!

openshift-ci Bot commented Jun 29, 2026

Uh oh!

openshift-ci Bot commented Jun 29, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

tmshort commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants