[codex] Apply generated cpflow GitHub Actions flow by justin808 · Pull Request #732 · shakacode/react-webpack-rails-tutorial

justin808 · 2026-04-30T08:39:05Z

Summary

Replace the older hand-written Control Plane GitHub Actions with the generated cpflow-* action and workflow set from cpflow generate-github-actions.
Keep review apps opt-in, staging deploys branch-gated, production promotion manual, and nightly review-app cleanup in the generated flow.
Update Control Plane docs links so they point at the new cpflow-* action/workflow names.

Verification

BUNDLE_GEMFILE=/tmp/cpflow-pr-278/Gemfile bin/conductor-exec bundle exec ruby /tmp/cpflow-pr-278/bin/cpflow github-flow-readiness — no blocking readiness issues detected.
bin/conductor-exec ruby -e 'require "yaml"; paths = Dir[".github/**/*.yml"] + Dir[".github/**/*.yaml"] + Dir[".controlplane/**/*.yml"]; paths.sort.each { |path| YAML.load_file(path, aliases: true); puts path }'
bin/conductor-exec git diff --cached --check
bin/conductor-exec bundle exec rubocop

Not Run / Blocked

bin/conductor-exec bundle exec rspec could not complete locally because PostgreSQL was not running at /tmp/.s.PGSQL.5432; after the first Rails load failure, Coveralls setup also reported repeated coverage initialization errors.

Note

Medium Risk
Replaces and expands CI/CD automation for staging/review apps and production promotion, so misconfiguration could affect deployments or delete the wrong resources despite added guards.

Overview
Switches Control Plane automation from the bespoke GitHub Actions to the generated cpflow-* composite actions and workflows, updating docs accordingly.

Adds a full generated flow for opt-in review apps (slash-command /deploy-review-app + auto-redeploy after creation), review-app deletion (on PR close or /delete-review-app), staging deploys gated to the staging branch, manual promote-to-production with env-var name parity checks + health polling + best-effort rollback, and a nightly stale review-app cleanup.

Hardens automation around secrets and safety: centralized config validation, safer Control Plane token handling, optional SSH-based Docker builds, and stricter review-app deletion prefix checks to prevent accidental staging/production deletes.

^{Reviewed by Cursor Bugbot for commit 3331332. Bugbot is set up for automated code reviews on this repo. Configure here.}

coderabbitai · 2026-04-30T08:39:13Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 565a7e06-a2ec-40a5-b4dc-c82729f95dda

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/cpflow-github-actions-flow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Upstream token not passed to copy-image command
- Added the required -t upstream token argument to the cpflow copy-image-from-upstream command while keeping the token sourced from secrets via the environment.

Or push these changes by commenting:

@cursor push 3606e16506

Preview (3606e16506)

diff --git a/.github/workflows/cpflow-promote-staging-to-production.yml b/.github/workflows/cpflow-promote-staging-to-production.yml
--- a/.github/workflows/cpflow-promote-staging-to-production.yml
+++ b/.github/workflows/cpflow-promote-staging-to-production.yml
@@ -203,14 +203,13 @@
 
       - name: Copy image from staging
         env:
-          # Pass the upstream token via env rather than `-t` so it doesn't appear in /proc/<pid>/cmdline.
           CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }}
           PRODUCTION_APP_NAME: ${{ vars.PRODUCTION_APP_NAME }}
           CPLN_ORG_PRODUCTION: ${{ vars.CPLN_ORG_PRODUCTION }}
         shell: bash
         run: |
           set -euo pipefail
-          cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}"
+          cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" -t "${CPLN_UPSTREAM_TOKEN}" --org "${CPLN_ORG_PRODUCTION}"
 
       - name: Deploy image to production
         env:

_{You can send follow-ups to the cloud agent here.}

claude · 2026-04-30T09:03:41Z

Code Review: cpflow GitHub Actions Migration

This is a solid migration from hand-written to generated cpflow-* workflows. The overall architecture is a clear improvement. Here's a summary of what stands out.

Security — meaningful fixes

Token exposure fixed: The old setup-environment action passed the token as --token "$TOKEN" (visible in /proc/<pid>/cmdline and ps). The new action correctly uses CPLN_TOKEN env var. Same for CPLN_UPSTREAM_TOKEN in the promote workflow.
Critical: /deploy-review-app now requires author_association. The old workflow had no author_association check, meaning any GitHub user who could comment on a PR could trigger a deployment. The new check ("OWNER","MEMBER","COLLABORATOR") correctly gates this.
Exact command match (== '/deploy-review-app') replaces contains(). Prevents accidental triggers from comments like "I ran /deploy-review-app earlier and it worked".
SSH key isolation: The cpflow-build-docker-image action scopes DOCKER_BUILD_SSH_KEY to a dedicated step so it never appears in the build step's environment — even when ACTIONS_STEP_DEBUG=true is set. The EXIT trap cleans up the key file and kills the agent. Well designed.
Fork PR handling: Fork PRs correctly skip auto-deploy (which can't access secrets) while still allowing approved collaborators to deploy via /deploy-review-app. The cpflow-review-app-help.yml is explicit about why pull_request_target is used there (no code checkout = safe for write access from forks).

Bugs fixed

cancel-in-progress: false on all deploy workflows. The old staging workflow used cancel-in-progress: true, which could interrupt a cpflow deploy-image mid-rollout and leave the workload partially updated. Now fixed across all three deployment workflows.
Delete on PR close now uses pull_request_target. The old delete-review-app.yml used pull_request: types: [closed], which runs in a restricted context without secret access when a fork PR closes. The new workflow correctly uses pull_request_target for the close event.
--run-release-phase is now conditional. The old staging deploy hardcoded it unconditionally; the new workflows detect release_script: presence at runtime before deciding.
Delete safety uses prefix allowlist, not blocklist. The old delete-app.sh used grep -iqE '(production|staging)' to block "dangerous" names — fragile against future app naming. The new delete-app.sh requires the app name to start with REVIEW_APP_PREFIX, which is a stronger guarantee.

Potential issues

cpflow exists in the deploy workflow doesn't distinguish auth errors from "not found" (see inline comment). The delete-app.sh carefully pattern-matches error output to surface API/auth failures, but the deploy workflow's check-app step treats all non-zero exits as exists=false. An intermittent auth error on a push event would silently skip the deploy.
staging deploy job has no needs: validate-branch: The deploy job correctly declares needs: [validate-branch, build] and has an if: guard, but if validate-branch produces no output (e.g., first run where the variable step fails), the if: could evaluate against an empty string. In practice the validation step exits non-zero, so build never runs and deploy is skipped by the needs: dependency. This is fine but the if: guard is slightly redundant with the needs: dependency on validate-branch.
Help command has no contents: read permission: cpflow-help-command.yml declares only issues: write and pull-requests: write. There's no explicit contents: read but GitHub's default GITHUB_TOKEN grants read on public repos, so this works — just worth noting that it relies on the repo's default permissions.

Nits

cpflow-deploy-staging.yml uses GITHUB_REF_NAME (simpler), whereas the old workflow used ${GITHUB_REF#refs/heads/} (strip-prefix approach that would silently behave oddly for tag refs). Small but cleaner.
The shakacode-team.md link fix (-info org URL corrected to shakacode-open-source-examples-production) is a separate correctness fix bundled in cleanly.

Overall this is well-structured with meaningful security and reliability improvements. The one item I'd prioritize is the cpflow exists error handling consistency noted in the inline comment.

claude · 2026-04-30T09:03:54Z

+          if cpflow exists -a "${APP_NAME}" --org "${CPLN_ORG}"; then
+            echo "exists=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "exists=false" >> "$GITHUB_OUTPUT"


This if/else around cpflow exists conflates two different failure modes: "app does not exist" (expected non-zero) and "API/auth error" (unexpected non-zero). Both land in the else branch and set exists=false.

Compare delete-app.sh in this same PR, which carefully pattern-matches the output ("Double check your org", "Unknown API token format", "ERROR", etc.) to distinguish a real failure from a genuine not-found and surface it as a hard error.

If there's a transient auth issue when a PR is pushed, this step sets exists=false and the pull_request path silently writes a step-summary saying the app doesn't exist yet — masking the real problem until the deploy is manually retried.

Suggested approach: mirror delete-app.sh's pattern — capture the output, check exit code, inspect the text for known error tokens, and exit non-zero on a real failure rather than treating it as "not found".

Fixed in b9095c6 by capturing cpflow exists output and failing hard on auth/API error tokens instead of treating every non-zero exit as app-not-found.

claude · 2026-04-30T09:04:01Z

+
+        if [[ -f "${HOME}/.ssh/cpflow_build_key" ]]; then
+          eval "$(ssh-agent -s)"
+          trap 'ssh-agent -k >/dev/null; rm -f "${HOME}/.ssh/cpflow_build_key"' EXIT


The EXIT trap kills the SSH agent and removes cpflow_build_key, but the ~/.ssh/known_hosts file written in the previous step is not cleaned up here. On ephemeral GitHub-hosted runners this is harmless (the VM is discarded), but if this action is ever adopted on self-hosted runners with a persistent ~/.ssh, the custom known_hosts entries would accumulate across runs.

Consider adding rm -f "${HOME}/.ssh/known_hosts" to the trap, or using a temp file under $RUNNER_TEMP instead of ~/.ssh for the known_hosts.

claude · 2026-04-30T09:04:03Z

+    default: "3.3.1"
+  cpflow_version:
+    description: cpflow gem version
+    required: false


The CLI version is pinned here as a default, which is good for reproducibility. Worth noting: since sudo npm install -g fetches from the npm registry at runtime, a compromised release of @controlplane/cli@3.3.1 would silently affect all runs. Consider adding a SHA/integrity check (e.g. via npm install --integrity) or at minimum locking to a digest-pinned download, especially since this step runs in workflows that access staging/production tokens.

claude · 2026-04-30T09:04:06Z

+
+          current_image="${selected_image:-${first_image}}"
+          current_version="${selected_version:-${first_version}}"
+


The staging env-var parity check only compares names, not values. It will catch a missing DATABASE_URL in production, but it won't catch a case where production has the right key names but wrong or stale values (e.g. a staging secret was rotated but the production counterpart wasn't updated). This is expected behaviour for a name-only check — just worth documenting explicitly so it's clear what the check does and doesn't guarantee.

Documented in b9095c6: the production promotion workflow checks that production has the same environment variable names as staging, but it does not compare secret values.

claude · 2026-04-30T09:25:32Z

Code Review: Replace hand-written CI/CD Actions with generated cpflow- flow*

Well-considered overhaul that hardens security, safety, and operational reliability. No blocking issues found.

Security improvements

Token no longer passed via --token flag — old setup-environment exposed it in /proc/cmdline and ps; now uses CPLN_TOKEN env var only. SSH key scoped to a dedicated step so ACTIONS_STEP_DEBUG cannot leak it from the build step. Delete command now gated on author_association (old workflow had no permission check). Fork PR protection blocks syncs before any secret is consumed. Old contains() command matching replaced with exact match.

Safety improvements

cancel-in-progress changed from true to false everywhere — old staging deploy risked partial GVC deployments mid-rollout. Prefix-based delete guard replaces the old substring check for production/staging names. Production rollback logic added with per-workload state capture and restore — old workflow had no rollback at all. Health checks now configurable (HEALTH_CHECK_RETRIES, HEALTH_CHECK_INTERVAL, HEALTH_CHECK_ACCEPTED_STATUSES).

Issue 1: Fork-PR /deploy-review-app comment silently fails with no PR feedback

When a collaborator comments /deploy-review-app on a fork PR, the job condition passes but Validate review app deployment source exits 1 with only a runner log message. No PR comment explains the rejection. Suggest posting an issues.createComment before exit 1, similar to how the unconfigured-secrets path writes to GITHUB_STEP_SUMMARY.

Issue 2: cpflow exists error detection is string-matching fragile

delete-app.sh uses case matching over stderr to distinguish app-not-found from auth/network errors. The inline TODO acknowledges this. If cpflow changes an error message, the pattern will not match and a real failure will be silently treated as app not found, causing delete to return success vacuously. The TODO to switch to a structured exit code is the right long-term fix.

Issue 3: gem install / npm install without integrity pinning

cpflow-setup-environment/action.yml installs @controlplane/cli@3.3.1 and cpflow 4.2.0 without hash verification. A re-tagged or compromised release would install silently. For production-adjacent CI, consider verifying SHA256 after install or using lockfiles with integrity entries.

Issue 4: Nightly cleanup has no failure notification

cpflow-cleanup-stale-review-apps.yml runs on cron with no failure notification. Silent failures accumulate stale apps over days. Consider a step gated on if: failure() that posts a GitHub issue or annotation.

Minor observations

APP_NAME / PR_NUMBER double-assignment is intentional (workflow-level env needed early for concurrency.group, step overwrites with resolved values) but a short comment would clarify intent. cpflow-review-app-help.yml using pull_request_target is correct and well-documented. deploy job in cpflow-deploy-staging.yml re-runs cpflow-setup-environment unnecessarily (only needs cpflow deploy-image) — minor redundancy, acceptable for simplicity.

claude · 2026-04-30T09:25:43Z

+echo "🗑️ Deleting application: $APP_NAME"
+cpflow delete -a "$APP_NAME" --org "$CPLN_ORG" --yes
+
+echo "✅ Successfully deleted application: $APP_NAME"


The case pattern matching over stderr is the right approach given cpflow exists doesn't expose a structured signal, and the inline TODO is the right way to track this. One additional defensive measure worth considering: log the full exists_output to the runner even when the "not found" path is taken, so there's an audit trail if unexpected output silently passed through the filter. Right now the output is printed only if non-empty, but it's easy to miss in logs.

Also worth noting: the pattern list doesn't include "timeout", "connection refused", or "ENOTFOUND" — all plausible network failure messages. If cpflow's Ruby HTTP client times out, it may surface Errno::ECONNREFUSED or Net::ReadTimeout, which would match "Net::", so that's covered. But curl-style timeouts would not. Low risk given this is a Ruby gem, but worth keeping in mind when bumping the cpflow version.

claude · 2026-04-30T09:26:03Z

+        shell: bash
+        run: |
+          set -euo pipefail
+          cpflow cleanup-stale-apps -a "${{ vars.REVIEW_APP_PREFIX }}" --org "${{ vars.CPLN_ORG_STAGING }}" --yes


No failure notification exists for this cron job. If cpflow cleanup-stale-apps fails (auth error, API hiccup, etc.), stale review apps accumulate silently across days.

Consider appending a step:

- name: Notify on failure if: failure() run: | echo "::error::Stale review app cleanup failed. Check the workflow logs."

Or, if the team monitors GitHub issues, use gh issue create to open a tracking issue on failure so it's not lost in workflow history.

claude · 2026-04-30T09:26:22Z

+            } >> "$GITHUB_STEP_SUMMARY"
+            exit 0
+          fi
+


When a collaborator comments /deploy-review-app on a fork PR, execution reaches this exit 1 with only a runner log message — no feedback is posted to the PR thread. The collaborator sees a red workflow run with no explanation in the PR.

Consider posting a comment before exiting. The GH_TOKEN is available from the Resolve PR ref and commit step but not inherited here; you'd need to add GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} to this step's env block. Something like:

gh api "repos/${GITHUB_REPOSITORY}/issues/${PR_NUMBER}/comments" \ --method POST \ --field body="Fork PRs cannot use /deploy-review-app because this workflow uses repository secrets. Ask a maintainer to deploy from a branch in the base repository." \ --silent || true

Fixed in 1e46bfe: fork PR /deploy-review-app rejections now post a PR comment before exiting, while still keeping the deployment blocked because the workflow uses repository secrets.

claude · 2026-04-30T09:26:25Z

+      run: |
+        set -euo pipefail
+
+        sudo npm install -g @controlplane/cli@${{ inputs.cpln_cli_version }}


Version-pinned but not hash-verified. A malicious actor who gains write access to the @controlplane/cli npm package (or re-tags a release) could substitute a compromised binary. Same applies to gem install cpflow below.

For production-adjacent pipelines, consider adding a post-install checksum check, or at minimum document the expected SHA256 for each pinned version in a comment so drift is detectable during future version bumps.

claude · 2026-04-30T09:26:29Z

+        run: |
+          # Best-effort rollback: try every workload, aggregate failures, exit non-zero at the end
+          # if any failed. A single cpln hiccup shouldn't leave other workloads mid-promotion.
+          set -uo pipefail


The intentional omission of -e here (using set -uo pipefail instead of set -euo pipefail) is the correct design — it lets rollback continue across all workloads even if one cpln call fails, with rollback_failures tracking the aggregate. Worth a brief comment explaining the omission so a future reader doesn't add -e thinking it was forgotten:

Suggested change

set -uo pipefail

# Intentionally no -e: we want every workload to attempt rollback even if earlier ones fail.

set -uo pipefail

Fixed in 1e46bfe with a comment explaining that set -e is intentionally omitted so rollback can attempt every workload and aggregate failures.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Token not persisted for subsequent workflow steps
- The composite action now writes CPLN_TOKEN to GITHUB_ENV so subsequent cpflow and cpln workflow steps inherit authentication without passing the token on the command line.

Or push these changes by commenting:

@cursor push f1eee564c7

Preview (f1eee564c7)

diff --git a/.github/actions/cpflow-setup-environment/action.yml b/.github/actions/cpflow-setup-environment/action.yml
--- a/.github/actions/cpflow-setup-environment/action.yml
+++ b/.github/actions/cpflow-setup-environment/action.yml
@@ -63,6 +63,12 @@
           exit 1
         fi
 
+        {
+          echo "CPLN_TOKEN<<__CPLN_TOKEN__"
+          echo "$CPLN_TOKEN"
+          echo "__CPLN_TOKEN__"
+        } >> "$GITHUB_ENV"
+
         create_output=""
         if ! create_output="$(cpln profile create default --org "$ORG" 2>&1)"; then
           if ! echo "$create_output" | grep -qi "already exists"; then

_{You can send follow-ups to the cloud agent here.}

claude · 2026-04-30T10:39:40Z

Code Review — PR #732: Generated cpflow GitHub Actions Flow

Overview

This is a well-structured replacement of hand-written CI/CD workflows with the generated cpflow-* composite actions and workflows. The new flows are meaningfully safer and more correct than the old ones across several dimensions. The PR is in draft but the code quality is high enough to call out specifics.

Significant Improvements ✅

Secret handling: Old code passed CPLN_TOKEN via --token (visible in /proc/<pid>/cmdline and ps output). New code passes it exclusively via environment — good security hygiene across all three cpln/cpflow call sites.
Prefix guard on deletion: delete-app.sh now validates that the app name starts with $REVIEW_APP_PREFIX- before deleting, which is a much stronger guard than the old "reject names containing 'production' or 'staging'" heuristic. The old approach would have blocked deletion of any app with those words in the name (too broad) while leaving other non-review apps unprotected (too narrow).
Fork PR protection: The deploy workflow explicitly detects fork PRs and blocks them from accessing repository secrets — the old workflow had no such guard.
Author association gating: Slash commands now check author_association is OWNER, MEMBER, or COLLABORATOR. The old delete workflow had no collaborator check at all.
Non-cancelling concurrency (cancel-in-progress: false): Prevents partial deployments and half-deleted apps — this was cancel-in-progress: true in the old staging workflow.
Multi-workload rollback: The promote workflow rolls back all workloads independently on failure, aggregating errors rather than stopping at the first failure.
Env-var parity check: Pre-promotion check that production has all staging env var names prevents silent nil variable errors after promotion.
Health checks with configurable accepted statuses: The default 200 301 302 is well-reasoned for auth-gated Rails apps; HEALTH_CHECK_ACCEPTED_STATUSES makes it overridable without editing the file.
Randomized heredoc delimiter in capture-current: EOF_$(openssl rand -hex 8) correctly prevents heredoc injection if rollback state ever contained a literal EOF line.

Issues

Bug: `deploy` job may run even when `build` fails in `cpflow-deploy-staging.yml`

In GitHub Actions, an explicit if: on a job replaces the implicit success() guard. The deploy job condition is:

if: needs.validate-branch.outputs.is_deployable == 'true'

Because this doesn't check needs.build.result, the deploy job will run even if the build job fails (as long as the branch is deployable). This burns runner minutes and produces a confusing deploy attempt against a missing image.

Fix: add && needs.build.result == 'success' to the condition.

Risk: String-matching on `cpflow exists` error output is fragile

delete-app.sh (and the inline version in cpflow-deploy-review-app.yml) detect "real" errors by checking for specific strings like "Double check your org", "ERROR", "Traceback", "Net::". The file already has a # TODO comment acknowledging this, but the consequence of missing a new error pattern is silently treating a real failure as "app not found" — which could skip deletion without any alert.

This is acceptable given the stated cpflow limitation, but the heuristic string list should be kept in sync with cpflow's actual output. An alternative interim approach: treat any non-zero exit with non-empty stderr as a failure unless it matches a known "not found" pattern, rather than enumerating known error patterns.

Minor: Workflow-level `APP_NAME` env var is redundant and overridden in `cpflow-deploy-review-app.yml`

The top-level env: block in the deploy review app workflow sets APP_NAME from the raw event payload, but the Resolve PR ref and commit step later overwrites it in $GITHUB_ENV with the normalized PR number. The step-level value wins, making the workflow-level declaration dead configuration. It's also the only workflow in this set that has this pattern — the others don't duplicate APP_NAME this way.

Nitpicks

cpflow-setup-environment/action.yml: gem install cpflow -v ... without --no-document is slower than needed on CI runners.
cpflow-cleanup-stale-review-apps.yml: The Validate required secrets and variables step runs as a plain shell: bash step but has no id: — this is fine, but if you ever want to condition a downstream step on it, you'll need to add one.
The old nightly-remove-stale-review-apps.yml hardcoded the app name (qa-react-webpack-rails-tutorial) rather than using ${{ vars.REVIEW_APP_PREFIX }}. The new workflow correctly uses the variable — worth calling out as a subtle bug fix.

Summary

The core logic is solid and the security posture is materially improved. The one real bug (deploy running after build failure) should be fixed before merge. The string-matching fragility is an acknowledged limitation with a TODO already in place.

claude · 2026-04-30T10:39:46Z

+          fi
+          deploy_args+=(--org "${CPLN_ORG}" --verbose)
+
+          cpflow deploy-image "${deploy_args[@]}"


Bug: deploy runs even when build fails.

When a job has an explicit if:, GitHub Actions drops the implicit success() guard. Because this condition only checks validate-branch's output, the deploy job will start even if the build job failed, wasting runner minutes and attempting to deploy a missing image.

Suggested change

cpflow deploy-image "${deploy_args[@]}"

if: needs.validate-branch.outputs.is_deployable == 'true' && needs.build.result == 'success'

claude · 2026-04-30T10:39:54Z

+echo "🗑️ Deleting application: $APP_NAME"
+cpflow delete -a "$APP_NAME" --org "$CPLN_ORG" --yes
+
+echo "✅ Successfully deleted application: $APP_NAME"


The string-matching heuristic on cpflow exists error output (lines 43–54) is the main reliability risk in this file. The consequence of an unmatched error pattern is silently exiting 0 ("app not found") when the API actually returned an error, meaning the delete is skipped without any visible failure.

An alternative that inverts the logic: treat any non-zero exit with non-empty output as a failure unless it matches a known "not found" pattern:

# instead of matching known error strings, match known "not found" patterns case "$exists_output" in *"not found"*|*"does not exist"*) echo "⚠️ Application does not exist: $APP_NAME" exit 0 ;; *) echo "❌ ERROR: failed to determine whether application exists" >&2 printf '%s\n' "$exists_output" >&2 exit 1 ;; esac

This is safer by default — unknown output is treated as an error rather than a no-op. The tradeoff is that you'd need to know the exact "not found" wording from cpflow exists, which has the same coupling problem in the other direction. The TODO comment already captures this; just flagging it for awareness.

claude · 2026-04-30T10:39:57Z

+              echo "ready=false" >> "$GITHUB_OUTPUT"
+              {
+                echo "Control Plane review app automation is not configured yet."
+                echo


This top-level APP_NAME is computed from the raw event payload and is later overwritten in $GITHUB_ENV by the Resolve PR ref and commit step (echo "APP_NAME=..." >> "$GITHUB_ENV"). The step-level write wins, so this declaration is dead configuration and can be confusing. Consider removing it and relying solely on the step-computed value, which is already the authoritative one (it handles all three event types correctly).

claude · 2026-04-30T18:49:41Z

Code Review: cpflow Generated GitHub Actions Flow

This is a meaningful security and reliability upgrade over the handwritten workflows. Several issues from the old setup are fixed here; I've also flagged a few new items worth addressing.

✅ Notable Improvements

Token passed via env, not --token flag — prevents leakage in /proc/<pid>/cmdline and ps output (old setup-environment used cpln profile update default --org "$ORG" --token "$TOKEN")
Author association gating on /deploy-review-app and /delete-review-app — the old delete workflow accepted the command from any commenter
Fork PR protection in the deploy workflow — old workflow ran on all pull_request events without a fork check, allowing fork PRs to attempt using repo secrets
Prefix guard in delete-app.sh — prevents accidental deletion of non-review-app resources
SSH key isolated to a dedicated step — keeps it out of the main build step's env where ACTIONS_STEP_DEBUG=true would expose it
Randomized heredoc delimiters in the promotion workflow — prevents content injection if rollback state contains EOF
Old staging workflow fixed — the old deploy-to-control-plane-staging.yml triggered on branches: ['*'] (every push), burning runners on every feature branch; new workflow correctly filters to main/master
cancel-in-progress: false on all deployment concurrency groups — prevents partial rollout state from mid-cancelled deploys

🔴 Issues

1. cpflow-delete-review-app.yml uses pull_request_target without an explanation comment

The delete workflow triggers on pull_request_target: [closed]. This event runs with write permissions in the base repo context and can post PR comments on fork PRs. It's the correct choice here (since no PR code is checked out), but it's a well-known footgun — future maintainers may switch it to pull_request thinking it's "safer" and break fork PR cleanup. cpflow-review-app-help.yml has a good comment explaining this; the delete workflow should have one too.

2. cpflow exists error-pattern matching is duplicated

The same fragile string-matching logic ("Double check your org", "ERROR", "Traceback", "Net::", etc.) appears both in delete-app.sh and inline in the Check if review app exists step of cpflow-deploy-review-app.yml. The delete-app.sh has a TODO explaining the fragility; the workflow copy does not. If cpflow changes its error output, both need updating in sync. Consider either extracting this into a shared shell script, or at minimum adding the same TODO comment in the workflow.

🟡 Warnings

3. CPLN_TOKEN is written to GITHUB_ENV and persists to all subsequent steps

In cpflow-setup-environment/action.yml (lines 67–72), the token is written to GITHUB_ENV so later workflow steps can authenticate. This is necessary for cpflow to work, but it means the token is present in the environment of every subsequent step — including any step that might env | sort or log environment state. Consider documenting this tradeoff in a comment.

4. Production health check accepts 301/302 by default

HEALTH_CHECK_ACCEPTED_STATUSES defaults to '200 301 302'. A redirect to a login page or a broken URL returns a redirect, so the health check passes even if the app is misconfigured post-deploy. The docs note the recommended mitigation (set a dedicated /health endpoint and override to "200"). Consider making the default stricter, or adding a warning to the step summary when a redirect is accepted.

5. Nightly cleanup has no failure notification

cpflow-cleanup-stale-review-apps.yml runs at midnight UTC. If it fails (bad token, cpflow error, etc.), there is no alert — stale apps silently accumulate cost. Consider adding a Slack/email notification on failure, or at minimum ensure the repo has GitHub workflow failure notifications enabled for scheduled runs.

ℹ️ Informational

6. /help command has no author_association check

cpflow-help-command.yml responds to /help from any commenter (no association guard). This is probably intentional — help is informational — but on a high-volume repo the bot will reply to every /help comment from any user. Noting for awareness.

7. ruby/setup-ruby@v1 and actions/github-script@v7 use floating major versions

These will pick up any non-breaking updates automatically, which is generally fine. However, actions/checkout@v4 is also a floating major — all three could be pinned to a specific SHA for a more locked-down supply chain posture if that's a concern.

claude · 2026-04-30T18:49:53Z

+name: Delete Review App
+
+on:
+  pull_request_target:


pull_request_target deserves an explanation comment here.

pull_request_target runs in the context of the base repo (write permissions, access to secrets) and is required for fork PR cleanup — pull_request can't post comments on fork PRs because the token is read-only. This is safe here because no PR code is ever checked out (the actions/checkout@v4 below fetches base-branch code only).

cpflow-review-app-help.yml has a good comment explaining this exact pattern. Without a parallel comment here, a future maintainer may "fix" this to pull_request thinking it's safer, silently breaking fork PR cleanup.

Suggested change

pull_request_target:

# pull_request_target is intentional: it has write permission to comment on PRs from

# forks, where `pull_request` would be read-only. This is safe because no PR code is

# checked out — the job only calls cpflow with base-repo credentials.

# Do not switch this to `pull_request` or add a checkout of the PR ref without re-evaluating.

pull_request_target:

claude · 2026-04-30T18:49:59Z

+        # Make the token available to later workflow steps without putting it on argv.
+        token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
+        {
+          echo "CPLN_TOKEN<<${token_delimiter}"
+          printf '%s\n' "$CPLN_TOKEN"
+          echo "${token_delimiter}"
+        } >> "$GITHUB_ENV"


Writing CPLN_TOKEN to GITHUB_ENV propagates it to every subsequent step in the calling workflow — not just the steps that explicitly use it. This is a necessary tradeoff (cpflow picks it up from the environment automatically), but it's worth documenting so future maintainers don't inadvertently add steps that log environment state.

Suggested change

# Make the token available to later workflow steps without putting it on argv.

token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"

{

echo "CPLN_TOKEN<<${token_delimiter}"

printf '%s\n' "$CPLN_TOKEN"

echo "${token_delimiter}"

} >> "$GITHUB_ENV"

# Persist CPLN_TOKEN to GITHUB_ENV so all subsequent steps can authenticate

# with cpflow/cpln without putting the token on the command line.

# Trade-off: the token is present in every subsequent step's environment.

# Do not add steps that dump env vars (e.g., `env | sort`) after this point.

token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"

{

echo "CPLN_TOKEN<<${token_delimiter}"

printf '%s\n' "$CPLN_TOKEN"

echo "${token_delimiter}"

} >> "$GITHUB_ENV"

claude · 2026-04-30T18:50:04Z

+# TODO: replace this string-matching with a structured signal once `cpflow exists` exposes one
+# (e.g. a distinct exit code for "not found" vs. API/auth errors, or `cpflow exists --json`).
+# Until then, keep this list in sync if `cpflow exists` starts emitting new error patterns —
+# any unmatched error string would otherwise be silently treated as "app not found".
+exists_output=""
+if ! exists_output="$(cpflow exists -a "$APP_NAME" --org "$CPLN_ORG" 2>&1)"; then
+  case "$exists_output" in
+    *"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)


The same cpflow exists error-pattern matching appears verbatim in the Check if review app exists step in cpflow-deploy-review-app.yml (line ~195). The TODO comment here is accurate but only covers this file — the workflow copy is uncovered.

If cpflow adds a new error pattern (e.g. "rate limit" or "timeout"), both locations need updating in sync. Consider extracting this into a shared script (e.g. .github/scripts/cpflow-app-exists.sh) that both the action and the workflow call, so the pattern list is maintained in one place.

claude · 2026-04-30T18:50:11Z

+  # repo variable to tighten this for apps that expose a dedicated health endpoint
+  # (e.g. "200" for a plain /health, or "200 401 403" for apps that auth-gate / without
+  # redirecting).
+  HEALTH_CHECK_ACCEPTED_STATUSES: ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}


Accepting 301/302 as "healthy" means a redirect to a login page, maintenance page, or even a misconfigured CDN passes the health check. The comment correctly explains why (curl without -L, Rails auth-gating /), but a redirect can pass even when the destination is broken.

Consider adding a step summary warning when a non-200 status is accepted, so operators reviewing the promotion log have visibility:

if [[ "${http_status}" == "${accepted}" ]]; then if [[ "${http_status}" != "200" ]]; then echo "::warning::Health check passed with HTTP ${http_status} (not 200) — verify the redirect destination is healthy." fi echo "healthy=true" >> "$GITHUB_OUTPUT" exit 0 fi

Also worth documenting in .controlplane/shakacode-team.md that setting HEALTH_CHECK_ACCEPTED_STATUSES=200 and adding a /health endpoint is the recommended production posture.

claude · 2026-04-30T18:50:17Z

+on:
+  workflow_dispatch:
+  schedule:
+    - cron: "0 0 * * *"


Scheduled workflows that fail silently are easy to miss — GitHub does send an email for failed scheduled runs but only to the repo owner by default, and only after 7 days of consecutive failures (as of current GitHub behavior).

Consider adding a failure notification step, or at minimum documenting that the team should verify GitHub's "Notify me of failed workflow runs" notification setting is enabled. Stale review apps that aren't cleaned up accrue Control Plane costs.

Suggested change

- cron: "0 0 * * *"

- cron: "0 3 * * *" # 3 AM UTC — avoids exact midnight when many repos run scheduled jobs simultaneously

(The cron timing is a minor suggestion — staggering off midnight reduces GitHub's scheduler contention, making the run more likely to start on time.)

claude · 2026-05-01T22:16:50Z

+            workload_json="$(cpln workload get "${workload_name}" --gvc "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" -o json)"
+          # current_image/current_version are summary fields for the first container
+          # of the selected workload; rollback_state below captures all containers.
+          workload_image="$(echo "${workload_json}" | jq -r '.spec.containers[0].image')"


Indentation is inconsistent with the rest of the loop body. workload_image is at 10 spaces while workload_containers and workload_version below it are at 12. Bash ignores the difference, but it looks like a copy/paste artifact and will confuse readers into thinking this line is outside the loop.

Suggested change

workload_image="$(echo "${workload_json}" | jq -r '.spec.containers[0].image')"

workload_image="$(echo "${workload_json}" | jq -r '.spec.containers[0].image')"

claude · 2026-05-01T22:16:58Z

+          staging_vars="$(CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln gvc get "${STAGING_APP_NAME}" --org "${CPLN_ORG_STAGING}" -o json | jq -r '.spec.env // [] | .[].name' | sort)"
+          production_vars="$(CPLN_TOKEN="${CPLN_TOKEN_PRODUCTION}" cpln gvc get "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" -o json | jq -r '.spec.env // [] | .[].name' | sort)"


This parity check covers GVC-level env vars only. Workload-level environment variables (set directly on individual workloads rather than the GVC) and Control Plane secret references (cpln://secret/... values) are not compared. A variable added to a specific staging workload won't be caught here and could cause a silent mismatch in production after promotion.

Consider adding a note to the step summary output — e.g. echo "Note: workload-level env vars and cpln:// secret references are not compared." >> "$GITHUB_STEP_SUMMARY" — so operators are reminded of the scope on every promotion run.

claude · 2026-05-01T22:17:05Z

+        shell: bash
+        run: |
+          set -euo pipefail
+          cpflow cleanup-stale-apps -a "${REVIEW_APP_PREFIX}" --org "${CPLN_ORG_STAGING}" --yes


This runs destructively with --yes on every scheduled tick with no dry-run path. If REVIEW_APP_PREFIX is ever misconfigured (e.g., set to a prefix that matches staging-adjacent apps), apps are silently deleted. A low-cost safeguard would be to add a workflow_dispatch input for dry-run mode:

Suggested change

cpflow cleanup-stale-apps -a "${REVIEW_APP_PREFIX}" --org "${CPLN_ORG_STAGING}" --yes

cpflow cleanup-stale-apps -a "${REVIEW_APP_PREFIX}" --org "${CPLN_ORG_STAGING}" --yes

echo "Stale review apps under prefix '${REVIEW_APP_PREFIX}' have been cleaned up." >> "$GITHUB_STEP_SUMMARY"

(Or add a separate dry_run dispatch input that swaps --yes for a list-only command.)

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Fork PR slash-command rejection posts no PR comment
- Added an issue_comment-only rejection comment step for fork PR deploy requests before all deployment-gated steps are skipped.

Or push these changes by commenting:

@cursor push 85e534a5ab

Preview (85e534a5ab)

diff --git a/.github/workflows/cpflow-deploy-review-app.yml b/.github/workflows/cpflow-deploy-review-app.yml
--- a/.github/workflows/cpflow-deploy-review-app.yml
+++ b/.github/workflows/cpflow-deploy-review-app.yml
@@ -157,6 +157,18 @@
           echo "Review app deploys from fork pull requests are not allowed for workflow_dispatch because this workflow uses repository secrets." >&2
           exit 1
 
+      - name: Comment on fork PR deploy rejection
+        if: steps.config.outputs.ready == 'true' && steps.source.outputs.allowed == 'false' && github.event_name == 'issue_comment'
+        uses: actions/github-script@v7
+        with:
+          script: |
+            await github.rest.issues.createComment({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: Number(process.env.PR_NUMBER),
+              body: `Review app deploys from fork pull requests require a branch in ${process.env.GITHUB_REPOSITORY}.\n\nThis workflow builds Docker images with repository secrets, so comment-triggered deploys only run for branches in the base repository.`
+            });
+
       - name: Checkout PR commit
         if: steps.config.outputs.ready == 'true' && steps.source.outputs.allowed == 'true'
         uses: actions/checkout@v4

_{You can send follow-ups to the cloud agent here.}

github-actions · 2026-05-01T22:26:23Z

Review app ready

Open review app

Open Control Plane console
View workflow logs

claude · 2026-05-01T22:29:44Z

Code Review: Generated cpflow GitHub Actions Flow

Overview

This PR replaces hand-written Control Plane GitHub Actions with the generated cpflow-* action and workflow set. It's a significant improvement in both security posture and operational robustness. The PR is well-considered and production-deployable with a few items worth addressing or acknowledging before merge.

Security — Notable Improvements ✅

Token no longer leaks into process args: The old setup-environment action called cpln profile update default --org "$ORG" --token "$TOKEN", which exposes the token in /proc/<pid>/cmdline and ps output. The new setup action passes it via CPLN_TOKEN env var only.
SSH key isolation: The new cpflow-build-docker-image action handles the SSH private key in a dedicated step with a scoped env, so ACTIONS_STEP_DEBUG=true (which dumps env before commands) cannot expose the key in the build step.
Randomized heredoc delimiters: CPLN_TOKEN is written to GITHUB_ENV using CPLN_TOKEN_$(openssl rand -hex 8) as the terminator, preventing a multiline token value from terminating the heredoc early.
Prefix guard + production/staging word guard on deletion: The new delete-app.sh checks both that the app name matches REVIEW_APP_PREFIX-* AND that it doesn't contain production or staging as word segments — defense in depth against a misconfigured prefix.
author_association check added to slash commands: The old /delete-review-app handler had no author check; the new one requires OWNER, MEMBER, or COLLABORATOR.
Old staging workflow triggered on ALL branches: The deleted deploy-to-control-plane-staging.yml had branches: ['*'], meaning every branch push would attempt a staging deploy. The new workflow correctly gates to master only.
eval() removed from JavaScript steps: The deleted workflows used eval(process.env.GET_CONSOLE_LINK) — an eval of an env-derived string in JS — which is now gone entirely.

Issues and Observations

🔴 High — Rollback doesn't reverse migration side-effects

The rollback step in cpflow-promote-staging-to-production.yml restores workload container images via cpln workload update --set spec.containers[N].image=..., but any release_script side-effects (database migrations, cache warm-ups, etc.) that ran during the failed promotion are not reversed. The shakacode-team.md docs mention this, but the workflow itself has no warning comment at the rollback step. Teams discovering a failed promotion may not immediately check docs and could assume a clean rollback.

Suggestion: Add a ::warning:: annotation at the top of the rollback step so it's visible in the Actions UI when a rollback runs:

echo "::warning::Rollback restores workload images only. Database migrations and release_script side-effects are NOT reversed. Verify database state manually before re-promoting."

🟡 Medium — `issue_comment` slash-command changes don't take effect until merged

The cpflow-deploy-review-app.yml workflow is triggered by issue_comment events, which GitHub always runs from the default branch. A comment like /deploy-review-app on a PR that contains changes to this very workflow will execute the default branch version, not the PR branch version. The workaround (gh workflow run cpflow-deploy-review-app.yml --ref <branch> -f pr_number=<pr-number>) is documented inline and in the docs — but developers iterating on the workflow may be surprised the first time. Consider adding a visible note in cpflow-help.md about this constraint.

🟡 Medium — `pull_request_target` in delete workflow requires ongoing vigilance

cpflow-delete-review-app.yml uses pull_request_target (which has secret access) and adds a correct comment explaining the safe checkout. However, pull_request_target with composite actions loaded from the checked-out base is a pattern that historically has been misused. The current code is correct, but future contributors editing this workflow should be aware of the trust boundary. Consider adding a prominent # SECURITY: comment block at the workflow level (not just inside the checkout step) summarizing the constraint.

🟡 Medium — Serial rollback readiness polling

The "Wait for rollback readiness" step in cpflow-promote-staging-to-production.yml polls each workload sequentially, so the worst-case wait is N_workloads × ROLLBACK_READINESS_RETRIES × ROLLBACK_READINESS_INTERVAL. For single-workload apps this is fine, but multi-workload apps could exceed the 45-minute job timeout if retries are high. Consider either parallelising (requires a matrix job, complex), or documenting the calculation in the defaults comment.

🟡 Medium — Environment variable parity check scope

The "Verify production environment variables" step compares GVC-level env var names only. It does not check:

Whether the values match (intentional — secrets can't be compared)
Workload-level env vars
cpln:// secret references

This is documented in the step summary output and in the docs. However, a renamed secret (old name still in production, new name only in staging) would pass this check since only names are compared. The check provides a useful signal but not a guarantee of parity — make sure the team understands its scope.

🟢 Minor — Two `/help` workflows may produce duplicate comments

cpflow-review-app-help.yml posts a quick-reference comment when a PR is opened, and cpflow-help-command.yml posts the full cpflow-help.md when /help is commented. On a new PR, both could end up posting within seconds of each other. The two messages are complementary, but consider whether the PR-open message is worth the noise. Alternatively, deduplicate by checking for existing bot comments before posting.

🟢 Minor — `validate-branch` job holds a 5-minute timeout but may checkout

In cpflow-deploy-staging.yml, the validate-branch job runs a fast branch check first, then conditionally does actions/checkout@v4 within the same 5-minute timeout window. For a large repository the checkout alone can take 1-2 minutes, leaving little headroom. Consider bumping validate-branch's timeout-minutes to 10 or splitting the checkout into the build job's scope.

Summary

The overall direction is excellent. The security improvements are real and meaningful. The main pre-merge items are: adding a warning annotation to the rollback step (so on-call engineers see it in the UI during an incident), and considering the issue_comment workflow-version caveat in the user-facing docs. Everything else is polish or awareness.

claude · 2026-05-01T22:31:30Z

+# Guard against a misconfigured REVIEW_APP_PREFIX that would otherwise match a
+# well-known shared environment. This intentionally rejects review-app prefixes
+# containing these word segments; failing closed is safer for deletion.
+if echo "$APP_NAME" | grep -iqE '(^|-)(production|staging)(-|$)'; then


The word-boundary regex (^|-)(production|staging)(-|$) is a significant improvement over the deleted action's grep -iqE '(production|staging)', which would have incorrectly blocked a hypothetical review-app prefix like staging-feature-demos-pr. The new pattern correctly rejects only apps where production or staging appears as a full hyphen-delimited word segment.

One edge case to be aware of: this guard runs on $APP_NAME (the fully-formed app name including prefix + PR number), not on $REVIEW_APP_PREFIX alone. So a review app named qa-app-pr-1234 passes the guard even if someone later misconfigures the prefix. That's the intended "fail closed on the constructed name" behavior — just confirming it's deliberate.

claude · 2026-05-01T22:31:39Z

+
+        # Persist the token for later cpflow/cpln steps via env. Use a randomized
+        # delimiter so a multiline token cannot terminate the heredoc early.
+        delim="CPLN_TOKEN_$(openssl rand -hex 8)"


Good use of a randomized heredoc delimiter. The old setup-environment action called cpln profile update default --org "$ORG" --token "$TOKEN", which would expose the token in /proc/<pid>/cmdline and ps output on any runner with a shared process table. The env-var approach here avoids that, and the randomized delimiter (CPLN_TOKEN_<hex>) prevents a multiline token value from accidentally terminating the heredoc early.

Note: writing CPLN_TOKEN to GITHUB_ENV means it persists for all subsequent steps in the job. That's intentional so downstream cpflow/cpln commands pick it up automatically, but it does mean any step that runs after this one has an implicit dependency on the env var being set. Worth keeping in mind when adding new steps to calling workflows.

claude · 2026-05-01T22:31:46Z

+        run: |
+          # Best-effort rollback: try every workload, aggregate failures, exit non-zero at the end
+          # if any failed. A single cpln hiccup shouldn't leave other workloads mid-promotion.
+          set -uo pipefail


set -uo pipefail intentionally omits -e so that a failure for one workload doesn't abort the loop before attempting all remaining workloads. This is the correct pattern for best-effort rollback (aggregate failures, then exit non-zero at the end). Worth a brief inline note here to prevent a future reviewer from "fixing" it to set -euo pipefail:

Suggested change

set -uo pipefail

# Intentionally omit -e: we want to attempt rollback on all workloads even if one fails.

set -uo pipefail

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Delete workflow missing config-ready gate on subsequent steps
- Added the config step id and gated every downstream delete workflow step on the validate action's ready output so missing configuration exits quietly.

Or push these changes by commenting:

@cursor push bf3a06d409

Preview (bf3a06d409)

diff --git a/.github/workflows/cpflow-delete-review-app.yml b/.github/workflows/cpflow-delete-review-app.yml
--- a/.github/workflows/cpflow-delete-review-app.yml
+++ b/.github/workflows/cpflow-delete-review-app.yml
@@ -53,6 +53,7 @@
         uses: actions/checkout@v4
 
       - name: Validate required secrets and variables
+        id: config
         uses: ./.github/actions/cpflow-validate-config
         env:
           CPLN_TOKEN_STAGING: ${{ secrets.CPLN_TOKEN_STAGING }}
@@ -66,6 +67,7 @@
           pull_request_friendly: "true"
 
       - name: Setup environment
+        if: steps.config.outputs.ready == 'true'
         uses: ./.github/actions/cpflow-setup-environment
         with:
           token: ${{ secrets.CPLN_TOKEN_STAGING }}
@@ -74,6 +76,7 @@
           cpflow_version: ${{ vars.CPFLOW_VERSION }}
 
       - name: Set workflow links
+        if: steps.config.outputs.ready == 'true'
         uses: actions/github-script@v7
         with:
           script: |
@@ -85,6 +88,7 @@
             );
 
       - name: Create initial PR comment
+        if: steps.config.outputs.ready == 'true'
         id: create-comment
         uses: actions/github-script@v7
         with:
@@ -98,6 +102,7 @@
             core.setOutput("comment-id", comment.data.id);
 
       - name: Delete review app
+        if: steps.config.outputs.ready == 'true'
         uses: ./.github/actions/cpflow-delete-control-plane-app
         with:
           app_name: ${{ env.APP_NAME }}
@@ -105,6 +110,7 @@
           review_app_prefix: ${{ vars.REVIEW_APP_PREFIX }}
 
       - name: Mark GitHub deployment inactive
+        if: steps.config.outputs.ready == 'true'
         uses: actions/github-script@v7
         with:
           script: |
@@ -132,7 +138,7 @@
             }
 
       - name: Finalize delete status
-        if: always()
+        if: always() && steps.config.outputs.ready == 'true'
         uses: actions/github-script@v7
         with:
           script: |

_{You can send follow-ups to the cloud agent here.}

^{Reviewed by Cursor Bugbot for commit fdda08d. Configure here.}

claude · 2026-05-01T22:31:55Z

+
+          {
+            echo "### Environment parity scope"
+            echo "This check compares GVC-level environment variable names only."


Good that the step summary explicitly documents the scope limitation. One scenario that can slip through: if a staging secret is renamed (old name removed from staging, new name added), and production still has the old name, the comm -23 check only reports what's missing from production — so a renamed var where staging has NEW_NAME and production has OLD_NAME would not be caught (neither appears as "missing" from the other's perspective; they're just different names).

This is inherent to comparing names without values, and worth noting in the step summary output or the docs. The current wording "compares GVC-level environment variable names only" does cover this, but the rename case is subtle enough that an explicit sentence could help: "Renamed variables (removed from staging, not yet removed from production) are reported as production-only variables, not as missing."

claude · 2026-05-01T22:32:03Z

+    # deploy branches unless `cpflow generate-github-actions --staging-branch BRANCH`
+    # was used. If STAGING_APP_BRANCH is later changed in repository variables, keep
+    # this list in sync so pushes to that branch actually trigger the workflow.
+    branches: ["master"]


This is a meaningful improvement. The deleted deploy-to-control-plane-staging.yml had branches: ['*'], which would trigger a staging deploy attempt on every branch push to the repo — relying solely on the validate-branch job's runtime check to bail out. The new workflow correctly gates at the workflow trigger level, so only pushes to master start any jobs at all, saving runner minutes and reducing noise.

claude · 2026-05-01T22:32:10Z

+      (github.event_name == 'issue_comment' &&
+       github.event.issue.pull_request &&
+       github.event.comment.body == '/deploy-review-app' &&
+       contains(fromJson('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association))


Good addition: the old delete-review-app.yml checked github.event.issue.pull_request (ensuring the comment was on a PR) but had no author_association guard, meaning any user who could comment on a PR could trigger a deploy. The new filter restricts slash commands to OWNER, MEMBER, and COLLABORATOR — appropriate for secrets-backed deployments.

One subtlety: COLLABORATOR in GitHub's model means a user explicitly invited to the repo with at least triage access. External contributors on a public repo default to NONE. This is the right default for protecting Control Plane credentials.

github-actions · 2026-05-01T22:35:18Z

Review app ready

Open review app

Open Control Plane console
View workflow logs

claude · 2026-05-01T22:38:50Z

Code Review: Replace hand-written CI/CD with generated cpflow GitHub Actions

Overview

This PR replaces ~1,150 lines of hand-crafted Control Plane GitHub Actions with ~2,025 lines of generated cpflow-* composite actions and workflows. The scope covers four deployment flows: opt-in review apps (slash-command triggered), branch-gated staging deploys, manual staging→production promotion with rollback, and nightly stale review-app cleanup.

Security — Significant Improvements ✅

The new code fixes several real security issues present in the deleted workflows:

Issue	Old	New
Token in argv	`--token "$TOKEN"` (visible in `ps`/`/proc`)	`CPLN_TOKEN` env var
Token scope	Workflow-level `env:` (all steps, all jobs)	Passed only to setup action, then via `GITHUB_ENV` with masked value
Slash command auth	No `author_association` check	`OWNER/MEMBER/COLLABORATOR` gate on both deploy and delete
Slash command injection	`contains(github.event.comment.body, '/deploy-review-app')`	Exact body match (`== '/deploy-review-app'`)
Fork PR secrets	`pull_request` event: fork PRs could trigger build steps	Fork head detection + `same_repo` guard
PR-close cleanup from forks	`pull_request` (read-only for forks)	`pull_request_target` with base-branch checkout only
SSH key in env	N/A	Isolated step, trap cleanup, defence-in-depth `if: always()` removal
`delete-app.sh` safety guard	`grep -iqE '(production\|staging)'` (matches substrings)	Word-boundary regex `(^\|-)(production\|staging)(-\|$)`

Concerns

1. pull_request: opened trigger burns CI minutes on every new same-repo PR

cpflow-deploy-review-app.yml now triggers on [opened, synchronize, reopened]. For opened events, no review app exists yet, so the workflow always reaches the "Skip auto deploy until a review app is created" step — but only after running checkout, Ruby setup, cpln/cpflow install, PR resolve, source validation, and the cpflow exists call. The old workflow used [synchronize, reopened] only.

If the intent is just to inform developers, a lighter approach would be to move the skip check before the setup steps, or gate opened events out of the job-level if condition (note that it's already gated to same-repo PRs, so fork churn is avoided, but same-repo churn is not).

2. CPLN_TOKEN written to $GITHUB_ENV (broader than needed)

In cpflow-setup-environment/action.yml, the CP token is persisted to all subsequent steps in the calling job via GITHUB_ENV. GitHub Actions masks the secret value in logs, so this doesn't expose it in plain text. But it does increase the attack surface: any later step (including third-party composite actions called after setup) inherits the CPLN_TOKEN env var and could exfiltrate it in a compromised supply-chain scenario.

An alternative: cpflow's CLI supports --token in non-interactive use, or the env var could be cleared after the login step using a follow-up echo "CPLN_TOKEN=" >> "$GITHUB_ENV". That said, cpflow commands in downstream steps do rely on this token, so clearing it would require rethinking those steps.

3. cpflow-promote-staging-to-production.yml — contents: write is job-wide

permissions: contents: write is required only for the gh release create step. Since all jobs share this workflow-level permission, every step in the promote-to-production job (including Ruby scripts reading controlplane.yml) runs with write access to repo contents. Splitting the release creation into a separate job with scoped permissions would reduce the blast radius of a compromised step.

4. Env-var parity check gap (documented but worth flagging)

The "Verify production environment variables" step in the promotion workflow compares GVC-level env var names only. Workload-level env vars and cpln:// secret references are not checked. This is correctly documented in both the code and shakacode-team.md, but it means a new workload-level env var added to staging but missing in production will not block promotion.

Code Quality

The new code is a substantial improvement:

set -euo pipefail on all shell scripts (old code used set -e only)
Randomized heredoc delimiters for multi-line GITHUB_OUTPUT/GITHUB_ENV values (prevents injection via newlines)
Proper cpflow exists exit-code semantics (0 = exists, 2 = not found, other = error)
Ruby installed without sudo npm; user-local npm prefix avoids system permissions
Ruby auto-detected from .ruby-version/Gemfile instead of hardcoded 3.4.6
URL extraction via jq -r '.status.endpoint' (old code used a fragile grep -oP regex)
Rollback captures state for all workloads before deploying, handles container-set drift, and aggregates failures
Concurrency groups prevent partial deploys from cancellation mid-rollout

Minor Notes

cpflow-deploy-staging.yml uses cancel-in-progress: false, which means rapid pushes to master queue up sequentially. Deploys always complete in submission order, so the final state is always the latest push — this is correct behavior, but could cause a slow drain under a burst of merges.
The staged PR comment updates (starting → building → deploying → done) give a good UX and avoid the "silent for 5 minutes" problem common in deploy workflows.
cpflow-review-app-help.yml correctly uses pull_request_target without a code checkout, with a clear warning comment against adding one.

Summary

This is a high-quality, well-documented overhaul. The security posture is meaningfully better than what it replaces. The three items above (CI minute waste on opened, token scope, and permission scope) are worth considering before merge, but none are blockers. The PR is in draft — once the rspec gap (PostgreSQL not running locally) is resolved and the staging deploy is verified end-to-end, this looks ready.

claude · 2026-05-01T22:38:55Z

+
+on:
+  pull_request:
+    types: [opened, synchronize, reopened]


Adding opened to the trigger list means every new same-repo PR runs this workflow even when no review app has been created yet. The job will checkout, install Ruby + cpln + cpflow, resolve the PR, validate the source, and call cpflow exists before reaching the "Skip auto deploy" step. The old workflow used [synchronize, reopened] only.

If the goal is just to inform developers, consider moving the skip earlier (before setup) or removing opened and relying on the help message posted by cpflow-review-app-help.yml instead. At minimum, a comment explaining the intentional design trade-off would help future maintainers.

Suggested change

types: [opened, synchronize, reopened]

types: [synchronize, reopened]

claude · 2026-05-01T22:39:02Z

+        # Persist the token for later cpflow/cpln steps via env. Use a randomized
+        # delimiter so a multiline token cannot terminate the heredoc early.
+        delim="CPLN_TOKEN_$(openssl rand -hex 8)"
+        {


Writing CPLN_TOKEN to $GITHUB_ENV makes it available to every subsequent step in the calling job, including any third-party composite actions invoked after this one. GitHub's log masking prevents the value from appearing in plain text, but the env var is still accessible to step code.

This is likely necessary because downstream cpflow and cpln commands depend on it. If it can be scoped more narrowly (e.g., only passed as env: to steps that need it rather than persisted globally), that would reduce the blast radius of a compromised downstream action. Worth a comment here explaining why global persistence is required.

claude · 2026-05-01T22:39:08Z

+        type: string
+
+permissions:
+  contents: write  # Required for `gh release create` in the "Create GitHub release" step.


contents: write is required only for the gh release create step, but this permission applies to the entire promote-to-production job. All steps — including the Ruby script that reads controlplane.yml, the env-var parity check calling cpln gvc get, and the rollback logic — run with write access to repo contents.

Consider splitting the GitHub release creation into a separate job that depends on promote-to-production with only contents: write, and narrowing this workflow permission to contents: read. That limits the exposure if any intermediate step (e.g., a cpln CLI bug or a compromised gem) tries to write to the repo.

github-actions · 2026-05-01T22:43:10Z

Review app ready

Open review app

Open Control Plane console
View workflow logs

claude · 2026-05-01T22:46:31Z

Code Review

Overview

This PR replaces bespoke, hand-written GitHub Actions with the generated cpflow-* composite actions and workflows. It adds structured review app lifecycle management (opt-in create, auto-redeploy on push, delete on PR close), staging deploys gated to a branch, manual production promotion with health-check + best-effort rollback, and a nightly stale-app cleanup. The documentation updates are comprehensive and thorough.

Overall this is a significant quality improvement. The security posture is meaningfully better than the code being replaced (details below).

Security Improvements (notable fixes over old code)

Token exposure fixed: Old setup-environment/action.yml called cpln profile update default --org "$ORG" --token "$TOKEN", which leaked the token into /proc/<pid>/cmdline and ps output. New code passes the token exclusively via CPLN_TOKEN env var.
Fork PR secret exposure fixed: Old deploy-to-control-plane-review-app.yml triggered on pull_request for all PRs including forks, and used contains(github.event.comment.body, '/deploy-review-app') (partial match — any comment containing the substring triggered a deploy). New code uses exact match, restricts to OWNER/MEMBER/COLLABORATOR, and explicitly rejects fork-head builds.
PR-close cleanup from forks: Old delete-review-app.yml used pull_request for the closed event type, which has no write access for fork PRs — review apps from forks were silently never cleaned up. New code correctly uses pull_request_target with a clear trust-boundary comment.
Deletion guard hardened: Old delete-app.sh checked a simple regex on the app name for "production"/"staging". New code also validates the app name starts with the required REVIEW_APP_PREFIX before any deletion proceeds.
SSH key isolation: SSH private key is handled in a dedicated step whose env scope is explicitly limited, with a defense-in-depth always() cleanup step to cover runner cancellations.

Issues to Address

1. CPLN_TOKEN is written into GITHUB_ENV as plaintext

In cpflow-setup-environment/action.yml lines 86-93, the token is written to $GITHUB_ENV with a randomized heredoc delimiter (good: prevents heredoc injection). However, $GITHUB_ENV is a file on the runner's filesystem readable by any subsequent step in the same job. GitHub Actions masks known secret values in logs, but the token is now accessible as an env var named CPLN_TOKEN to every subsequent step — including third-party actions. This is a common pattern, but for a token with production access (CPLN_TOKEN_PRODUCTION in the promote workflow), consider scoping it to only steps that need it via env: on individual steps rather than broadcasting it via GITHUB_ENV.

2. Production promotion env-var check is incomplete

The "Verify production environment variables" step (promote workflow) compares GVC-level env var names only — it explicitly skips workload-level env vars and cpln:// secret references. This is called out clearly in the step summary, but a workload-level secret (e.g. DATABASE_URL set on the rails workload via a cpln:// reference) could be absent from production with no warning. Consider documenting a companion manual checklist or a future enhancement to cover workload-level checks.

3. Rollback does not cover release-phase side effects

The rollback step correctly warns: "Database migrations and release_script side effects are not reversed." However, .controlplane/shakacode-team.md doesn't yet have a "What to do if production promotion fails" section that references this limitation and describes manual recovery steps. Adding this would reduce MTTR during an incident.

4. Nightly cleanup timing

cpflow-cleanup-stale-review-apps.yml runs at 0 0 * * * (midnight UTC). If a review app is actively being tested late at night the nightly sweep could delete it mid-session. Consider whether cpflow cleanup-stale-apps has a --max-age or inactivity threshold that maps to your team's typical PR lifetime.

Minor Notes

cpflow-review-app-help.yml posts a comment on every non-bot PR open. For repos with high PR velocity this generates noise. A team-preference label (e.g. skip-review-app-help) could allow opting out per-PR.
cpflow-deploy-review-app.yml triggers on synchronize and reopened but not opened. This is the intentional opt-in design, and the PR-open help comment mitigates the discoverability gap well.
cpflow-help-command.yml's /help handler does fs.readFileSync('.github/cpflow-help.md', 'utf8') without a try/catch. If the file is accidentally deleted, the step throws an uncaught exception instead of a clean message. Wrapping in try/catch with a fallback would be more user-friendly.
Old setup-environment/action.yml used sudo npm install -g for the CP CLI. New code uses a user-local ~/.npm-global prefix — better practice, avoids unnecessary sudo escalation.

Summary

The PR is in good shape. The security regressions in the old code have been properly fixed. The main items to address before merging are: (1) noting the GITHUB_ENV token persistence as an accepted risk or planned follow-up, and (2) adding a brief on-call runbook entry for production promotion failures that references the rollback limitation.

claude · 2026-05-01T22:46:38Z

+          echo "CPLN_TOKEN<<${delim}"
+          echo "${CPLN_TOKEN}"
+          echo "${delim}"
+        } >> "$GITHUB_ENV"


The randomized heredoc delimiter here is a good defence against a malicious token value terminating the heredoc early — nice touch.

One concern worth noting: writing to $GITHUB_ENV broadcasts CPLN_TOKEN to all subsequent steps in the job, including any third-party actions added later. For the staging org this is acceptable, but the promote workflow reuses this action with CPLN_TOKEN_PRODUCTION. If a third-party action is ever added to that job it will have implicit access to the production token.

Alternative: instead of persisting to GITHUB_ENV, pass the token to only the steps that need it via env: on each individual step. This is more verbose but tightly scopes the secret.

claude · 2026-05-01T22:46:43Z

+# Guard against a misconfigured REVIEW_APP_PREFIX that would otherwise match a
+# well-known shared environment. This intentionally rejects review-app prefixes
+# containing these word segments; failing closed is safer for deletion.
+if echo "$APP_NAME" | grep -iqE '(^|-)(production|staging)(-|$)'; then


The regex (^|-)(production|staging)(-|$) is a good improvement over the old plain grep -i 'production|staging', which would have blocked names like qa-productions-preview unnecessarily. The word-boundary anchors reduce false positives.

One edge case: this check runs after the prefix check on line 10. If someone sets REVIEW_APP_PREFIX to something that passes line 10 but contains staging or production as a segment (e.g. qa-staging-pr), this guard fires and blocks all review app deletions entirely — a confusing failure mode. The comment on lines 17-19 already calls this out ("intentionally rejects review-app prefixes containing these word segments"), which is the right trade-off (fail closed). Just make sure README / team docs note this constraint when choosing a prefix value.

claude · 2026-05-01T22:46:51Z

+  # SECURITY: pull_request_target is intentional for PR-close cleanup from forks.
+  # This workflow must keep checking out trusted base-branch code only; do not
+  # change checkout to a PR head ref without re-evaluating secret exposure.
+  pull_request_target:


pull_request_target is the correct event here — pull_request cannot write to secrets, so fork PRs would silently fail to delete their review apps on close. The existing comment (lines 4-6) explaining the trust boundary is important and should not be removed.

One subtlety to keep in mind: pull_request_target runs from the default branch, so this workflow's code at the closed event always reflects what's on master, not what's on the merged/closed PR branch. This is intentional and desirable (it means the workflow itself cannot be tampered with via the PR), but it also means that changes to this workflow file only take effect after they're merged to master — testing workflow changes via a PR won't exercise the PR-close path until after merge.

claude · 2026-05-01T22:46:53Z

+      github.event_name == 'workflow_dispatch' ||
+      (github.event_name == 'issue_comment' &&
+       github.event.issue.pull_request &&
+       github.event.comment.body == '/deploy-review-app' &&


Exact match (==) rather than the old contains() is the right choice — the old approach would have triggered a deploy from any comment that mentioned /deploy-review-app as part of a longer message (e.g. "don't run /deploy-review-app yet"). The exact-match requirement is also documented in the help text, so this is consistent.

Apply generated cpflow GitHub Actions flow

56c53a0

Document generated cpflow workflow settings

eb1dbb2

cursor Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread .github/workflows/cpflow-promote-staging-to-production.yml

claude Bot reviewed Apr 30, 2026

View reviewed changes

Harden cpflow review app workflow

b9095c6

claude Bot reviewed Apr 30, 2026

View reviewed changes

Improve cpflow workflow review feedback

1e46bfe

cursor Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread .github/actions/cpflow-setup-environment/action.yml

claude Bot reviewed Apr 30, 2026

View reviewed changes

Fix cpflow token propagation

0a148a4

coderabbitai Bot approved these changes Apr 30, 2026

View reviewed changes

claude Bot reviewed Apr 30, 2026

View reviewed changes