Skip to content

FIX REST API: PagerDuty notification rule integration key format#917

Open
apepojken wants to merge 1 commit into
Checkmk:masterfrom
apepojken:fix-pagerduty-rest-api-key-format
Open

FIX REST API: PagerDuty notification rule integration key format#917
apepojken wants to merge 1 commit into
Checkmk:masterfrom
apepojken:fix-pagerduty-rest-api-key-format

Conversation

@apepojken
Copy link
Copy Markdown

General information

This PR fixes a bug in Checkmk's REST API for notification rules. It is not specific to any monitored device or appliance — the affected component is the PagerDuty notification plugin and the REST API endpoint that creates/updates notification rules (/domain-types/notification_rule/collections/all). The bug breaks every PagerDuty notification rule created via the REST API on 2.4.x.

Bug reports

  • Operating system: Ubuntu 24.04 (Docker host); Checkmk RAW 2.4.0p29 running in the official Docker image
  • Local setup: Single-site, RAW edition, Traefik in front. Notification rules managed via an in-house Ansible collection that talks to the REST API. Any REST API client reproduces the bug — Ansible, Terraform, curl, etc.
  • Detailed steps to reproduce:
    1. POST a notification rule to /domain-types/notification_rule/collections/all with plugin_params.plugin_name = "pagerduty" and integration_key = {"option": "explicit", "key": "<your-key>"}.
    2. Activate changes.
    3. Inspect etc/check_mk/conf.d/wato/notification_parameter.mk — the new entry's routing_key field is ('routing_key', '<your-key>') (legacy 2-tuple) instead of ('cmk_postprocessed', 'explicit_password', ('<uuid>', '<your-key>')) (current form produced by the GUI).
    4. Trigger an alert that matches the rule.
    5. tail var/log/notify.log — plugin exits with Output: Unable to retrieve password from passwordstore and Plug-in exited with code 2. The PagerDuty incident is never created.
  • Agent output / SNMP walk: Not applicable — REST API + notification dispatcher bug, no agent involved.
  • Crash report ID: None — the plugin exits cleanly with code 2; no crash report is produced.

Proposed changes

Expected behavior: A PagerDuty rule created via the REST API delivers notifications, identical to one created through the GUI.

Observed behavior: The dispatcher exits with Unable to retrieve password from passwordstore every time. No PagerDuty incident is created.

How this patch changes the current behavior:

APIPagerDutyKeyOption.to_mk_file_format() was writing the integration key to notification_parameter.mk as the legacy 2-tuple ("routing_key", key). The notification dispatcher's get_password_from_env_or_context() (in packages/cmk-notification-plugins/cmk/notification_plugins/utils.py) collects every PARAMETER_ROUTING_KEY* env var into a list and only accepts the modern 3-tuple form ("cmk_postprocessed", "explicit_password", (uuid, key)), identified by the literal string "explicit_password" in the list. The 2-tuple form falls through to a password-store lookup using parameter[-2] ("routing_key") — not a valid store ID — and the plugin exits.

The GUI's FormSpec migrate function _migrate_to_password in cmk/gui/wato/_notification_parameter/_pagerduty.py:69-91 already produces the 3-tuple form for UI-touched rules. The REST API path bypassed it because APIPagerDutyKeyOption was a bespoke option class that wrote the legacy form directly; every other key-bearing plugin (ilert, signl4, opsgenie, sms_api) was already migrated to the APICheckmkPassword_From* wrapper classes that produce the correct form.

This PR:

  • Deletes APIPagerDutyKeyOption.
  • Switches PagerDutyPlugin.integration_key to APICheckmkPassword_FromKey (already in the same file, identical input shape).
  • Re-types PagerDutyPluginModel.routing_key from RoutingKeyType to CheckmkPassword.
  • Adds _normalize_pagerduty_routing_key() so existing on-disk rules in the legacy form keep working through the REST API path until the next save normalises them.
  • Incidentally fixes a second bug in the deleted class: from_mk_file_format() set cls(option="store", key=data[1]) for the store option instead of store_id=data[1], so reading a stored-key rule via the REST API returned an empty store_id.

Unit tests: Added four round-trip tests in tests/unit/cmk/gui/watolib/test_notifications.py:

  • API request → on-disk form produces the 3-tuple
  • Legacy explicit 2-tuple on disk → normalised to 3-tuple
  • Legacy store 2-tuple on disk → normalised to stored_password 3-tuple
  • Current 3-tuple on disk → preserved unchanged

The first three tests fail on master without this patch.

Is this a new problem? Introduced in 2.4.0 when CheckmkPassword/cmk_postprocessed became the canonical form across notification plugins. PagerDuty was missed in the migration that converted every other plugin to APICheckmkPassword_From* wrappers. No third-party device or firmware change is involved — this is a pure regression inside Checkmk.

Werk

20007 (.werks/20007.md) — class fix, component rest-api, level 1, compatible yes, version 2.6.0b1.

Backport request

This breaks production PagerDuty notifications on 2.4.x for any user creating rules via the REST API (Ansible, Terraform, custom scripts). A backport to 2.4 and 2.5 would be very helpful.

Local testing

  • ruff format --check and ruff check pass on all changed files.
  • Lightweight pre-commit hooks pass (whitespace, license headers, secrets scan, cmk namespace).
  • Bazel-backed pre-commit hooks (format/lint/mypy/unittest) not run locally — Bazel toolchain not installed in the author's dev environment. Relying on CI.
  • End-to-end behavioural verification on the affected 2.4.0p29 instance: with the legacy on-disk tuple manually rewritten to the cmk_postprocessed form, the dispatcher delivers the notification successfully. (The fix in this PR is the same rewrite, applied automatically.)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

The REST API was writing the PagerDuty integration key to
notification_parameter.mk as ("routing_key", key) — the legacy 2-tuple form.
The notification dispatcher's get_password_from_env_or_context() builds a list
of all PARAMETER_ROUTING_KEY* env vars and only recognises the 3-tuple form
("cmk_postprocessed", "explicit_password", (uuid, key)) by the literal string
"explicit_password" in that list. The 2-tuple form falls through to the
password-store branch which tries to look up "routing_key" as a store ID,
fails, and the plugin exits with "Unable to retrieve password from
passwordstore".

The GUI's FormSpec migrate (_migrate_to_password in
cmk/gui/wato/_notification_parameter/_pagerduty.py) already produces the
3-tuple form for UI-touched rules; the REST API path bypassed it.

Replace APIPagerDutyKeyOption with APICheckmkPassword_FromKey (which already
sits directly above APIPagerDutyKeyOption in the same file and is used by all
other key-bearing notification plugins). PagerDutyPluginModel.routing_key is
re-typed from RoutingKeyType to CheckmkPassword. A normalizer in
PagerDutyPlugin.from_mk_file_format() handles existing on-disk rules in the
legacy form so they keep working through the API and are rewritten in the new
format on next save.

Werk: 20007

Tested:
- ruff format and lint pass on all changed files
- pre-commit lightweight hooks pass (whitespace, license, secrets, namespace)
- Bazel-backed pre-commit hooks (format/lint/mypy/unittest) not run locally;
  CI will validate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@apepojken apepojken force-pushed the fix-pagerduty-rest-api-key-format branch from d7ee47b to 747d956 Compare May 19, 2026 11:21
@apepojken
Copy link
Copy Markdown
Author

I have read the CLA Document and I hereby sign the CLA or my organization already has a signed CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants