
docs: add ChainWeaver + evaluation-artifact integration cookbooks (#95, #96) #112

Open
dgenio wants to merge 1 commit into main from claude/github-issues-triage-kJFvP

Conversation

@dgenio
Owner

@dgenio dgenio commented May 30, 2026

What changed

Adds two more ecosystem integration cookbooks under docs/integrations/, each with a runnable, offline companion wired into make ci. Follows the pattern established by the contextweaver / repository-check cookbooks (#92, #93). Closes #95 and #96.

#95 — ChainWeaver compiled flows as capabilities

  • examples/chainweaver_flow.py (new) — a ChainWeaverDriver wraps a compiled flow behind the Driver protocol so the flow runs through the normal policy → token → invoke → firewall → trace pipeline and produces a kernel-visible ActionTrace. A flow-step failure is translated into a DriverError that preserves the flow id and failing step, so the orchestration context survives for the caller and the audit trail.
  • ChainWeaver stays an optional dependency: the driver only needs a run(inputs) method and a flow_id, so the example ships tiny CompiledFlow / FlowExecutionError stand-ins and imports no ChainWeaver package.
  • docs/integrations/chainweaver.md (new), tests/test_chainweaver_flow.py (new).
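
The wrapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in examples/chainweaver_flow.py: the `DriverError` constructor, the `execute` method name, and the shape of the `CompiledFlow` / `FlowExecutionError` stand-ins are all assumptions made for the sketch. Only the `run(inputs)` / `flow_id` requirement and the `driver_id == "chainweaver"` value come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Callable, Protocol

Step = Callable[[dict[str, Any]], dict[str, Any]]


class FlowExecutionError(Exception):
    """Stand-in for a flow-step failure (hypothetical shape)."""

    def __init__(self, step: str, message: str) -> None:
        super().__init__(message)
        self.step = step


class DriverError(Exception):
    """Stand-in for the kernel's DriverError; the real class may differ."""

    def __init__(self, message: str, *, flow_id: str, step: str) -> None:
        super().__init__(message)
        self.flow_id = flow_id
        self.step = step


class Flow(Protocol):
    """All the driver needs from a flow: an id and a run(inputs) method."""

    flow_id: str

    def run(self, inputs: dict[str, Any]) -> dict[str, Any]: ...


@dataclass
class CompiledFlow:
    """Tiny offline stand-in: runs named steps in order over a state dict."""

    flow_id: str
    steps: list[tuple[str, Step]]

    def run(self, inputs: dict[str, Any]) -> dict[str, Any]:
        state = dict(inputs)
        for name, fn in self.steps:
            try:
                state = fn(state)
            except Exception as exc:
                # Record which step failed so the driver can surface it.
                raise FlowExecutionError(name, str(exc)) from exc
        return state


class ChainWeaverDriver:
    driver_id = "chainweaver"

    def __init__(self, flow: Flow) -> None:
        self._flow = flow

    def execute(self, args: dict[str, Any]) -> dict[str, Any]:  # hypothetical method name
        try:
            return self._flow.run(args)
        except FlowExecutionError as exc:
            # Translate, keeping flow id and failing step for the audit trail.
            raise DriverError(
                f"flow {self._flow.flow_id!r} failed at step {exc.step!r}: {exc}",
                flow_id=self._flow.flow_id,
                step=exc.step,
            ) from exc


def _bump(state: dict[str, Any]) -> dict[str, Any]:
    return {**state, "n": state["n"] + 1}


def _boom(state: dict[str, Any]) -> dict[str, Any]:
    raise ValueError("no changelog")


ok_driver = ChainWeaverDriver(CompiledFlow("release-notes", [("a", _bump), ("b", _bump)]))
result = ok_driver.execute({"n": 0})
bad_driver = ChainWeaverDriver(CompiledFlow("release-notes", [("collect", _bump), ("render", _boom)]))
```

Because the driver depends only on the structural `Flow` protocol, a real compiled flow exposing the same two members could presumably be passed in unchanged.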

#96 — policy guardrails for statistical evaluation artifacts

  • examples/evaluation_artifact_policy.py (new) — a generic, producer-agnostic assess_artifact() layer lets an agent summarize an evaluation artifact while gating deployment/rollout recommendations on its support diagnostics. The gate is multi-signal (support_health, decision_stable, warnings, recommendation.intent) — a good point estimate with weak support is still blocked — and an unknown/missing support_health normalises to the safest state.
  • Denied actions are downgraded to a manual-review recommendation whose reason is recorded in ActionTrace.args, so the audit trail explains why an action was downgraded.
  • No statistical estimation is added and no producer (skdr-eval) dependency is taken; artifacts are fixtures.
  • docs/integrations/evaluation_artifacts.md (new), tests/test_evaluation_artifact_policy.py (new, covers ok / caution / high_risk).
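
A multi-signal gate of the kind described above might look like the sketch below. The four signal names come from this PR; everything else is an assumption for illustration, including the `Assessment` type, the rule that only `support_health == "ok"` permits deployment, and the set of intents treated as deploy-like. The actual assess_artifact() in examples/evaluation_artifact_policy.py may combine the signals differently.

```python
from dataclasses import dataclass
from typing import Any, Mapping

_KNOWN_HEALTH = {"ok", "caution", "high_risk"}
_DEPLOY_INTENTS = {"deploy", "rollout"}  # assumed intent vocabulary


@dataclass(frozen=True)
class Assessment:
    """Hypothetical result type: the decision plus the reasons behind it."""

    allow_deploy: bool
    support_health: str
    reasons: tuple[str, ...]


def assess_artifact(artifact: Mapping[str, Any]) -> Assessment:
    health = artifact.get("support_health")
    # Unknown or missing health normalises to the safest state.
    if not isinstance(health, str) or health not in _KNOWN_HEALTH:
        health = "high_risk"

    reasons: list[str] = []
    if health != "ok":
        reasons.append(f"support_health={health}")
    if artifact.get("decision_stable") is not True:
        reasons.append("decision not stable")
    if artifact.get("warnings"):
        reasons.append("artifact carries warnings")
    intent = (artifact.get("recommendation") or {}).get("intent")
    if intent not in _DEPLOY_INTENTS:
        reasons.append(f"recommendation.intent={intent!r} is not a deploy intent")

    # Every signal must pass; a single failing signal blocks deployment.
    return Assessment(allow_deploy=not reasons, support_health=health, reasons=tuple(reasons))


good = assess_artifact(
    {"support_health": "ok", "decision_stable": True, "warnings": [],
     "recommendation": {"intent": "deploy"}}
)
unstable = assess_artifact(
    {"support_health": "ok", "decision_stable": False, "warnings": [],
     "recommendation": {"intent": "deploy"}}
)
missing = assess_artifact({"decision_stable": True, "recommendation": {"intent": "deploy"}})
```

Collecting reasons rather than returning at the first failure keeps the full explanation available for whatever records the denial.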

Wiring

  • Makefile — both examples added to the example target so they run under make ci.
  • README.md + docs/integrations.md — link the two new pages.
  • CHANGELOG.md: [Unreleased] entries.

Why

#95 and #96 were the recommended coherent group from issue triage: both are additive, dependency-free integration cookbooks sharing the same code area (docs/integrations/ + examples/ + the Driver/capability pattern) and implementation path. No src/ changes were needed — integration-specific drivers live in the examples (the RepositoryCheckDriver precedent).

How verified

  • make ci — passes end to end:
    • ruff format --check — clean
    • ruff check — All checks passed!
    • mypy src/ (strict) — Success: no issues found in 41 source files
    • pytest: 580 passed, 1 skipped
    • make example — all example scripts run, including the two new ones
  • New tests assert: the wrapped flow's audit trace (driver_id == "chainweaver", result_summary populated); flow-failure DriverError preserves flow id + failing step (type and message asserted); artifact decisions for ok/caution/high_risk; the multi-signal gate (good support but unstable decision is still denied); unknown support_health normalises to safest; and that a downgraded action records its reason in ActionTrace.args.
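
As an illustration of the last assertion in that list, a test that a downgraded action records its reason could be shaped like the sketch below. The `ActionTrace` here is a two-field stand-in, and `resolve_action` and the capability names are hypothetical; the real tests in tests/test_evaluation_artifact_policy.py go through the kernel rather than a helper like this.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ActionTrace:
    """Stand-in for the kernel's ActionTrace; only the fields used here."""

    capability: str
    args: dict[str, Any] = field(default_factory=dict)


def resolve_action(requested: str, denial_reasons: list[str]) -> ActionTrace:
    """Return the requested action, or a manual-review downgrade with its reason."""
    if not denial_reasons:
        return ActionTrace(capability=requested)
    return ActionTrace(
        capability="recommend_manual_review",  # hypothetical capability name
        args={"requested": requested, "reason": "; ".join(denial_reasons)},
    )


def test_downgrade_records_reason() -> None:
    trace = resolve_action("recommend_deploy", ["decision not stable"])
    assert trace.capability == "recommend_manual_review"
    assert trace.args["requested"] == "recommend_deploy"
    assert trace.args["reason"] == "decision not stable"


def test_allowed_action_is_not_downgraded() -> None:
    trace = resolve_action("recommend_deploy", [])
    assert trace.capability == "recommend_deploy"
    assert trace.args == {}
```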

Tradeoffs / risks

  • Both examples use deterministic stand-ins (CompiledFlow; fixture artifacts) rather than the real ChainWeaver / skdr-eval packages — intentional, so the cookbooks run offline in CI and keep those projects optional. The docs note how to swap in the real producers.
  • assess_artifact's field names are a producer-neutral interim contract; if weaver-spec publishes a formal EvaluationArtifact, the field names should be aligned (noted in the doc).

Scope notes (Mode B)

Scope is limited to the two issues. No src/ or public-API changes; no new dependencies. Remaining open issues #94 (trace export shape) and #99 (property-based policy tests) are intentionally not included — they form a separate, less tightly coupled group and are better as their own PR.

https://claude.ai/code/session_013hGyqqjAquhtSZXeYPkAuU


Generated by Claude Code


Add two more ecosystem integration cookbooks under docs/integrations/,
each with a runnable, offline companion wired into `make ci`. Follows the
pattern established by the contextweaver/repository-check cookbooks.

#95 — ChainWeaver compiled flows as capabilities: a ChainWeaverDriver wraps
a compiled flow behind the Driver protocol so the flow runs through the normal
policy/audit pipeline and produces a kernel-visible ActionTrace. A flow-step
failure is translated into a DriverError that preserves the flow id and the
failing step, so the orchestration context survives for the caller and the
audit trail. ChainWeaver stays an optional dependency (the driver only needs a
run(inputs) method and a flow_id), so the example ships tiny CompiledFlow /
FlowExecutionError stand-ins and depends on no ChainWeaver package. New
docs/integrations/chainweaver.md, examples/chainweaver_flow.py, and
tests/test_chainweaver_flow.py.

#96 — policy guardrails for statistical evaluation artifacts: a generic,
producer-agnostic assess_artifact() layer lets an agent summarize an evaluation
artifact while gating deployment/rollout recommendations on its support
diagnostics. The gate is multi-signal (support_health, decision_stable,
warnings, recommendation.intent) — a good point estimate with weak support is
still blocked — and an unknown/missing support_health normalises to the safest
state. Denied actions downgrade to a manual-review recommendation whose reason
is recorded in ActionTrace.args. No statistical estimation is added and no
producer dependency is taken; artifacts are fixtures. New
docs/integrations/evaluation_artifacts.md, examples/evaluation_artifact_policy.py,
and tests/test_evaluation_artifact_policy.py (covering ok/caution/high_risk).

Wiring: both examples added to the Makefile `example` target; README and
docs/integrations.md link the new pages; CHANGELOG updated. make ci passes
(fmt-check, lint, mypy strict, 580 passed / 1 skipped, examples run).

https://claude.ai/code/session_013hGyqqjAquhtSZXeYPkAuU
Copilot AI review requested due to automatic review settings May 30, 2026 22:54

Copilot AI left a comment


Pull request overview

Adds two new ecosystem integration cookbooks (docs/integrations/chainweaver.md, docs/integrations/evaluation_artifacts.md) and their runnable, offline companion examples + tests, following the precedent set by the contextweaver / repository-check cookbooks. No src/ or public-API changes; both examples are wired into make ci via the example target.

Changes:

  • New ChainWeaver integration: a ChainWeaverDriver wraps a compiled-flow stand-in as a Driver, with flow failures translated into DriverError that preserves flow id + failing step.
  • New evaluation-artifact policy guardrail: a producer-agnostic assess_artifact() multi-signal gate downgrades deployment recommendations to manual-review (recording the reason in ActionTrace.args).
  • README, docs/integrations.md, CHANGELOG.md, and Makefile updated to surface and run the two new cookbooks.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| examples/chainweaver_flow.py | New runnable example: ChainWeaverDriver + CompiledFlow/FlowExecutionError stand-ins, plus a release-notes flow. |
| examples/evaluation_artifact_policy.py | New runnable example: assess_artifact + kernel wiring for summarize / deploy / manual-review capabilities. |
| tests/test_chainweaver_flow.py | Tests flow ordering, error context, audit trace, and DriverError propagation. |
| tests/test_evaluation_artifact_policy.py | Tests ok / caution / high_risk paths, multi-signal gate, unknown-health normalisation, and audit-trace reason capture. |
| docs/integrations/chainweaver.md | New cookbook describing the ChainWeaver capability pattern. |
| docs/integrations/evaluation_artifacts.md | New cookbook describing the artifact policy/downgrade pattern. |
| docs/integrations.md | Adds links to the two new cookbooks. |
| README.md | Adds the two new cookbooks under the integrations list. |
| Makefile | Runs the two new examples under make example/make ci. |
| CHANGELOG.md | [Unreleased] entries for both cookbooks. |

Development

Successfully merging this pull request may close these issues.

ChainWeaver integration

3 participants