docs: add ChainWeaver + evaluation-artifact integration cookbooks (#95, #96) by dgenio · Pull Request #112 · dgenio/agent-kernel

dgenio · 2026-05-30T22:54:09Z

What changed

Adds two more ecosystem integration cookbooks under docs/integrations/, each with a runnable, offline companion wired into make ci. Follows the pattern established by the contextweaver / repository-check cookbooks (#92, #93). Closes #95 and #96.

#95 — ChainWeaver compiled flows as capabilities

examples/chainweaver_flow.py (new) — a ChainWeaverDriver wraps a compiled flow behind the Driver protocol so the flow runs through the normal policy → token → invoke → firewall → trace pipeline and produces a kernel-visible ActionTrace. A flow-step failure is translated into a DriverError that preserves the flow id and failing step, so the orchestration context survives for the caller and the audit trail.
ChainWeaver stays an optional dependency: the driver only needs a run(inputs) method and a flow_id, so the example ships tiny CompiledFlow / FlowExecutionError stand-ins and imports no ChainWeaver package.
docs/integrations/chainweaver.md (new), tests/test_chainweaver_flow.py (new).
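The wrapping described above can be pictured with a minimal sketch. It is not the code in examples/chainweaver_flow.py: the `Flow` protocol, the `execute` method name, the step representation, and the local `DriverError` class are assumptions made so the block runs on its own. Only the `run(inputs)` / `flow_id` requirement, the `CompiledFlow` / `FlowExecutionError` names, and the `"chainweaver"` driver id come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Callable, Protocol


class FlowExecutionError(Exception):
    """Stand-in for a flow-step failure (hypothetical shape)."""

    def __init__(self, flow_id: str, step: str, message: str) -> None:
        super().__init__(message)
        self.flow_id = flow_id
        self.step = step


class DriverError(Exception):
    """Local stand-in for the kernel's driver error; the real class may differ."""


class Flow(Protocol):
    """All the driver needs from a flow: an id and run(inputs)."""

    flow_id: str

    def run(self, inputs: dict[str, Any]) -> dict[str, Any]: ...


# A step is a (name, function) pair; this representation is an assumption.
Step = tuple[str, Callable[[dict[str, Any]], dict[str, Any]]]


@dataclass
class CompiledFlow:
    """Tiny deterministic stand-in: named steps run in order."""

    flow_id: str
    steps: list[Step]

    def run(self, inputs: dict[str, Any]) -> dict[str, Any]:
        state = dict(inputs)
        for name, fn in self.steps:
            try:
                state = fn(state)
            except Exception as exc:
                # Attach the flow id and the failing step to the error.
                raise FlowExecutionError(self.flow_id, name, str(exc)) from exc
        return state


class ChainWeaverDriver:
    """Wraps any object satisfying Flow; `execute` is a hypothetical method name."""

    driver_id = "chainweaver"

    def __init__(self, flow: Flow) -> None:
        self._flow = flow

    def execute(self, args: dict[str, Any]) -> dict[str, Any]:
        try:
            return self._flow.run(args)
        except FlowExecutionError as exc:
            # Translate, keeping the orchestration context in the message.
            raise DriverError(
                f"flow {exc.flow_id!r} failed at step {exc.step!r}: {exc}"
            ) from exc
```

Because `Flow` is a structural protocol, a real compiled flow could be passed in place of the stand-in without the driver importing any ChainWeaver package, provided it exposes the same two members.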

#96 — policy guardrails for statistical evaluation artifacts

examples/evaluation_artifact_policy.py (new) — a generic, producer-agnostic assess_artifact() layer lets an agent summarize an evaluation artifact while gating deployment/rollout recommendations on its support diagnostics. The gate is multi-signal (support_health, decision_stable, warnings, recommendation.intent) — a good point estimate with weak support is still blocked — and an unknown/missing support_health normalises to the safest state.
Denied actions are downgraded to a manual-review recommendation whose reason is recorded in ActionTrace.args, so the audit trail explains why an action was downgraded.
No statistical estimation is added and no producer (skdr-eval) dependency is taken; artifacts are fixtures.
docs/integrations/evaluation_artifacts.md (new), tests/test_evaluation_artifact_policy.py (new, covers ok / caution / high_risk).
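A rough sketch of a multi-signal gate of this kind follows. The field names (`support_health`, `decision_stable`, `warnings`, `recommendation.intent`) and the `ok` / `caution` / `high_risk` values come from this PR; everything else is an assumption for illustration, including the `Decision` type, the `deploy` / `rollout` intent strings, treating `high_risk` as the "safest state", and blocking on any health other than `ok`. The actual `assess_artifact()` in examples/evaluation_artifact_policy.py may weigh the signals differently.

```python
from dataclasses import dataclass
from typing import Any

_KNOWN_HEALTH = ("ok", "caution", "high_risk")


@dataclass(frozen=True)
class Decision:
    """Hypothetical result type: whether the requested action may proceed."""

    allowed: bool
    action: str
    reason: str


def normalise_health(value: Any) -> str:
    """Map unknown or missing values to the most restrictive state (assumed: high_risk)."""
    return value if value in _KNOWN_HEALTH else "high_risk"


def assess_artifact(artifact: dict[str, Any]) -> Decision:
    """Gate deployment-style intents on every support signal, not just one."""
    intent = (artifact.get("recommendation") or {}).get("intent", "")
    if intent not in ("deploy", "rollout"):
        # Summarising an artifact is always allowed in this sketch.
        return Decision(True, "summarize", "no deployment intent; summary only")

    health = normalise_health(artifact.get("support_health"))
    reasons: list[str] = []
    if health != "ok":
        reasons.append(f"support_health={health}")
    if artifact.get("decision_stable") is not True:
        reasons.append("decision not stable")
    if artifact.get("warnings"):
        reasons.append(f"{len(artifact['warnings'])} warning(s)")

    if reasons:
        # Any failing signal downgrades the action and explains why.
        return Decision(False, "manual_review", "; ".join(reasons))
    return Decision(True, intent, "all support signals healthy")
```

Collecting every failing signal into `reason`, rather than returning on the first one, is a design choice of this sketch: it gives the audit trail the full explanation for a downgrade.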

Wiring

Makefile — both examples added to the example target so they run under make ci.
README.md + docs/integrations.md — link the two new pages.
CHANGELOG.md — [Unreleased] entries.

Why

#95 and #96 were the recommended coherent group from issue triage: both are additive, dependency-free integration cookbooks sharing the same code area (docs/integrations/ + examples/ + the Driver/capability pattern) and implementation path. No src/ changes were needed — integration-specific drivers live in the examples (the RepositoryCheckDriver precedent).

How verified

make ci — passes end to end:
- ruff format --check — clean
- ruff check — All checks passed!
- mypy src/ (strict) — Success: no issues found in 41 source files
- pytest — 580 passed, 1 skipped
- make example — all example scripts run, including the two new ones
New tests assert: the wrapped flow's audit trace (driver_id == "chainweaver", result_summary populated); flow-failure DriverError preserves flow id + failing step (type and message asserted); artifact decisions for ok/caution/high_risk; the multi-signal gate (good support but unstable decision is still denied); unknown support_health normalises to safest; and that a downgraded action records its reason in ActionTrace.args.
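The audit-trail assertion in the last item could look roughly like the following. This is a self-contained illustration, not the test in tests/test_evaluation_artifact_policy.py: the `ActionTrace` dataclass here is a minimal hypothetical stand-in for the kernel's type, and `record_downgrade`, the `"evaluation"` driver id, and the `downgrade_reason` key are invented names.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ActionTrace:
    """Hypothetical, minimal trace record; the kernel's ActionTrace may differ."""

    capability: str
    driver_id: str
    args: dict[str, Any] = field(default_factory=dict)
    result_summary: str = ""


def record_downgrade(requested: str, reason: str) -> ActionTrace:
    """Replace a denied action with a manual-review trace that carries the reason."""
    return ActionTrace(
        capability="manual_review",
        driver_id="evaluation",  # hypothetical driver id
        args={"requested": requested, "downgrade_reason": reason},
        result_summary=f"{requested} downgraded to manual review",
    )


def test_downgrade_records_reason() -> None:
    trace = record_downgrade("deploy", "decision not stable")
    # The downgrade itself is visible in the trace...
    assert trace.capability == "manual_review"
    # ...and so is the explanation for it.
    assert trace.args["downgrade_reason"] == "decision not stable"
    assert trace.result_summary


test_downgrade_records_reason()
```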

Tradeoffs / risks

Both examples use deterministic stand-ins (CompiledFlow; fixture artifacts) rather than the real ChainWeaver / skdr-eval packages — intentional, so the cookbooks run offline in CI and keep those projects optional. The docs note how to swap in the real producers.
assess_artifact's field names are a producer-neutral interim contract; if weaver-spec publishes a formal EvaluationArtifact, the field names should be aligned (noted in the doc).

Scope notes (Mode B)

Scope is limited to the two issues. No src/ or public-API changes; no new dependencies. Remaining open issues #94 (trace export shape) and #99 (property-based policy tests) are intentionally not included — they form a separate, less tightly coupled group and are better as their own PR.

https://claude.ai/code/session_013hGyqqjAquhtSZXeYPkAuU

Generated by Claude Code

Add two more ecosystem integration cookbooks under docs/integrations/, each with a runnable, offline companion wired into `make ci`. Follows the pattern established by the contextweaver/repository-check cookbooks.

#95 (ChainWeaver compiled flows as capabilities): a ChainWeaverDriver wraps a compiled flow behind the Driver protocol so the flow runs through the normal policy/audit pipeline and produces a kernel-visible ActionTrace. A flow-step failure is translated into a DriverError that preserves the flow id and the failing step, so the orchestration context survives for the caller and the audit trail. ChainWeaver stays an optional dependency (the driver only needs a run(inputs) method and a flow_id), so the example ships tiny CompiledFlow / FlowExecutionError stand-ins and depends on no ChainWeaver package. New docs/integrations/chainweaver.md, examples/chainweaver_flow.py, and tests/test_chainweaver_flow.py.

#96 (policy guardrails for statistical evaluation artifacts): a generic, producer-agnostic assess_artifact() layer lets an agent summarize an evaluation artifact while gating deployment/rollout recommendations on its support diagnostics. The gate is multi-signal (support_health, decision_stable, warnings, recommendation.intent), so a good point estimate with weak support is still blocked, and an unknown/missing support_health normalises to the safest state. Denied actions downgrade to a manual-review recommendation whose reason is recorded in ActionTrace.args. No statistical estimation is added and no producer dependency is taken; artifacts are fixtures. New docs/integrations/evaluation_artifacts.md, examples/evaluation_artifact_policy.py, and tests/test_evaluation_artifact_policy.py (covering ok/caution/high_risk).

Wiring: both examples added to the Makefile `example` target; README and docs/integrations.md link the new pages; CHANGELOG updated.

make ci passes (fmt-check, lint, mypy strict, 580 passed / 1 skipped, examples run).

Copilot

Pull request overview

Adds two new ecosystem integration cookbooks (docs/integrations/chainweaver.md, docs/integrations/evaluation_artifacts.md) and their runnable, offline companion examples + tests, following the precedent set by the contextweaver / repository-check cookbooks. No src/ or public-API changes; both examples are wired into make ci via the example target.

Changes:

New ChainWeaver integration: a ChainWeaverDriver wraps a compiled-flow stand-in as a Driver, with flow failures translated into DriverError that preserves flow id + failing step.
New evaluation-artifact policy guardrail: a producer-agnostic assess_artifact() multi-signal gate downgrades deployment recommendations to manual-review (recording the reason in ActionTrace.args).
README, docs/integrations.md, CHANGELOG.md, and Makefile updated to surface and run the two new cookbooks.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file

| File | Description |
| --- | --- |
| examples/chainweaver_flow.py | New runnable example: `ChainWeaverDriver` + `CompiledFlow`/`FlowExecutionError` stand-ins, plus a release-notes flow. |
| examples/evaluation_artifact_policy.py | New runnable example: `assess_artifact` + kernel wiring for summarize / deploy / manual-review capabilities. |
| tests/test_chainweaver_flow.py | Tests flow ordering, error context, audit trace, and `DriverError` propagation. |
| tests/test_evaluation_artifact_policy.py | Tests `ok` / `caution` / `high_risk` paths, multi-signal gate, unknown-health normalisation, and audit-trace reason capture. |
| docs/integrations/chainweaver.md | New cookbook describing the ChainWeaver capability pattern. |
| docs/integrations/evaluation_artifacts.md | New cookbook describing the artifact policy/downgrade pattern. |
| docs/integrations.md | Adds links to the two new cookbooks. |
| README.md | Adds the two new cookbooks under the integrations list. |
| Makefile | Runs the two new examples under `make example`/`make ci`. |
| CHANGELOG.md | `[Unreleased]` entries for both cookbooks. |

Copilot AI review requested due to automatic review settings May 30, 2026 22:54

Copilot started reviewing on behalf of dgenio May 30, 2026 22:54

Copilot AI reviewed May 30, 2026

dgenio wants to merge 1 commit into main from claude/github-issues-triage-kJFvP
