Add SIA: Self-Improving AI with Harness & Weight Updates #3

Open
imnida wants to merge 1 commit into master from claude/sia-repo-setup-X38L0

Conversation

imnida (Owner) commented May 29, 2026

Implements the SIA framework from Hebbar et al. (arXiv:2605.27276).
The loop lets a Feedback-Agent iteratively improve both the scaffold
(harness) and the model weights (LoRA) of a task-specific agent.
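
A minimal sketch of the control flow described above, not the code in `sia_loop.py`. It assumes the Feedback-Agent returns one of two decisions per generation; `AgentState`, `sia_loop`, and the callback names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class AgentState:
    """Hypothetical container for what the loop is allowed to change."""
    scaffold: str          # harness text or code
    lora_version: int = 0  # number of weight updates applied so far


def sia_loop(
    initial_scaffold: str,
    execute: Callable[[AgentState], Any],
    decide: Callable[[Any], str],
    revise_scaffold: Callable[[str, Any], str],
    generations: int,
) -> Tuple[AgentState, List[str]]:
    """Run a fixed number of generations and return the final state
    together with the decision taken in each generation."""
    state = AgentState(scaffold=initial_scaffold)
    decisions: List[str] = []
    for _ in range(generations):
        # Run the current agent and collect its trajectory.
        trajectory = execute(state)
        # Stand-in for the Feedback-Agent: pick the kind of update.
        action = decide(trajectory)
        if action == "harness":
            new_scaffold = revise_scaffold(state.scaffold, trajectory)
            state = AgentState(new_scaffold, state.lora_version)
        elif action == "weights":
            # A real implementation would train a LoRA adapter here;
            # the sketch only counts the update.
            state = AgentState(state.scaffold, state.lora_version + 1)
        else:
            raise ValueError(f"unknown decision: {action!r}")
        decisions.append(action)
    return state, decisions
```

Passing the execution, decision, and revision steps as callbacks keeps the loop testable with stubs, without a model or dataset.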

Key components:

  • sia_loop.py — main configurable loop (Meta-Agent → execute → Feedback-Agent)
  • meta_agent.py — generates initial scaffold A1 using Claude Sonnet 4.6
  • feedback_agent.py — analyses trajectory τg, decides harness vs weight update
  • task_agent.py — executes scaffold against dataset, captures trajectory
  • trajectory.py — structured execution log (Step, ToolCall, Trajectory)
  • verifier.py — deterministic per-instance reward interface
  • weight_updates/ — six RL algorithms: PPO+GAE, GRPO, Entropic Advantage
    Weighting, REINFORCE+KL, Best-of-N BC, DPO
  • tasks/ — three benchmark tasks: LawBench (191-class Chinese legal),
    AlphaEvolve TriMul (CUDA kernel), MAGIC scRNA-seq denoising
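
To illustrate how the trajectory log and the verifier from the list above could fit together, here is a hedged sketch. Only the class names `Step`, `ToolCall`, and `Trajectory` come from this description; every field, the `Verifier` protocol, its `score` method, and `ExactMatchVerifier` are assumptions made for the example and may not match `trajectory.py` or `verifier.py`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol


@dataclass
class ToolCall:
    name: str                  # assumed field: tool identifier
    arguments: Dict[str, Any]  # assumed field: call arguments
    output: str = ""           # assumed field: captured tool output


@dataclass
class Step:
    prompt: str
    response: str
    tool_calls: List[ToolCall] = field(default_factory=list)


@dataclass
class Trajectory:
    instance_id: str
    steps: List[Step] = field(default_factory=list)
    reward: float = 0.0


class Verifier(Protocol):
    """Hypothetical interface: same inputs always give the same reward."""

    def score(self, instance_id: str, prediction: str) -> float: ...


class ExactMatchVerifier:
    """Toy verifier for a classification-style task: 1.0 on exact match."""

    def __init__(self, gold: Dict[str, str]) -> None:
        self.gold = gold

    def score(self, instance_id: str, prediction: str) -> float:
        return 1.0 if prediction.strip() == self.gold[instance_id] else 0.0


def record(instance_id: str, prompt: str, response: str,
           verifier: Verifier) -> Trajectory:
    """Build a one-step trajectory and attach its per-instance reward."""
    traj = Trajectory(instance_id, [Step(prompt, response)])
    traj.reward = verifier.score(instance_id, response)
    return traj
```

For example, `record("q1", "Classify the charge.", "theft", ExactMatchVerifier({"q1": "theft"}))` yields a trajectory whose `reward` is `1.0`.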

https://claude.ai/code/session_01DLqnGSQGNhPHnUzTLgJ6id


2 participants