@tangle-network/agent-runtime

The engine Tangle's AI agents run on. It runs an agent — a chat turn, a one-shot task, or a team of agents working toward a goal — records every run, and uses those records to measure and improve agents against real pass/fail checks.

One loop, used three ways. Domain behavior (models, tools, knowledge) plugs in as adapters; the scoring statistics and the ship decision come from @tangle-network/agent-eval; sandboxed execution from @tangle-network/sandbox.

pnpm add @tangle-network/agent-runtime @tangle-network/agent-eval @tangle-network/sandbox

See it run in about 30 seconds, offline and with no API keys. The example shows the one move everything else builds on: a driver reads a worker's output and composes the next step from it.

pnpm tsx examples/driver-loop/driver-loop.ts
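For readers who want the shape of that move before running the example, here is a minimal sketch. It is not the contents of driver-loop.ts and it does not use the package's API: `WorkerOutput`, `Worker`, `Driver`, and `runDriverLoop` are hypothetical names invented for illustration, and the worker is a toy that finishes on its third attempt.

```typescript
// Hypothetical types for illustration only; not exported by the package.
type WorkerOutput = { text: string; done: boolean }
type Worker = (instruction: string) => WorkerOutput
type Driver = (previous: WorkerOutput | null) => string

// The driver reads the worker's last output and composes the next instruction,
// until the worker reports it is done or the step limit is reached.
function runDriverLoop(driver: Driver, worker: Worker, maxSteps: number): WorkerOutput[] {
  const history: WorkerOutput[] = []
  let previous: WorkerOutput | null = null
  for (let step = 0; step < maxSteps; step++) {
    const instruction = driver(previous)
    previous = worker(instruction)
    history.push(previous)
    if (previous.done) break
  }
  return history
}

// Toy worker: reports done on its third attempt.
let attemptCount = 0
const countingWorker: Worker = (instruction) => {
  attemptCount += 1
  return { text: `${instruction} -> attempt ${attemptCount}`, done: attemptCount >= 3 }
}

// Toy driver: the next instruction is built from the previous output.
const retryDriver: Driver = (previous) =>
  previous === null ? 'start' : `continue after "${previous.text}"`

const driverHistory = runDriverLoop(retryDriver, countingWorker, 10)
console.log(driverHistory.map((output) => output.text))
```

The step limit stands in for a real budget; the loop stops on whichever comes first, completion or the limit.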

What you do with it

| You want to… | Call |
| --- | --- |
| Run a chat turn (what every product agent does in production) | handleChatTurn(...) |
| Have one agent supervise a team of agents toward a goal | supervise(profile, task, opts) |
| Improve an agent and prove the gain on fresh tasks | improve(profile, findings, opts) |

Run a chat turn

A product agent is one handleChatTurn call inside a route. You give it how to produce the response and how to persist it; it streams, traces, and persists.

import { handleChatTurn } from '@tangle-network/agent-runtime'

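// Runs inside your route handler: tenantId, threadId, userId, userMessage, box, db, and waitUntil come from your app.
const result = handleChatTurn({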
  identity: { tenantId, sessionId: threadId, userId, turnIndex: 0 },
  hooks: {
    produce: () => ({ stream: box.streamPrompt(userMessage), finalText: () => box.lastResponse() }),
    persistAssistantMessage: async ({ identity, finalText }) => db.insertMessage(identity, finalText),
  },
  waitUntil,
})
return new Response(result.body, { headers: { 'content-type': result.contentType } })

Supervise a team of agents

One supervisor spawns and steers workers toward a goal. Where the workers run (an in-process loop, or a sandboxed coding harness) is one data value; the budget, journaling, and stopping are handled for you.

import { supervise } from '@tangle-network/agent-runtime/loops'

const result = await supervise(
  { name: 'supervisor', harness: null, systemPrompt: 'Delegate to workers; do not solve the task yourself.' },
  'Implement the feature and make the tests pass.',
  { budget, router, backend }, // backend = where workers run: router-tools | sandbox+harness | bridge
)
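To make "the budget, journaling, and stopping are handled for you" concrete, here is a rough sketch of what such a loop has to track. It is an assumption-laden illustration, not how supervise is implemented: `Budget`, `SketchWorker`, `JournalEntry`, `StopReason`, and `superviseSketch` are hypothetical names, workers are picked round-robin for simplicity, and the token check runs after each attempt, so the last attempt can overshoot the limit.

```typescript
// Hypothetical shapes for illustration only; not the package's types.
type Budget = { maxTokens: number; maxSteps: number }
type Attempt = { tokens: number; passed: boolean }
type SketchWorker = { name: string; run: (task: string) => Attempt }
type JournalEntry = Attempt & { step: number; worker: string }
type StopReason = 'goal-met' | 'tokens-exhausted' | 'steps-exhausted'
type SuperviseOutcome = { journal: JournalEntry[]; stopReason: StopReason; tokensSpent: number }

function superviseSketch(workers: SketchWorker[], task: string, budget: Budget): SuperviseOutcome {
  const journal: JournalEntry[] = []
  let tokensSpent = 0
  if (workers.length === 0) return { journal, stopReason: 'steps-exhausted', tokensSpent }
  for (let step = 0; step < budget.maxSteps; step++) {
    // Round-robin worker selection (an assumption made for this sketch).
    const worker = workers[step % workers.length]!
    const attempt = worker.run(task)
    tokensSpent += attempt.tokens
    // Journal every attempt, whether it passed or not.
    journal.push({ ...attempt, step, worker: worker.name })
    if (attempt.passed) return { journal, stopReason: 'goal-met', tokensSpent }
    // Checked after the attempt, so spending can exceed maxTokens by one attempt.
    if (tokensSpent >= budget.maxTokens) return { journal, stopReason: 'tokens-exhausted', tokensSpent }
  }
  return { journal, stopReason: 'steps-exhausted', tokensSpent }
}

const failingWorker: SketchWorker = { name: 'failing', run: () => ({ tokens: 40, passed: false }) }
const passingWorker: SketchWorker = { name: 'passing', run: () => ({ tokens: 25, passed: true }) }

const exhausted = superviseSketch([failingWorker], 'make the tests pass', { maxTokens: 100, maxSteps: 10 })
const succeeded = superviseSketch([failingWorker, passingWorker], 'make the tests pass', { maxTokens: 100, maxSteps: 10 })
console.log(exhausted.stopReason, succeeded.stopReason)
```

Returning the journal alongside the stop reason means a caller can always see why a run ended and what it cost.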

Improve an agent

improve optimizes one part of an agent (its prompt, skills, or code) and ships a change only if it beats the current agent on tasks it never practiced on. Registering an agent for self-improvement is therefore designed not to make it worse: a candidate that fails the held-out check is discarded and the current agent stays in place.

import { improve } from '@tangle-network/agent-runtime'

const { profile, shipped, lift } = await improve(baseProfile, findings, {
  surface: 'prompt',        // what to optimize: prompt | skills | code
  gate: 'holdout',          // certified on a held-back exam, never the practice set
  scenarios, judge, agent,  // how to measure a candidate
})
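The holdout idea can be sketched in a few lines. This is an illustration under stated assumptions, not the library's gate: `GateScenario`, `GateAgent`, `passRate`, and `holdoutGate` are hypothetical names, the check is exact string equality, and the gate uses a fixed minimum lift rather than the statistical test the real gate applies.

```typescript
// Hypothetical shapes for illustration only; not the package's types.
type GateScenario = { input: string; expected: string }
type GateAgent = (input: string) => string

// Fraction of scenarios where the agent's output matches the expected answer.
function passRate(agent: GateAgent, scenarios: GateScenario[]): number {
  if (scenarios.length === 0) return 0
  const passed = scenarios.filter((scenario) => agent(scenario.input) === scenario.expected).length
  return passed / scenarios.length
}

// Ship the candidate only if it beats the baseline on the held-out scenarios
// by at least minLift; otherwise keep the baseline.
function holdoutGate(
  baseline: GateAgent,
  candidate: GateAgent,
  holdout: GateScenario[],
  minLift: number,
): { agent: GateAgent; shipped: boolean; lift: number } {
  const lift = passRate(candidate, holdout) - passRate(baseline, holdout)
  const shipped = lift >= minLift
  return { agent: shipped ? candidate : baseline, shipped, lift }
}

// Held-out scenarios: never shown to whatever produced the candidate.
const holdoutSet: GateScenario[] = [
  { input: ' a', expected: 'A' },
  { input: 'b', expected: 'B' },
  { input: 'c ', expected: 'C' },
  { input: 'd', expected: 'D' },
]

const baselineAgent: GateAgent = (input) => input.toUpperCase()
const betterAgent: GateAgent = (input) => input.trim().toUpperCase()
const worseAgent: GateAgent = (input) => input

const accepted = holdoutGate(baselineAgent, betterAgent, holdoutSet, 0.1)
const rejected = holdoutGate(baselineAgent, worseAgent, holdoutSet, 0.1)
console.log(accepted.shipped, accepted.lift, rejected.shipped, rejected.lift)
```

When the candidate loses, the gate hands back the baseline unchanged, which is the property the paragraph above describes.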

How it works (the short version)

  • One agent, run two ways. The same agent runs at "do the task" speed and at "get better at the task" speed. "Driver", "worker", and "coordinator" aren't separate types — they're roles one agent plays.
  • Everything is measured. Every run is a trace: tokens, dollars, time, and a pass/fail score from a real check. "Better" is a number with a denominator, not a vibe — and "equally good but cheaper" is a result you can prove.
  • Improvement is gated. A change ships only after it beats the current agent on fresh tasks no tuning step ever saw, with a statistical test — not a single lucky run.
  • The grader is honest. Whatever gives feedback never sees the answer key, and scores are recomputed from the attempts actually run — an agent can't fabricate its own win.
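The difference between "a statistical test" and "a single lucky run" can be shown with a small worked example. The source does not say which test the gate uses; this sketch swaps in an exact one-sided sign test on paired pass/fail results as a stand-in. `binomialCoefficient`, `signTestPValue`, and `pairedSignTest` are hypothetical helpers, and the 0.05 threshold is an assumption.

```typescript
// Exact binomial coefficient C(n, k); intermediate values stay integral.
function binomialCoefficient(n: number, k: number): number {
  let result = 1
  for (let i = 1; i <= k; i++) result = (result * (n - k + i)) / i
  return result
}

// One-sided exact sign test: probability of seeing at least `wins` wins out of
// wins + losses discordant pairs if the two agents were equally good.
function signTestPValue(wins: number, losses: number): number {
  const n = wins + losses
  if (n === 0) return 1
  let tail = 0
  for (let k = wins; k <= n; k++) tail += binomialCoefficient(n, k)
  return tail / 2 ** n
}

// Compare two agents on the same tasks; only tasks where they disagree count.
function pairedSignTest(baseline: boolean[], candidate: boolean[], alpha = 0.05) {
  if (baseline.length !== candidate.length) throw new Error('results must be paired')
  let wins = 0
  let losses = 0
  for (let i = 0; i < baseline.length; i++) {
    if (candidate[i] && !baseline[i]) wins += 1
    if (!candidate[i] && baseline[i]) losses += 1
  }
  const pValue = signTestPValue(wins, losses)
  return { wins, losses, pValue, significant: pValue < alpha }
}

// Candidate fixes six tasks the baseline failed and breaks none.
const consistentGain = pairedSignTest(
  [false, false, false, false, false, false, true, true],
  [true, true, true, true, true, true, true, true],
)
// Candidate wins on a single task: not enough evidence to ship.
const luckyRun = pairedSignTest([false, true], [true, true])
console.log(consistentGain.pValue, luckyRun.pValue)
```

Six wins with no losses gives a p-value of 1/64, while one win gives 1/2, so only the first clears the assumed threshold.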

Examples

Runnable, grouped by what they show — copy the one nearest your task:

| Do this | Example |
| --- | --- |
| Run a product chat turn | chat-handler |
| Drive a team of agents to a goal | supervise · recursive-supervisor |
| Benchmark strategies on your own domain | coding-benchmark |
| Benchmark harnesses × models over a real task suite (the real WebCode dataset) | webcode-matrix |
| Render a multi-profile leaderboard (ranked board, score matrix, SVG/HTML charts) for any domain | leaderboard(records) · renderLeaderboardMarkdown / Svg / Html |
| Trace, bill, and effort-gate the WebCode benchmark (the Intelligence SDK) | intelligence-webcode |
| Self-improve an agent, gated on a held-out set | improve · self-improving-coder |
| Study coordination vs raw compute | ablation-suite |

All 28 examples live in examples/.

Where to go next

About

The engine for running and improving AI agents. Run loops, analyze traces automatically, and feed learnings back into agents continuously to improve them.
