Open
26 commits
199d652
Add test client
Ajeets6 Apr 26, 2026
b50cf60
Add concurency
Ajeets6 May 1, 2026
08403d7
Increase cmd exec security
Ajeets6 May 1, 2026
143e1ca
Add ReAct mode support with action parsing and observation formatting
May 5, 2026
33b76e5
Add file operations tools
Ajeets6 May 8, 2026
7b209ae
Merge pull request #4 from scalabs/react-prompting
Ajeets6 May 8, 2026
e4511c9
Implement streaming continuation logic
Ajeets6 May 15, 2026
5dcfca6
Add compactContextToBudgetAlloc for message compression and retention
Ajeets6 May 15, 2026
e87b123
Add tool call tracking and confirmation for command execution
Ajeets6 May 15, 2026
9b52b78
Add mcp server bridge
Ajeets6 May 15, 2026
2e60d6e
Update README and client to enhance tool functionality and command co…
Ajeets6 May 15, 2026
91ea8a9
Add zig test runner
Ajeets6 May 27, 2026
21f479b
Add streming to react loop
Ajeets6 May 27, 2026
ac3cf53
Fix loop connection
Ajeets6 May 27, 2026
40a31fa
Add streming support for multi loops
Ajeets6 May 27, 2026
a4a1d54
Add recent request tracking to server state
Ajeets6 May 29, 2026
1a22adc
Add default model and workspace mode config
Ajeets6 May 29, 2026
fb13254
Add workspace_id to chat request model
Ajeets6 May 29, 2026
286ade1
Add ephemeral workspace memory store
Ajeets6 May 29, 2026
22de5e8
Support repo-root file tools in workspace mode
Ajeets6 May 29, 2026
c2d97d0
Improve ReAct loop for small coding models
Ajeets6 May 29, 2026
22985eb
Wire workspace mode and ReAct into server
Ajeets6 May 29, 2026
151be42
Honor default model across providers
Ajeets6 May 29, 2026
ec0ed18
Update build defaults and test wiring
Ajeets6 May 29, 2026
f8e9645
Document workspace and ReAct operator setup
Ajeets6 May 29, 2026
9233304
Fix tool calling
Ajeets6 Jun 11, 2026
31 changes: 31 additions & 0 deletions .env.example
@@ -0,0 +1,31 @@
# Active routing (change these to switch provider/model without editing code)
LLM_ROUTER_PROVIDER=ollama
LLM_ROUTER_MODEL=qwen3.5:9b

# Local Ollama
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=qwen3.5:9b

# OpenAI
OPENAI_API_KEY=
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-4.1-mini

# OpenRouter
OPENROUTER_API_KEY=
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
OPENROUTER_MODEL=openrouter/auto

# Anthropic Claude
CLAUDE_API_KEY=
CLAUDE_MODEL=claude-3-5-sonnet-latest

# Optional harness auth for clients/evals
# LLM_ROUTER_API_KEY=

# Tooling (off by default in untrusted deployments)
# LLM_ROUTER_TOOL_EXEC_ENABLED=0

# ReAct / small-model loops (optional)
# LLM_ROUTER_DEFAULT_MAX_CONTEXT_TOKENS=8192
# LLM_ROUTER_MAX_TOOL_CALLS_PER_REQUEST=24
4 changes: 3 additions & 1 deletion .gitignore
@@ -3,8 +3,9 @@ zig-out
.agents/
.ollama-qwen-env.example
.env
client/*
__pycache__/
logs/*
tmp/
# General
.DS_Store
__MACOSX/
@@ -32,3 +33,4 @@ Network Trash Folder
Temporary Items
.apdisk
.llm-router-env
.github/instructions/
195 changes: 187 additions & 8 deletions README.md
@@ -2,7 +2,7 @@

OpenAI-compatible LLM router and prompt-loop runtime written in Zig.

This project exposes a chat-completions API, routes requests across multiple providers, supports optional streaming, and includes built-in loop controls for iterative agent workflows.
This project exposes a chat-completions API, routes requests across multiple providers, supports optional streaming, and includes built-in loop controls for iterative agent workflows. Scope: **thin, reliable model harness** (see `phase.md` Phase 0 and `plan.md` for the frozen feature set and deferrals).

> [!TIP]
> Fastest local smoke test:
@@ -19,6 +19,8 @@ This project exposes a chat-completions API, routes requests across multiple pro
- Optional debug tooling (echo, utc, cmd, bash) with safe defaults.
- Simple deploy surface: single Zig binary, environment-based configuration.



## Architecture

```mermaid
@@ -55,9 +57,12 @@ flowchart TD
## Build, Run, and Test

```bash
zig build
zig build # ReleaseSafe by default (~2 MB stripped binary)
zig build run
zig build check
zig build windows # cross-compile zig-coding-agent.exe for x86_64-windows-gnu
zig build -Doptimize=Debug # larger binary with safety checks for development
zig build -Doptimize=ReleaseSmall # smallest binary (~800 KB)
```

### Test Targets
@@ -79,6 +84,62 @@ zig build test -Dtest-target=file "-Dtest-file=src/types.zig"
zig build test -Dtest-target=all -Dtest-filter=normalizeProviderName
```

### zig_eval (optional sibling checkout)

Evaluations live in the sibling **zig_eval** repo. Check out both repos side by side:

```text
zig/
├── zig_coding_agent/ # this harness
└── zig_eval/ # registry-driven eval runner
```

`zig_eval/registry/services.json` targets this router at `http://127.0.0.1:8081` (`local-openai-compat`).

**Prerequisites:** LLM provider reachable (e.g. Ollama), harness listening, and matching `LLM_ROUTER_API_KEY` if auth is enabled.

```bash
# Terminal A: start harness — pick provider + model via env (or .env)
export LLM_ROUTER_PROVIDER=openrouter # ollama | openai | openrouter | claude | bedrock | llama_cpp
export LLM_ROUTER_MODEL=openai/gpt-4o-mini
export OPENROUTER_API_KEY=your-key # set the key for the active provider
zig build run -- --use-env

# Switch to local Ollama later (restart server):
# export LLM_ROUTER_PROVIDER=ollama
# export LLM_ROUTER_MODEL=qwen3.5:9b

# CLI overrides without editing env:
# zig build run -- --use-env --provider openai --model gpt-4.1-mini

# Terminal B: run all registry evals against the live harness
zig build zig_evals

# If .env points at a different default provider, probe the live provider explicitly
zig build zig_evals -Deval-provider=ollama

# Wait up to 30s for the server to come up, then run evals
zig build zig_evals -Deval-wait-seconds=30

# Filter evals (pass-through to zig_eval CLI)
zig build zig_evals -- --eval smoke.reply_ok --format json
zig build zig_evals -- --group smoke --parallel 2

# Override registry or service
zig build zig_evals -Deval-registry=examples/registry -Deval-service=local-product
```

Readiness checks before evals start:

1. `GET /health` on the harness (default `http://127.0.0.1:8081`)
2. A minimal `POST /v1/chat/completions` probe to confirm the LLM path works
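
The two readiness checks above can be sketched as a small polling helper. This is a hedged illustration only, not the logic behind `zig build zig_evals`: the function names, the `ping` probe text, and the polling interval are assumptions, while the URL and routes come from this README.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8081"  # default harness address from this README


def health_ok(base_url: str = BASE_URL, timeout_s: float = 2.0) -> bool:
    """Check 1: GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def chat_probe_ok(base_url: str = BASE_URL, timeout_s: float = 30.0) -> bool:
    """Check 2: a minimal chat-completions request succeeds (probe text is hypothetical)."""
    body = json.dumps({"messages": [{"role": "user", "content": "ping"}]}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    check: Callable[[], bool],
    wait_seconds: float,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it passes or `wait_seconds` has elapsed."""
    deadline = clock() + wait_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller might run `wait_until_ready(health_ok, 30)` first and only then call `chat_probe_ok()`, mirroring the order of the list above. If auth is enabled, the probe would also need the API key header.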

List evals directly from the sibling checkout:

```bash
cd ../zig_eval && zig build run -- list --registry registry
```

## API Surface

### Endpoints
@@ -89,11 +150,11 @@ zig build test -Dtest-target=all -Dtest-filter=normalizeProviderName
| GET | /metrics | Yes (if API key configured) | Request and connection counters |
| GET | /diagnostics/clients | Yes (if API key configured) | Connected client diagnostics |
| GET | /diagnostics/requests | Yes (if API key configured) | Request success/failure diagnostics |
| GET | /diagnostics/providers | No | Provider status snapshot |
| GET | /diagnostics/providers | Yes (if API key configured) | Provider status snapshot |
| POST | /v1/chat/completions | Yes (if API key configured) | OpenAI-compatible chat-completions |

> [!NOTE]
> Authentication is enabled when LLM_ROUTER_API_KEY is non-empty. When enabled, all routes require auth except /health and /diagnostics/providers.
> Authentication is enabled when LLM_ROUTER_API_KEY is non-empty. When enabled, all routes require auth except /health.
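
A client-side sketch of the auth rule in the note above. The two header forms (`X-Api-Key` and `Authorization: Bearer`) are the ones named in this README's Short Runbook; the helper name and its `use_bearer` flag are hypothetical.

```python
from typing import Dict, Optional


def auth_headers(api_key: Optional[str], use_bearer: bool = False) -> Dict[str, str]:
    """Build request headers; add credentials only when a key is configured."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # an empty or missing key means auth is disabled on the server
        if use_bearer:
            headers["Authorization"] = f"Bearer {api_key}"
        else:
            headers["X-Api-Key"] = api_key
    return headers
```

With auth enabled, these headers would be sent on every route except `/health`.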

### Chat Request Shape

@@ -157,13 +218,71 @@ Loop controls:

- --prompt <text> initial prompt and loop entry
- --provider <name> provider override
- --model <name> default model override (also available as `LLM_ROUTER_MODEL`)
- --until <marker> completion marker (default: DONE)
- --max-turns <n> loop safety cap (default: 8)
- --loop-mode <basic|agent> loop style
- --loop-mode <basic|agent|react> loop style
- --agent-loop shorthand for agent mode
- --react shorthand for ReAct reasoning mode
- --use-env load .env
- --env-file <path> load a custom dotenv file

## ReAct Mode

ReAct (Reasoning + Acting) mode implements the paradigm from [Yao et al., 2022](https://arxiv.org/abs/2210.03629). The model produces structured **Thought → Action** pairs, and the system executes each action and injects the result as an **Observation** before the next turn.

### Available Actions

| Action | Description | Requirements |
| --------- | -------------------------------- | --------------------------------------- |
| Search[q] | Search for information (stub) | None (future: wire to search tool/API) |
| Lookup[t] | Look up a term in context (stub) | None (future: wire to retrieval backend) |
| Cmd[c] | Execute a shell command | `LLM_ROUTER_TOOL_EXEC_ENABLED=1` |
| Finish[a] | Return final answer and stop loop | None |

> [!NOTE]
> Search and Lookup return stub responses. They are designed as extension points for future tool/API integration.

### CLI Example

```bash
zig build run -- --react --prompt "What is the elevation range of the High Plains?" --provider ollama
```

### API Example

```json
{
"messages": [{ "role": "user", "content": "What is the elevation range of the High Plains?" }],
"loop_mode": "react",
"loop_max_turns": 24,
"tools": [
{ "name": "file_read", "description": "Read a project file" },
{ "name": "file_write", "description": "Write a project file" },
{ "name": "file_search", "description": "Search the repo" }
]
}
```

When `loop_max_turns` is omitted, ReAct defaults to **24 turns** (vs 8 for basic/agent) so smaller models can solve coding tasks step-by-step. Tool-call budget scales with the turn cap. Set `max_context_tokens` (for example `8192`) so long loops compact history between turns.

The server **auto-attaches** the coding tool set for `loop_mode: "react"` (`file_read`, `file_write`, `file_search`, plus `bash` or `cmd`, depending on the host OS). Client-provided tools are preserved; missing defaults are merged in. `tool_choice` defaults to `auto` when omitted, so any HTTP client can use ReAct without sending a `tools` array.
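
The defaulting rules described above can be read as the following sketch. It models the documented behavior in Python for illustration and is not the server's Zig implementation; the function names and the empty placeholder descriptions are assumptions.

```python
from typing import Any, Dict, List

DEFAULT_REACT_TOOLS = ["file_read", "file_write", "file_search"]


def merge_tools(client_tools: List[Dict[str, Any]], shell_tool: str = "bash") -> List[Dict[str, Any]]:
    """Keep client-provided tools as-is and append any missing defaults."""
    merged = list(client_tools)
    present = {tool["name"] for tool in merged}
    for name in DEFAULT_REACT_TOOLS + [shell_tool]:
        if name not in present:
            # Placeholder description; the real server's wording is not shown in this README.
            merged.append({"name": name, "description": ""})
    return merged


def apply_react_defaults(request: Dict[str, Any], shell_tool: str = "bash") -> Dict[str, Any]:
    """Fill in the ReAct defaults the README documents; leave other modes untouched."""
    out = dict(request)
    if out.get("loop_mode") != "react":
        return out
    out.setdefault("loop_max_turns", 24)  # README: 24 for ReAct vs 8 for basic/agent
    out.setdefault("tool_choice", "auto")
    out["tools"] = merge_tools(out.get("tools", []), shell_tool)
    return out
```

For example, a request containing only `messages` and `"loop_mode": "react"` would come out with four tools, `loop_max_turns` of 24, and `tool_choice` of `auto`.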

The model will produce output like:

```
Thought 1: I need to search for the High Plains elevation range.
Action 1: Search[High Plains elevation]
```

The system injects:

```
Observation 1: <result from action execution>
```

This continues until the model emits `Action N: Finish[answer]` or the turn budget is exhausted.
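
One turn of the Thought → Action → Observation cycle can be sketched as below. The regular expression and helper names are assumptions made for illustration; the project's actual parser lives in `react.zig` and may accept a different grammar.

```python
import re
from typing import Callable, NamedTuple, Optional, Tuple


class Action(NamedTuple):
    turn: int
    name: str
    arg: str


# Matches lines such as "Action 1: Search[High Plains elevation]".
ACTION_RE = re.compile(r"^Action\s+(\d+):\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def parse_action(model_output: str) -> Optional[Action]:
    """Return the first Action line found in the model output, if any."""
    match = ACTION_RE.search(model_output)
    if match is None:
        return None
    return Action(int(match.group(1)), match.group(2), match.group(3))


def format_observation(turn: int, result: str) -> str:
    """Format the text injected back into the conversation."""
    return f"Observation {turn}: {result}"


def react_step(model_output: str, execute: Callable[[str, str], str]) -> Tuple[Optional[str], bool]:
    """Run one turn; return (text, finished)."""
    action = parse_action(model_output)
    if action is None:
        return None, False  # nothing to execute this turn
    if action.name == "Finish":
        return action.arg, True  # final answer ends the loop
    return format_observation(action.turn, execute(action.name, action.arg)), False
```

A driver loop would call `react_step` after each model reply, append the returned observation to the messages, and stop when `finished` is true or the turn budget runs out.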

## Tools

Registered tool names:
@@ -172,11 +291,15 @@ Registered tool names:
- utc
- cmd
- bash
- file_read
- file_write
- file_search

Tool behavior summary:

- echo and utc are deterministic debug helpers.
- cmd and bash are guarded command-execution tools.
- file_read, file_write, and file_search are lightweight filesystem helpers for trusted local workflows.
- Command execution is disabled by default and must be explicitly enabled.

```bash
@@ -188,6 +311,10 @@ Related limits:

- LLM_ROUTER_TOOL_EXEC_TIMEOUT_MS (default: 15000)
- LLM_ROUTER_TOOL_EXEC_MAX_OUTPUT_BYTES (default: 65536)
- LLM_ROUTER_TOOL_EXEC_CONFIRM_REQUIRED (default: true; re-send with `LLM_ROUTER_TOOL_CONFIRM <token>`)
- LLM_ROUTER_TOOL_EXEC_TRUSTED_LOCAL (default: false; allows pipes/chaining with denylist kept)
- LLM_ROUTER_TOOL_OUTPUT_OFFLOAD_BYTES (default: 8192; 0 disables offloading large tool output to disk)
- LLM_ROUTER_MAX_TOOL_CALLS_PER_REQUEST (default: 8)

## Configuration Reference

@@ -199,11 +326,16 @@ Related limits:
| LLM_ROUTER_PORT | 8081 |
| LLM_ROUTER_DEBUG | 0 |
| LLM_ROUTER_PROVIDER | ollama |
| LLM_ROUTER_MODEL | unset (uses per-provider `*_MODEL`) |
| LLM_ROUTER_INSTANCE_ID | local-instance |
| LLM_ROUTER_API_KEY | empty |
| LLM_ROUTER_REQUEST_TIMEOUT_MS | 30000 |
| LLM_ROUTER_PROVIDER_TIMEOUT_MS | 60000 |
| LLM_ROUTER_LOOP_STREAM_PROGRESS_ENABLED | true |
| LLM_ROUTER_MAX_CONCURRENT_CONNECTIONS | 64 |
| LLM_ROUTER_MAX_REQUEST_BYTES | 1048576 |
| LLM_ROUTER_MAX_HEADER_BYTES | 16384 |
| LLM_ROUTER_DEFAULT_MAX_CONTEXT_TOKENS | unset |

### Session Storage

@@ -212,14 +344,34 @@ Related limits:
| LLM_ROUTER_SESSION_STORE_PATH | logs/sessions |
| LLM_ROUTER_SESSION_RETENTION_MESSAGES | 24 |

### Workspace Mode (optional, in-memory)

A lightweight, Cursor-like mode: requests that share a `workspace_id` keep their conversation in RAM until the server exits. File tools resolve paths under the workspace root instead of `tmp/`. Disabled by default.

| Variable | Default |
| -------------------------- | ------- |
| LLM_ROUTER_WORKSPACE_MODE | 0 |
| LLM_ROUTER_WORKSPACE_ROOT | `.` |

Send `workspace_id` in the JSON body (separate from the disk-backed `session_id`):

```bash
export LLM_ROUTER_WORKSPACE_MODE=1
export LLM_ROUTER_WORKSPACE_ROOT=.

curl -s http://127.0.0.1:8081/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"workspace_id":"dev-1","messages":[{"role":"user","content":"Read src/main.zig"}],"tools":[{"name":"file_read","description":"read files"}],"tool_choice":"file_read"}'
```
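
How the file-tool base directory changes with workspace mode can be pictured with this sketch. It only models the rule stated in this section (workspace root when the mode is on, `tmp/` otherwise). The function name is hypothetical, and the rejection of paths that escape the base directory is an assumption, not a documented guarantee of `file_ops.zig`.

```python
from pathlib import Path


def resolve_tool_path(relative: str, workspace_mode: bool, workspace_root: str = ".") -> Path:
    """Resolve a tool path under the workspace root, or under tmp/ when the mode is off."""
    base = Path(workspace_root if workspace_mode else "tmp").resolve()
    candidate = (base / relative).resolve()
    # Assumed safety check: refuse anything that lands outside the base directory.
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {relative}")
    return candidate
```

Under this reading, `src/main.zig` in the request above would map to `<LLM_ROUTER_WORKSPACE_ROOT>/src/main.zig` when `LLM_ROUTER_WORKSPACE_MODE=1`.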

### Ollama

| Variable | Default |
| --------------------- | ------------------------ |
| OLLAMA_BASE_URL | <http://127.0.0.1:11434> |
| OLLAMA_MODEL | qwen3.5:9b |
| OLLAMA_THINK | 0 |
| OLLAMA_NUM_PREDICT | 128 |
| OLLAMA_THINK | true |
| OLLAMA_NUM_PREDICT | 2048 |
| OLLAMA_TEMPERATURE | 0.7 |
| OLLAMA_REPEAT_PENALTY | 1.05 |

@@ -290,12 +442,36 @@ Related limits:
-d '{"messages":[{"role":"user","content":"Say hello from zig-coding-agent"}]}'
```

4. Provider Diagnostics
4. Stateful Session Check

```bash
curl -s http://127.0.0.1:8081/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"session_id":"demo-session","messages":[{"role":"user","content":"Remember that my test color is blue."}]}'
```

5. Tool Check

```bash
curl -s http://127.0.0.1:8081/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"What time is it in UTC?"}],"tools":[{"name":"utc","description":"Current UTC time"}],"tool_choice":"auto"}'
```

6. Provider Diagnostics

```bash
curl -s http://127.0.0.1:8081/diagnostics/providers
```

## Short Runbook

- 401 Unauthorized: set `LLM_ROUTER_API_KEY` on the server and send either `X-Api-Key: <key>` or `Authorization: Bearer <key>`.
- 413 request_too_large: lower the prompt size or raise `LLM_ROUTER_MAX_REQUEST_BYTES` for trusted deployments.
- 504 provider_timeout: check provider reachability and `LLM_ROUTER_PROVIDER_TIMEOUT_MS`.
- unknown_tool: request only registered tools listed above.
- provider_not_configured: set the provider API key or switch to a local provider.

## Project Layout

```
@@ -309,6 +485,7 @@ zig_coding_agent
│ ├── api.zig
│ ├── auth.zig
│ ├── errors.zig
│ ├── mcp.zig
│ ├── session.zig
│ └── tools.zig
├── config.zig
@@ -326,10 +503,12 @@ zig_coding_agent
│ ├── openai.zig
│ ├── openai_compatible.zig
│ └── openrouter.zig
├── react.zig
├── root.zig
├── tools
│ ├── command_exec.zig
│ ├── echo.zig
│ ├── file_ops.zig
│ └── utc.zig
└── types.zig
```