Skip to content

sys1own/automator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Automator — Topologically-Sharded Asynchronous Multi-Agent Concurrency Substrate with Non-Euclidean Quality Optimization

A distributed, mathematically rigorous code-mutation evaluation pipeline that combines hyperbolic Riemannian geometry, asynchronous belief propagation over random spanning-tree ensembles, and multi-level hypergraph partitioning into a single, production-grade concurrency substrate for autonomous software engineering agents.


Table of Contents

  1. High-Level Architecture
  2. Mathematical Foundation
  3. Core Substrate Features
  4. Prerequisites & Getting Started
  5. Repository Architecture
  6. Internal Code Architecture Rules
  7. License

1. High-Level Architecture

The substrate is structured as a six-layer concurrent pipeline. Each layer is independently evolvable and communicates exclusively through well-typed interfaces — there are no implicit shared mutable globals between layers.

Layer Module Responsibility
Concurrency src/concurrency/ Hierarchical intent-mode lock tree; deadlock-free multi-agent write serialization
Prediction src/predictive/ Markovian prefetch daemon; Git-history-derived co-occurrence transition matrix
Semantic Graph Tracking src/semantic/ Symbol table construction; ripple-effect propagation across call sites
Geometric Optimization src/vfs/cow_overlay.py Poincaré ball projection; Ollivier-Ricci / Forman-Ricci curvature flow; chaotic BP relaxation
Core VFS Overlay src/vfs/cow_overlay.py Copy-on-Write staged commit; .vfs_tmp atomic swap; friction-gated resolution
Orchestration src/main.py CLI dispatch; hypergraph shard partitioning (Mt-KaHyPar FFI); out-of-process worker management

The two primary execution engines share the geometric and VFS layers but differ in their top-level orchestration strategy:

┌─────────────────────────────────────────────────────────────┐
│                  src/main.py  CLI Entry Point               │
│                                                             │
│   ┌──────────────────────┐   ┌──────────────────────────┐   │
│   │ MasterAutomation-    │   │ ScalableOrchestration-   │   │
│   │ Controller (--mac)   │   │ Engine         (--soe)   │   │
│   │                      │   │                          │   │
│   │ Single-agent global  │   │ k-way hypergraph shard   │   │
│   │ graph ensemble with  │   │ partition; per-shard     │   │
│   │ STE bandit weighting │   │ out-of-process Robin     │   │
│   └──────────┬───────────┘   │ solver; shared-memory    │   │
│              │               │ boundary register file   │   │
│              │               └──────────┬───────────────┘   │
│              └──────────┬───────────────┘                   │
│                         ▼                                   │
│            GlobalGraphEnsembleSandbox                       │
│            AsynchronousBeliefPropagation                    │
│            ContinuousFlowOptimizer                          │
│            CoW VFS Overlay                                  │
└─────────────────────────────────────────────────────────────┘

2. Mathematical Foundation

2.1 Hyperbolic Coordinate Space & Poincaré Ball Projection

Raw code quality is measured along three independent observation axes extracted by ActionFunctional:

$$\mathbf{q} = (\Delta c,; 1 - \tau,; \lambda) \in \mathbb{R}^3$$

where $\Delta c$ is the cyclomatic complexity delta relative to the repository baseline, $\tau \in [0,1]$ is the type-annotation coverage ratio, and $\lambda$ is the linter warning density (warnings per logical source line).

The scalar friction score is the weighted inner product

$$S(\mathbf{q}) = \alpha ,\Delta c ;+; \beta,(1 - \tau) ;+; \gamma,\lambda, \qquad S \geq 0$$

with default weights $\alpha = 1.0$, $\beta = 10.0$, $\gamma = 5.0$ (tunable via --alpha, --beta, --gamma).

To endow the quality manifold with a geometry that naturally penalises extreme values, CoordinateMetricTensor maps each observation vector through a finite-difference Jacobian pullback onto the open unit Poincaré ball

$$\mathbb{B}^n = {\mathbf{x} \in \mathbb{R}^n \mid |\mathbf{x}| < 1}$$

equipped with the Riemannian metric

$$g_{ij}(\mathbf{x}) = \left(\frac{2}{1 - |\mathbf{x}|^2}\right)^2 \delta_{ij}$$

The pullback metric tensor is evaluated by finite-difference column approximation of the Jacobian $J$ of the normalizing map $\phi : \mathbb{R}^3 \to \mathbb{B}^n$:

$$\mathcal{G} = J^T J, \qquad J_{ij} \approx \frac{\phi(\mathbf{q} + \epsilon, \mathbf{e}_j) - \phi(\mathbf{q} - \epsilon, \mathbf{e}_j)}{2\epsilon}$$

The Riemannian volume element $\sqrt{\det \mathcal{G}}$ is recorded as the LSF (Log-Scale Factor) tensor and attached to each variant node in the global dependency graph. Candidates whose Poincaré-ball embedding lies closer to the boundary (higher curvature, higher friction) are geometrically penalised in the ensemble-averaged marginal computation.


2.2 Prescribed Metric Flow & Edge Surgery Protocol

The global dependency graph $\mathcal{H} = (V, E, w)$ is a weighted directed hypergraph whose nodes are source-file module identifiers and whose hyperedges encode import / symbol-dependency relations. ContinuousFlowOptimizer evolves the edge weight function $w : E \to \mathbb{R}_{&gt;0}$ to remove curvature bottlenecks that signal over-coupled, structurally fragile module boundaries.

Ollivier-Ricci Curvature (ORC)

For each directed edge $(u, v) \in E$, ORC measures the transport cost between the unit-mass probability distributions $\mu_u$ and $\mu_v$ supported on the one-hop neighbourhoods of $u$ and $v$:

$$\kappa_{\mathrm{ORC}}(u,v) = 1 - \frac{W_1(\mu_u,, \mu_v)}{d(u,v)}$$

where $W_1$ is the $L^1$ Wasserstein (Earth Mover's) distance and $d(u,v)$ is the graph distance. Edges with $\kappa_{\mathrm{ORC}} &lt; 0$ indicate structural bottlenecks — geodesics must pass through $(u,v)$, making it a high-coupling, high-risk dependency.

Augmented Forman-Ricci Curvature (AFRC)

The AFRC provides an $O(|E|)$ approximation to ORC without solving a linear programme:

$$\kappa_{\mathrm{AFRC}}(u,v) = w_{uv}!\left(\frac{1}{w_{uv}} + \frac{1}{w_{uv}} - \sum_{e \ni u,, e \neq (u,v)} \frac{w_{uv}}{\sqrt{w_e \cdot w_{uv}}} - \sum_{e \ni v,, e \neq (u,v)} \frac{w_{uv}}{\sqrt{w_e \cdot w_{uv}}}\right)$$

Forward-Euler Flow Step

The weight evolution follows a prescribed Ricci flow step:

$$w_{uv}^{(t+1)} = w_{uv}^{(t)} - h \cdot \kappa(u,v)^{(t)} \cdot w_{uv}^{(t)}$$

where $h &gt; 0$ is the forward-Euler step size (--flow-step-size, default $h = 0.05$). Positive-curvature edges contract (coupling strengthens); negative-curvature edges expand (coupling weakens, distributing load).

Edge Surgery Protocol

When an edge weight falls below the surgery threshold $w_{\min} = 10^{-4}$, the GlobalGraphEnsembleSandbox executes a surgery event:

  1. The edge $(u, v)$ is severed from the main graph.
  2. A virtual relay node $r_{uv}$ is spliced in with Robin boundary self-loops — a self-edge with weight $w_{\mathrm{self}} = \theta \cdot w_{uv}$ where $\theta = 0.8$ is the Robin under-relaxation parameter.
  3. The original endpoints $u, v$ are connected to $r_{uv}$ with half-weight edges, distributing the lost coupling load.

This prevents the graph from degenerating into an infinite collection of isolated nodes while ensuring that structurally over-coupled module pairs are dynamically decoupled.


2.3 Distributed Chaotic Relaxation Flow

Once the metric flow has stabilised, the belief marginals ${b_v}_{v \in V}$ over the low/high friction binary state space are inferred by Asynchronous Loopy Belief Propagation over a Monte-Carlo ensemble of random spanning trees.

Wilson's Loop-Erased Random Walk Spanning Trees

generate_random_spanning_tree samples each spanning tree from the uniform distribution over all spanning trees of the working multigraph using Wilson's algorithm. For each node $v$ not yet in the in-tree set $\mathcal{T}$:

  1. Perform a uniform random walk on the adjacency list until the walk first contacts $\mathcal{T}$.
  2. Any cycle that arises is automatically erased: the pointer next_step[u] is overwritten on every revisit, so the final commitment phase traverses only the loop-erased tail.
  3. Commit the loop-erased path as directed parent→child tree edges.

This guarantees that each tree in the ensemble is an independent sample from the uniform spanning tree measure, yielding an ensemble-averaged BP estimator with variance decaying at the optimal rate

$$\mathrm{Var}!\left[\hat{b}_v\right] = O!\left(\frac{1}{N_{\mathrm{trees}}}\right)$$

rather than the correlation-inflated rate produced by biased DFS-based samplers.

Dobrushin Contraction Invariant

Before launching the asynchronous chaotic relaxation threads, AsynchronousBeliefPropagation constructs the explicit absolute transition matrix $M \in \mathbb{R}^{C \times C}_{\geq 0}$ for all $C$ active directed boundary message channels, where

$$M_{ij} = \left|\frac{\partial g_i^{(t+1)}}{\partial g_j^{(t)}}\right|$$

and $g_i^{(t)}$ is the belief message on channel $i$ at iteration $t$. The Dobrushin spectral radius $K = \rho(M)$ is computed via power iteration:

$$K = \rho(M) = \lim_{k \to \infty} |M^k \mathbf{v}|^{1/k}$$

The system asserts $K &lt; 1.0$ before spawning any worker process. This is the Dobrushin contraction condition — it is a sufficient condition for the fixed-point iteration $\mathbf{g}^{(t+1)} = F(\mathbf{g}^{(t)})$ to be a contraction mapping on the message space, guaranteeing:

  • Unique fixed point: the belief propagation equations have a unique globally consistent solution.
  • Geometric convergence: $|\mathbf{g}^{(t)} - \mathbf{g}^| \leq K^t |\mathbf{g}^{(0)} - \mathbf{g}^|$.
  • Asynchronous safety: even under non-deterministic message ordering (chaotic relaxation), the iterates converge to the unique fixed point because $K &lt; 1$ bounds the worst-case per-step amplification across all possible scheduling interleavings.

If $K \geq 1.0$, the engine raises ValueError and falls back to the raw shard beliefs, preventing asynchronous message feedback from amplifying into a divergent cascade.

Out-of-Process Worker Architecture

The asynchronous domain decomposition layer (ShardWorkerProcess, aliased ShardWorkerThread) runs each shard's Robin-adjusted Gaussian-elimination solver in an isolated child process communicating exclusively through:

  • A multiprocessing.shared_memory.SharedMemory segment owned by the parent BackboneRouter (_BoundaryRegisterFile), carrying packed float64 boundary belief slots.
  • A multiprocessing.Queue carrying picklable _ShardWorkerResult completion envelopes.

Within each child process, the local linear system

$$A_i \mathbf{u}_i = \mathbf{b}_i$$

is solved directly by Gaussian elimination with partial pivoting, where $A_i$ is the regularised Laplacian with Robin transmission boundary modifications:

$$A_i[u,u] = \underbrace{\deg(u) + 0.5}_{\text{reg. Laplacian diagonal}} - \underbrace{(1-\theta), w_{uv}}_{\text{Robin correction}}, \qquad \theta = 0.8$$

Ghost-cell linear extrapolation recovers the estimate of the neighbouring shard state $\hat{u}_{\text{ghost}}$ from the shared-memory register file, accounting for scheduling latency:

$$\hat{u}_{\text{ghost}} = (1-\theta),u_{\text{local}} - \frac{g_{\text{in}}}{w_{uv}}$$

$$g_{\text{out}} = (1-\theta),w_{uv},\hat{u}_{\text{ghost}} - w_{uv},u_{\text{local}}$$

The parent process applies a three-tier escalation protocol on worker joins: cooperative join(timeout)SIGTERM + grace period → SIGKILL.


3. Core Substrate Features

3.1 Hierarchical Intent Locking

LockTreeManager implements a five-mode hierarchical intent lock protocol modelled on the IBM System R lock hierarchy:

Mode Symbol Semantics
Intention-Shared IS Intends to place S locks on descendants
Intention-Exclusive IX Intends to place X locks on descendants
Shared + Intention-Exclusive SIX Holds S on current node; intends X on descendants
Shared S Concurrent reads permitted; no writes
Exclusive X Full exclusive ownership; no concurrent access

Compatibility matrix — a cell is ✓ if both requests may be held concurrently:

IS IX SIX S X
IS
IX
SIX
S
X

Deadlock prevention is enforced by requiring all agents to sort their acquisition sets lexicographically by path before requesting any lock. Since all agents acquire in the same total order, a circular wait cannot form.

ConcurrencyOrchestrator wraps LockTreeManager to provide a single execute_transaction(agent_id, write_files, read_files, update_fn) call that acquires X locks on write targets and S locks on read targets in sorted order, executes update_fn, then releases in reverse order.


3.2 Predictive Context Caching

AsynchronousPrefetchDaemon builds a first-order Markov transition matrix $P \in [0,1]^{F \times F}$ over the workspace file set from Git commit co-occurrence data:

$$P_{ij} = \frac{\text{commits containing both } f_i \text{ and } f_j}{\text{commits containing } f_i}$$

When an agent declares intent to write file $f_i$, the daemon pre-fetches syntactic stubs for all $f_j$ with $P_{ij} &gt; 0.40$. Stubs are generated by UniversalStubber — which uses compiled Tree-sitter grammars for AST-aware extraction where available, falling back to comment-stripping regex patterns otherwise — and stored in the PredictiveContextCache bounded at 120,000 tokens.

The pre-warming step reduces the critical-path latency of the first post-commit workspace analysis by loading adjacent file stubs into the in-process cache before the write lock is even acquired.


3.3 Atomic Copy-on-Write VFS Overlay

GlobalGraphEnsembleSandbox evaluates all variant candidates within a copy-on-write staging arena before committing any bytes to disk.

Each candidate variant is stored as raw UTF-8 bytes in an in-memory buffer indexed by (variant_id, file_path). The commit protocol is:

  1. Friction gate: compute the friction score $S$ for the winning variant. Commit is permitted only if $S &lt; S_{\text{prev}}$ (monotone decrease) or if no prior friction exists.
  2. Atomic swap: write the staged bytes to a .vfs_tmp sibling file, then call os.replace (POSIX-atomic on Linux/macOS; transactional on NTFS) to swap it into the target path.
  3. Rollback on failure: if os.replace raises, the .vfs_tmp artifact is deleted and the original target file is left untouched.

This guarantees that the repository on-disk state is always in a consistent committed state — no partial writes are ever visible to concurrent readers or the version control index.


4. Prerequisites & Getting Started

4.1 System Requirements

Component Minimum Version Notes
Python 3.11+ Required for tomllib, match statements, and multiprocessing.shared_memory stability
OS Linux / macOS / Windows 10+ SharedMemory on Windows requires Python ≥ 3.13 for full stability; Linux is the primary target
RAM 4 GB Per-shard Gaussian-elimination solver is $O(N^3)$ in shard size; default shard count 4
CPU cores 4+ Each ShardWorkerProcess occupies one physical core; more cores = lower wall-clock dispatch time

4.2 Building Native Libraries

Tree-sitter Parser Grammars

The UniversalStubber uses compiled Tree-sitter shared libraries for AST-aware stub extraction. Without them the system falls back to regex patterns (functional but lower fidelity).

# Install the Tree-sitter CLI
pip install tree-sitter

# Clone and compile the Python grammar
git clone https://github.com/tree-sitter/tree-sitter-python
python -c "
from tree_sitter import Language
Language.build_library('parsers/tree-sitter-python.so', ['tree-sitter-python'])
"

# Repeat for other grammars:
#   https://github.com/tree-sitter/tree-sitter-typescript  -> tree-sitter-typescript.so
#   https://github.com/tree-sitter/tree-sitter-go          -> tree-sitter-go.so
#   https://github.com/nickel-lang/tree-sitter-rust        -> tree-sitter-rust.so

Place all four .so files (or .dylib on macOS) in the parsers/ directory at the repository root.

Mt-KaHyPar Parallel Hypergraph Partitioner

The execute_mt_kahypar_partition FFI bridge calls mt_kahypar_partition_hypergraph from libmtkahypar.so to obtain balanced $k$-way partition assignments for the dependency hypergraph. Without it the engine falls back to round-robin assignment (valid but unbalanced).

# Clone and build Mt-KaHyPar
git clone --recursive https://github.com/kahypar/mt-kahypar
cd mt-kahypar
cmake -B build -DCMAKE_BUILD_TYPE=Release -DKAHYPAR_DOWNLOAD_BOOST=ON
cmake --build build --target mtkahypar -j$(nproc)

# Copy the shared library to the repository lib/ directory
cp build/lib/libmtkahypar.so /path/to/automator/lib/

Alternatively, add the build output directory to LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=/path/to/mt-kahypar/build/lib:$LD_LIBRARY_PATH

4.3 Environment Validation

Before first use, run the setup check utility to validate all optional dependencies and create required operational directories:

python src/utils/setup_check.py --workspace .

Expected output on a fully configured system:

INFO      Workspace: /path/to/automator
INFO      ============================================================
INFO      ── Tree-sitter parser libraries ─────────────────────────────
INFO        [OK]      tree-sitter-python.so  (Python)
INFO        [OK]      tree-sitter-typescript.so  (TypeScript)
INFO        [OK]      tree-sitter-go.so  (Go)
INFO        [OK]      tree-sitter-rust.so  (Rust)
INFO      ── Mt-KaHyPar native partitioning library ───────────────────
INFO        [OK]      Resolved: /path/to/automator/lib/libmtkahypar.so
INFO      ── Operational directories ──────────────────────────────────
INFO        [OK]      .repo_cache  (already exists)
INFO        [OK]      context  (already exists)
INFO      ── Summary ──────────────────────────────────────────────────
INFO        [PASS]  Tree-sitter parser libraries
INFO        [PASS]  Mt-KaHyPar native partitioning library
INFO        [PASS]  Operational directories

Exit code 0 = fully operational. Exit code 1 = degraded (optional components absent). Exit code 2 = hard failure (directory creation error).

4.4 CLI Reference

The primary entry point is src/main.py. It must be executed from a directory where the src/ package imports resolve (i.e. with src/ on PYTHONPATH, or from within src/ directly):

export PYTHONPATH=src
python src/main.py --help

Mandatory argument

Argument Type Description
--workspace DIR str Repository root directory to operate on

Engine selection

Argument Values Default Description
--engine mac, soe soe mac = MasterAutomationController (single-agent global graph); soe = ScalableOrchestrationEngine (out-of-process sharded)
--agent-id str agent-cli Correlation tag for structured log lines

Hyperparameters

Argument Type Default Description
--n-partitions N int 4 Number of topological shard partitions (SOE only)
--diversity-scale F float 0.5 Search diversity scale for GlobalGraphEnsembleSandbox
--coupling-strength F float 0.1 Pairwise coupling penalty forwarded to BP and shard construction
--flow-steps N int 10 Global Ricci flow iteration count
--flow-step-size F float 0.05 Forward-Euler step size $h$
--worker-timeout SECS float 60.0 Per-worker join timeout before SIGTERM escalation (SOE only)

ActionFunctional weight overrides

Argument Type Default Description
--alpha F float 1.0 Cyclomatic complexity delta weight $\alpha$
--beta F float 10.0 Type-coverage gap weight $\beta$
--gamma F float 5.0 Linter density weight $\gamma$

Variant matrix input

Argument Description
--variant-file TARGET:VARIANT Colon-separated pair; repeatable. Registers VARIANT file content as one candidate rewrite of TARGET.
--variant-matrix JSON Path to a JSON file encoding {"target": {"variant_id": "content"}}. Merged with --variant-file entries.
--read-files PATH … Additional paths to acquire shared read locks on.

Usage examples

# Run the sharded engine on the workspace with default hyperparameters:
python src/main.py --workspace /repo --engine soe

# Evaluate two explicit candidate rewrites of a single file:
python src/main.py --workspace /repo --engine soe \
    --variant-file src/foo.py:variants/foo_v1.py \
    --variant-file src/foo.py:variants/foo_v2.py

# Load a full variant matrix from JSON with custom friction weights:
python src/main.py --workspace /repo --engine soe \
    --variant-matrix matrix.json \
    --alpha 2.0 --beta 8.0 --gamma 3.0

# Run the macro-state controller with aggressive partitioning:
python src/main.py --workspace /repo --engine mac \
    --n-partitions 8 --coupling-strength 0.25 --flow-steps 20

# Serialize the full codebase for transfer to a reasoning model:
python bundle_transfer_context.py --workspace /repo \
    --output context/transfer_payload.txt

5. Repository Architecture

automator/
├── src/
│   ├── main.py                    # CLI entry point; MAC + SOE engines; FFI bridge
│   ├── analysis/
│   │   └── workspace_analyzer.py  # Annotation scanner; dead-variable detector; cycle checker
│   ├── concurrency/
│   │   ├── lock_manager.py        # IS/IX/SIX/S/X intent lock tree
│   │   └── orchestrator.py        # Transaction coordinator wrapping LockTreeManager
│   ├── predictive/
│   │   ├── cache.py               # Token-bounded in-process stub cache
│   │   └── daemon.py              # Markovian prefetch daemon; Git history ingestion
│   ├── semantic/
│   │   └── analyzer.py            # Symbol table; ripple-effect propagation
│   ├── stubber/
│   │   └── universal.py           # Tree-sitter + regex fallback stub extractor
│   ├── utils/
│   │   └── setup_check.py         # Environment validation; native library probe
│   └── vfs/
│       ├── __init__.py
│       └── cow_overlay.py         # All geometric, BP, and VFS machinery (primary module)
├── parsers/                       # Compiled Tree-sitter .so files (user-supplied)
├── lib/                           # Native shared libraries, e.g. libmtkahypar.so
├── context/                       # Generated context serialization outputs
├── .repo_cache/                   # Predictive cache persistence (auto-created)
├── bundle_repo.py                 # Token-aware context bundler
├── bundle_transfer_context.py     # Full-fidelity XML transfer payload generator
└── README.md                      # This document

6. Internal Code Architecture Rules

These invariants are enforced across the entire codebase and must be preserved in all future modifications:

  1. No physics-themed terminology. All variable names, comments, and documentation must use graph-theoretic, distributed-systems, and linear-algebra vocabulary exclusively. Terms such as "quantum", "wavefunction", "entropy", "temperature", "force field", or "thermodynamic" are strictly prohibited.

  2. Float64 precision throughout. All matrix operations in cow_overlay.py — Laplacian construction, Gaussian elimination, power iteration — must preserve float64 precision. No implicit downcast to float32.

  3. Module-level picklability for IPC. Any function or class that may be transmitted to a child process via multiprocessing under the spawn start method must be defined at module level (not as a closure or lambda). The _run_shard_worker function exemplifies this requirement.

  4. Zero cross-layer globals. Layers communicate only through typed return values and constructor injection. No implicit shared state between AutomationCoordinator, the geometric layer, and the VFS layer.

  5. Strict if __name__ == "__main__": guard. Every executable script must wrap its top-level runner behind this guard to prevent recursive re-entry under the spawn multiprocessing start method on all platforms.

  6. Monotone friction gate. The CoW overlay may only commit a variant to disk if its friction score $S$ is strictly less than the previously committed score $S_{\text{prev}}$. This invariant must never be bypassed even in testing paths.

  7. Dobrushin pre-launch gate. AsynchronousBeliefPropagation.run() must always compute and certify $K &lt; 1.0$ before spawning any async relaxation iteration. Callers that catch ValueError from this gate must fall back to raw shard beliefs and must not retry with a modified tolerance.

  8. UTF-8 with error handling on all file I/O. All file reads must specify encoding="utf-8", errors="replace" or errors="ignore" to prevent binary files from aborting execution loops.


7. License

This project is proprietary. All rights reserved.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages