wave 1: random search + universal program search (6 stubs) #4
Open
0bserver07 wants to merge 7 commits into main from
Conversation
…input (Schmidhuber 1995/1997)
LSEARCH over a 6-op register-machine DSL, with the program body executed
once per (B = bit, I = index) pair. Programs are ordered by
Kt(p) = len(p) + log2(time(p)). The search finds the length-3 program
'im+' (T:=I; T:=T*B; A:=A+T) in 58 evaluations on the very first run --
the lexicographically first length-3 program in the DSL that matches all
3 training examples. The induced weight vector matches the ground-truth
ramp w_i = i exactly and generalizes to 200/200 held-out random 100-bit
inputs.
Wallclock: ~0.001 s on an M-series laptop; deterministic across seeds
0-7, 42, 99.
DSL: + (A+=T), * (A*=T), m (T*=B), i (T=I), b (T=B), 1 (T=1). This is a
documented choice in §Deviations -- the original FORTH-like DSL is not
retrievable; we reconstructed it from the OOPS 2003 paper and the 2015
Deep Learning survey §6.6.
Files: levin_add_positions.py, README.md (8 sections), visualize +
make_gif scripts, viz/{dsl,search_progress,program_trace,generalization}.png,
levin_add_positions.gif (239 KB, 27 frames). problem.py removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
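The Kt ordering above can be sketched in a few lines. This is a hedged illustration, not the stub's code: the op set matches the DSL listed in the commit, but the enumeration order here (length-first, lexicographic in an arbitrary op ordering) is an assumption, so the first solver found and the evaluation count may differ from the reported 'im+' at 58 evaluations.

```python
from itertools import product

# Illustrative sketch of Kt-ordered search: since time(p) is uniform
# within a length class here, length-first lexicographic enumeration is
# consistent with Kt(p) = len(p) + log2(time(p)).
OPS = "+*mib1"  # +: A+=T, *: A*=T, m: T*=B, i: T=I, b: T=B, 1: T=1

def run(prog, bits):
    """Execute the program body once per (index I, bit B) pair."""
    A, T = 0, 0
    for I, B in enumerate(bits):
        for op in prog:
            if   op == "+": A += T
            elif op == "*": A *= T
            elif op == "m": T *= B
            elif op == "i": T  = I
            elif op == "b": T  = B
            elif op == "1": T  = 1
    return A, len(prog) * len(bits)   # (output, step count)

def kt_search(train, max_len=4):
    """Return the first program matching all training examples."""
    evals = 0
    for length in range(1, max_len + 1):
        for prog in product(OPS, repeat=length):
            evals += 1
            if all(run(prog, bits)[0] == target for bits, target in train):
                return "".join(prog), evals
    return None, evals

# index-sum targets: sum of positions holding a 1
train = [([1, 0, 1], 2), ([0, 1, 1], 3), ([1, 1, 0], 1)]
prog, evals = kt_search(train)
```

With the stub's own op ordering the first match is 'im+'; under the arbitrary ordering chosen here the lexicographically first length-3 solver and the evaluation count may differ, but no shorter program can solve the task in this DSL.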
…able subroutines (Schmidhuber 2002/2004)
Pure-stdlib OOPS implementation that solves Hanoi(n) for n=1..15+ by
length-first Levin enumeration over a 4-token DSL (M, SD, SA, C),
augmented with a frozen subroutine library where each task's discovered
solver becomes the call target of the next task's program.
Headline: at n=3, OOPS discovers the 6-token recursive program
`SD C SD M SA C` (12 bits). The same program then solves Hanoi(n) for
every n>=4 with zero re-search, because `C` automatically rebinds to
whichever subroutine is currently the most recently frozen. The program's
bit-length stays constant while the optimal move count grows as 2**n - 1.
Total wallclock: ~21 ms through n=10, ~300 ms through n=15. Every program
produces an optimal 2**n - 1 move sequence, verified independently by
re-execution with the prefix of frozen subroutines that existed at freeze
time.
DSL: 4 tokens (M = move src->dst, SD = swap dst<->aux, SA = swap
src<->aux, C = call last frozen subroutine with frame save/restore).
Subroutine reuse mechanism: each frozen sub stores a `call_target` index
captured at freeze time, so s_k's `C` token resolves to s_{k-1}, enabling
the recursion. Frame save/restore on `C` is the one piece of interpreter
sugar that lets a single recursive program generalize across all n.
Search is deterministic regardless of seed (Levin enumeration is
deterministic by construction); --seed is wired through and recorded.
Files:
- oops_towers_of_hanoi.py - DSL, interpreter, OOPS loop, verification
- visualize_oops_towers_of_hanoi.py - 3 PNGs (search-cost-vs-n,
disassembled subroutine library, reuse chain graph)
- make_oops_towers_of_hanoi_gif.py - animated GIF showing the recursive
program executing on Hanoi(n=5) with call-stack indicator (824 KB)
- README.md - 8-section spec including DSL definition and reuse mechanism
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
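The `C` rebinding mechanism described above can be sketched as a minimal interpreter. This is a hedged reconstruction from the commit message, not the stub's code; the function names, the (src, aux, dst) frame representation, and the library layout are assumptions.

```python
# Minimal sketch of the 4-token DSL: M = move src->dst, SD = swap
# dst<->aux, SA = swap src<->aux, C = call the subroutine recorded as
# call_target at freeze time, with frame save/restore.

def execute(tokens, frame, library, call_target, moves, pegs):
    src, aux, dst = frame          # local frame; callee cannot mutate it
    for tok in tokens:
        if tok == "M":
            pegs[dst].append(pegs[src].pop())
            moves.append((src, dst))
        elif tok == "SD":
            aux, dst = dst, aux
        elif tok == "SA":
            src, aux = aux, src
        elif tok == "C":
            sub_tokens, sub_target = library[call_target]
            # frame save/restore: callee sees the current frame, the
            # caller's locals are untouched afterwards
            execute(sub_tokens, (src, aux, dst), library,
                    sub_target, moves, pegs)
    return moves

def solve_hanoi(n):
    # library[k] = (tokens, call_target captured at freeze time), so
    # s_k's C resolves to s_{k-1} -- the recursion described above.
    library = {1: (("M",), None)}
    recursive = ("SD", "C", "SD", "M", "SA", "C")
    for k in range(2, n + 1):
        library[k] = (recursive, k - 1)
    pegs = {0: list(range(n, 0, -1)), 1: [], 2: []}
    tokens, target = library[n]
    moves = execute(tokens, (0, 1, 2), library, target, [], pegs)
    return moves, pegs

moves, pegs = solve_hanoi(5)
print(len(moves))  # 2**5 - 1 = 31 optimal moves
```

The six tokens map directly onto the classic recursion: SD/C solves n-1 onto aux, SD restores the frame, M moves the largest disk, SA/C solves n-1 from aux onto dst.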
…(Hochreiter & Schmidhuber 1996)
Pure-numpy reproduction of the random-search (RS) result from the H&S
1996 NIPS paper "LSTM can solve hard long time lag problems": a
fully-recurrent net with 5 tanh hidden units (42 scalar parameters),
sampled iid from U[-1, 1], solves the Bengio-94 two-sequence latch task
(T=100 timesteps; the first symbol carries the class, followed by 99
distractor noise steps) in 905 trials on seed 0 (0.82 s wallclock).
30/30 seeds solve to 100% test accuracy; median 144 trials, p90 580.
No gradient computation -- just iid weight sampling and forward-pass
scoring.
Deviations from paper: weight prior U[-1, 1] instead of U[-100, 100]
(a sub-saturation regime where the solution weights are interpretable);
T=100 instead of T=500 (keeps wallclock under 1 s); accuracy threshold
instead of MSE threshold. v1 trial counts are smaller than the paper's
reported ~718, flagged in §Open questions per the SPEC's methodological
caveat on hard-to-retrieve sources.
Files: rs_two_sequence.py (CLI runner), visualize_rs_two_sequence.py
(static PNGs: search_curve, weight_dist, rollout),
make_rs_two_sequence_gif.py (1.2 MB animation), full 8-section README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
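The RS loop above is simple enough to sketch end-to-end. This is a hedged illustration, not the stub's code: the hidden size and U[-1, 1] prior follow the text, but the batch construction, noise scale, and threshold readout are assumptions of this sketch.

```python
import numpy as np

# Sketch of random-weight-guessing on a latch task: sample all weights
# iid from U[-1, 1], score by a forward pass, repeat. No gradients.

def make_batch(rng, n=32, T=100):
    x = rng.uniform(-0.1, 0.1, size=(n, T, 1))  # distractor noise steps
    y = rng.integers(0, 2, size=n)
    x[:, 0, 0] = np.where(y == 1, 1.0, -1.0)    # class in the first symbol
    return x, y

def forward(W_in, W_hh, w_out, x):
    h = np.zeros((x.shape[0], W_hh.shape[0]))
    for t in range(x.shape[1]):                 # fully-recurrent tanh net
        h = np.tanh(x[:, t] @ W_in + h @ W_hh)
    return (h @ w_out > 0).astype(int)          # sign readout at final step

def random_search(x, y, seed=0, H=5, max_trials=50_000):
    rng = np.random.default_rng(seed)
    for trial in range(1, max_trials + 1):      # iid sampling, no mutation
        net = (rng.uniform(-1, 1, (1, H)),
               rng.uniform(-1, 1, (H, H)),
               rng.uniform(-1, 1, H))
        if (forward(*net, x) == y).all():
            return trial, net
    return None, None
```

A winning sample is effectively a latch: the first symbol's sign kicks the hidden state into one of two attractors that the small noise inputs cannot dislodge, which is why a solution to the train batch also generalizes.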
…put (Schmidhuber 1995/1997)
Universal-search ordering by |p| + log_2(t) over an 8-instruction stack
DSL (3 bits/op): PUSH0, PUSH1, ADD, BIT, DUP, SWAP, HERE, LOOP. The
search finds the 5-instruction (15-bit) popcount routine
`PUSH0 HERE BIT ADD LOOP` at Levin round k=24 (runtime budget 512 ops,
popcount needs 402) after enumerating ~770k programs in ~1.0 s on an
M-series laptop CPU. Generalises perfectly: 200/200 on the held-out
test set with random 100-bit strings, from only 3 training examples
(popcounts 25, 50, 75). Same program is found across seeds 0-4 because
Levin enumeration is deterministic in instruction-lex order.
Files:
levin_count_inputs.py - DSL VM + Levin search + train/test eval
visualize_levin_count_inputs.py - 5 static PNGs (DSL table, search
progression, found-program disassembly,
VM trace, generalisation)
make_levin_count_inputs_gif.py - 0.22 MB GIF: search counter -> found
banner -> VM trace on an 8-bit input
README.md - 8-section spec, DSL table, multi-seed
verification, deviations from paper
levin_count_inputs.gif, viz/*.png
Deviations: search target is a popcount program directly (not the
all-ones weight vector for a downstream linear unit as in the paper);
DSL is 8 ops not 13; LSEARCH not Probabilistic Levin Search; max
program length capped at 18 bits for laptop runtime. Algorithmic
content (universal-search ordering) is preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
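The found program can be traced under one plausible semantics for the 8-op stack DSL. The stub's VM may define BIT, HERE, and LOOP differently; under the conventions assumed here (BIT pushes the next unread input bit, LOOP jumps back to the last HERE while unread bits remain), the 5-instruction routine does compute popcount.

```python
# Illustrative VM for the 8-op stack DSL; op semantics are assumptions
# of this sketch, not a transcript of the stub's interpreter.
def run_vm(prog, bits, max_ops=512):
    stack, pc, pos, here, ops = [], 0, 0, None, 0
    while pc < len(prog) and ops < max_ops:     # runtime budget guard
        op = prog[pc]; ops += 1
        if   op == "PUSH0": stack.append(0)
        elif op == "PUSH1": stack.append(1)
        elif op == "ADD":   b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == "BIT":   stack.append(bits[pos]); pos += 1
        elif op == "DUP":   stack.append(stack[-1])
        elif op == "SWAP":  stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "HERE":  here = pc           # mark loop entry
        elif op == "LOOP":
            if pos < len(bits):                 # loop while input remains
                pc = here
                continue
        pc += 1
    return stack[-1] if stack else None

popcount = ["PUSH0", "HERE", "BIT", "ADD", "LOOP"]
print(run_vm(popcount, [1, 0, 1, 1, 0, 1]))  # -> 4
```

The trace is the one the GIF animates: initialize an accumulator with PUSH0, then per bit do BIT/ADD, with LOOP closing the cycle back to HERE until the input is exhausted.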
… (Hochreiter & Schmidhuber 1996)
Reproduces the random-search baseline from Hochreiter & Schmidhuber,
"LSTM can solve hard long time lag problems," NIPS 9 (1996). A
5-hidden-unit fully-recurrent net with iid uniform[-2, 2] weights is
sampled until it classifies a 16-string train set perfectly.
Per-seed (seed=0):
  #1 (a*):      1,343 trials  | train 100%, test 100%  | 0.16 s
  #2 ((ab)*):     152 trials  | train 100%, test 70.6% | 0.02 s
  #4 (no aaa): 147,399 trials | train 100%, test 53.1% | 17.0 s
Aggregated over 10 seeds: 10/10 solved on every grammar; medians 487 /
588 / 81,703 trials. Within ~3x of H&S 1996's reported 182 / 1,511 /
13,833 for #1 and #2; ~6x for #4 (training-set-composition gap, see
§Deviations).
Files:
  rs_tomita.py           -- dataset, RNN forward, RS loop. CLI runs all 3.
  visualize_rs_tomita.py -- search curves, hidden trajectories, weight
                            matrices, per-trial accuracy histograms.
  make_rs_tomita_gif.py  -- 25-frame animation across the 3 grammars.
  rs_tomita.gif          -- 150 KB animation.
  viz/*.png              -- 4 static panels.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
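For reference, the three grammars' membership tests are tiny; these are conventional definitions over the {a, b} alphabet (the stub's symbol encoding may differ), and they are exactly the labels a sampled RNN must reproduce on the train set.

```python
# Membership tests for Tomita grammars #1, #2, #4 -- a convention,
# not necessarily the stub's exact encoding.
def tomita1(s):   # grammar #1: a*
    return all(c == "a" for c in s)

def tomita2(s):   # grammar #2: (ab)*
    return len(s) % 2 == 0 and s == "ab" * (len(s) // 2)

def tomita4(s):   # grammar #4: no 'aaa' substring
    return "aaa" not in s
```

The difficulty ordering in the trial counts above tracks the automata: a* and (ab)* are 2-state-ish, while "no aaa" needs the net to count consecutive a's, which random weight guessing hits far more rarely.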
…(Hochreiter & Schmidhuber 1996)
A small fully-recurrent tanh net (1 input -> 2 hidden -> 1 readout, h_0=0)
is sampled by drawing every weight uniformly from [-30, 30] each trial and
scoring on parity-correct over 2048 random length-N sequences. No gradient
descent, no mutation, no crossover -- pure independent uniform sampling.
Headline (seed=0, N=50): solved in 10,253 trials / 15.3 s wallclock on an
M-series laptop, with 100% accuracy on 4,096 held-out unseen sequences.
Across 5 seeds at N=50 all solve within 40 s; across 10 seeds at N=20 all
solve within 41 s. Paper-scale N=500 also solves (median ~13k trials over
10 seeds, seed=0 in 412 trials / 3.2 s).
Architecture deviation: the seed scaffold specified 'A2 without
self-connections', but that constraint produced no parity solver under
random sampling at any sequence length N >= 6 or weight scale we tried.
A standard fully-recurrent net (diagonal of W_hh allowed nonzero) solves
robustly. Documented in the README's Deviations and Open questions
sections.
Files:
rs_parity.py - dataset + RNN forward + RS loop + CLI (numpy only)
visualize_rs_parity.py - search curve, trial-score histogram, winning
weight Hinton diagram, hidden-state trajectories
make_rs_parity_gif.py - log-spaced animation of the search progression
rs_parity.gif - 296 KB, well under 2 MB target
viz/*.png - the four static panels
README.md - full 8-section v1 spec
Removed: problem.py stub (NotImplementedError placeholders).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
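The architecture above can be sketched as follows. This is a hedged illustration, not the stub's code: the 1 -> 2 -> 1 shape, h_0 = 0, and the U[-30, 30] prior follow the text, while the bias terms and threshold readout are assumptions of this sketch.

```python
import numpy as np

# Sketch of random-weight-guessing on N-bit parity: a tiny
# fully-recurrent tanh net scored by a forward pass, no gradients.

def predict_parity(net, seqs):
    W_in, W_hh, b_h, w_out, b_out = net
    h = np.zeros((seqs.shape[0], W_hh.shape[0]))   # h_0 = 0
    for t in range(seqs.shape[1]):
        h = np.tanh(seqs[:, t:t+1] @ W_in + h @ W_hh + b_h)
    return (h @ w_out + b_out > 0).astype(int)

def random_search_parity(seqs, labels, seed=0, H=2, scale=30.0,
                         max_trials=200_000):
    rng = np.random.default_rng(seed)
    for trial in range(1, max_trials + 1):   # pure iid uniform sampling
        net = (rng.uniform(-scale, scale, (1, H)),
               rng.uniform(-scale, scale, (H, H)),
               rng.uniform(-scale, scale, H),
               rng.uniform(-scale, scale, H),
               rng.uniform(-scale, scale))
        if (predict_parity(net, seqs) == labels).all():
            return trial, net
    return None, None
```

With weights at scale ~30 the tanh units saturate to ±1, so a sampled net behaves like a small finite automaton. One hand-constructible solver in this parameterization makes h1 ≈ OR(x, parity-state) and h2 ≈ AND(x, parity-state) and reads parity off as h1 − h2 > 1; random search only has to stumble into that basin.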
Octopus merge of 6 wave-1 stubs per SPEC issue #1.
- impl/rs-two-sequence: random-weight-guessing on Bengio-94 latch
- impl/rs-parity: random-weight-guessing on N-bit sequence parity
- impl/rs-tomita: random-weight-guessing on Tomita grammars #1/#2/#4
- impl/levin-count-inputs: Levin search for popcount on 100 bits
- impl/levin-add-positions: Levin search for index-sum on 100 bits
- impl/oops-towers-of-hanoi: OOPS with subroutine reuse on Towers of Hanoi
All 6 verified by separate audit subagent: numpy-only, deterministic, no
hardcoded paths, all 8 README sections present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit Report — PR #4 wave 1 (6 stubs)
Wave 1 verdict: APPROVE across all 6 stubs. Independent technical review
by separate Explore subagent. Mirrors the wave-0 audit pattern: SPEC
compliance check, numpy-only constraint, determinism, algorithmic
faithfulness, gap-reporting honesty, cross-cut cleanliness.
Per-stub verdicts
Cross-cut findings
Levin-count-inputs framing recommendation
KEEP as-is. Teammate flagged a framing deviation (search for "program
emits popcount" rather than the paper's "program emits weight vector for
downstream linear unit"). Algorithmically identical, more direct
evaluation. Honestly documented in §Deviations and §Open questions.
Verdict: keep for v1, leave the paper-framing comparison as an §Open
questions item.
Reproduce results (3 spot-checks)
What I couldn't verify
agent-0bserver07 (Claude Code) on behalf of Yad — wave-1 audit subagent
Wave 1 — random search + universal program search
Six stubs implementing Schmidhuber-lineage search-based methods (no gradient descent in any of them) per SPEC issue #1. Octopus-merge of 6 per-stub branches; one PR per wave per the SPEC.
- rs-two-sequence
- rs-parity
- rs-tomita
- levin-count-inputs: 15-bit popcount program (`PUSH0 HERE BIT ADD LOOP`), 770k programs in 1.0 s, 200/200 generalize
- levin-add-positions: length-3 program (`im+`), 58 evaluations, 200/200 generalize, 0.34 s
- oops-towers-of-hanoi

Audit verdict (separate Explore subagent)
APPROVE across all 6 stubs.
- Dependencies: `numpy`, `matplotlib`, `PIL`/`imageio`, stdlib. Zero forbidden imports (no torch / scipy / gym / sklearn / pandas / jax / tensorflow).
- Per-stub files: `<slug>.py`, `README.md`, `make_<slug>_gif.py`, `visualize_<slug>.py`, `<slug>.gif` (largest 1.2 MB, all under 2 MB cap), `viz/` with 3-5 PNGs each. All `problem.py` stubs removed.
- Commits authored as agent-0bserver07 <agent-0bserver07@users.noreply.github.com>.

Per-stub deviations (all documented in each stub's §Deviations)
Citation gaps (tracked in each stub's §Open questions)
The wave reconstructs from secondary sources where original technical reports aren't publicly retrievable:
This matches SPEC's methodological caveat: where primary sources are unretrievable, reconstruct from corroborated secondary sources and flag in §Open questions.
Acceptance checklist (per SPEC, applied to each stub)
All 60 boxes (10 per stub × 6 stubs) pass. Verified by audit subagent.
What's deferred
Wave 0 → wave 1 → wave 2 readiness
Wave 0 (nbb-xor, PR #2) sanity-validated the pipeline. Wave 1 (this PR, 6 stubs) confirms the pattern scales: 6 teammates dispatched in parallel, all 6 reported back within ~90 min, audit clean. On merge, wave 2 (5 stubs: nbb-moving-light, flip-flop, pole-balance-non-markov, pole-balance-markov-vac, saccadic-target-detection) is ready to dispatch.

agent-0bserver07 (Claude Code) on behalf of Yad