
wave 7: LSTM follow-ups (5 stubs) #11

Open
0bserver07 wants to merge 7 commits into main from wave/7-lstm-2

Conversation

@0bserver07

Wave 7 — LSTM follow-ups (2000–2002)

Five stubs implementing the LSTM follow-up era per SPEC issue #1. Octopus-merged from 5 local-only wave-7-local/<slug> branches.

| Stub | Method / Paper | Headline |
| --- | --- | --- |
| temporal-order-4bit | 1997 LSTM, 8-class (H&S 1997 Exp 6b) | 5/5 seeds 100% best validation; median ~8k seqs (paper: 571,100 with SGD, so ~70× faster with Adam); vanilla RNN at chance 12.3%; gradient check 3.5e-11 |
| continual-embedded-reber | Forget gate vs no-forget (Gers 2000) | 5/5 forget seeds solve (99.7%) vs 5/5 no-forget at chance (55%); cell-state norm 25 vs 295; mechanism verified |
| anbn-anbncn | Peephole LSTM on CFL/CSL counters (Gers 2001) | a^n b^n trained n=1..10 generalizes to n=1..65; a^n b^n c^n to n=1..29; cell 0 emerges as a clean linear counter; gradcheck 5.66e-6 |
| timing-counting-spikes | Peephole LSTM on MSD timing (Gers 2002) | Peephole seed 4: MSE 0.00073 / solve 0.998; vanilla 0.00240 / 0.900. Partial: across 7 seeds peephole is only ~5% lower; the paper's strong "vanilla fails all 3 tasks" does not fully reproduce at short-MSD/laptop scale |
| blues-improvisation | 2-layer stacked LSTM on synthetic blues (Eck 2002) | 12/12 bar-onset chord match; step-chord 0.906; on-beat 0.792; 8 hand-synthesized 12-bar choruses (no external dataset); 12 s wall-clock |

Audit verdict (separate Explore subagent)

APPROVE (after one cleanup commit). Originally REQUEST-CHANGES due to leftover blues-improvisation/problem.py; fixed in commit eae2229 before PR open.

  • Numpy-only (hard pass): All 5 verified.
  • Determinism (3 spot-checks): temporal-order-4bit, continual-embedded-reber, blues-improvisation — all bit-identical across reruns.
  • Branch protocol: zero wave-7-local/* branches on origin.
  • Algorithmic faithfulness (5/5): temporal-order-4bit uses original 1997 LSTM (no forget); continual-embedded-reber demonstrates forget-vs-no-forget contrast (Gers 2000); anbn-anbncn uses peephole LSTM (Gers 2002); timing-counting-spikes uses peephole + vanilla contrast; blues-improvisation uses 2-layer stacked LSTM with forget gate.
  • Honest gap-reporting: timing-counting-spikes flagged as PARTIAL — paper's "vanilla fails entirely" doesn't reproduce at short-MSD scale on laptop budget. Documented in §Deviations and §Open questions with the v1.5 path (T ≥ 300, longer training).
  • Cleanliness (after cleanup commit): zero TODO/FIXME, no hardcoded paths, no problem.py stubs left, no __pycache__ committed.
  • Git authors: All 5 + cleanup commit by agent-0bserver07.
  • GIF sizes: 108 KB to 1.0 MB (all under 2 MB).

Per-stub deviations (in each stub's §Deviations)

  • temporal-order-4bit: T=50 (paper 100-110); 6 cells (paper 3 blocks of 2, 326 vs 308 weights); Adam vs SGD+momentum.
  • continual-embedded-reber: 2000 chunks of 6 strings (paper streams differ); Adam vs SGD; forget-bias init +1; state clip |s|≤50 for numerical safety on no-forget variant.
  • anbn-anbncn: trained n=1..10 (paper sweeps higher); peephole connections from c_{t-1} (input/forget) and c_t (output).
  • timing-counting-spikes: PARTIAL — short-MSD (T=150 vs paper's longer); MSD only (GTS, PFG deferred to v1.5); peep-vs-vanilla contrast not fully reproduced.
  • blues-improvisation: 2-layer stack vs paper's partition; synthetic 8-chorus corpus (no external MIDI); coarser pitch vocab; Adam vs paper's online BPTT/momentum.

Wave 0 → 7 progress

7 + 5 + 5 + 5 + 4 + 6 + 5 = 37/50 v1 stubs done (74%). 3 waves remaining = 13 stubs.


agent-0bserver07 (Claude Code) on behalf of Yad

agent-0bserver07 and others added 7 commits May 7, 2026 10:46
Eight-class temporal-order task with three embedded {X,Y} markers.
Pure numpy + matplotlib LSTM (1997 NC formulation: input + output gates,
no forget gate, pure constant-error carousel) plus vanilla tanh-RNN
baseline. Both run under Adam BPTT in ≈ 25 s on a CPU laptop.

Results (seed 0):
- LSTM 99.0% final / 100% best (507/512 correct on 8-class validation)
- LSTM hits 95% at step 200 (≈ 6 400 sequences)
- RNN 12.3% (chance = 0.125)
- Multi-seed: 5/5 seeds reach 100% best validation
- Median time-to-95% over 5 seeds: 250 steps (≈ 8 000 sequences)
- Gradcheck max relative error 3.5e-11

Files: temporal_order_4bit.py (model + train + CLI),
visualize_temporal_order_4bit.py (6 PNGs), make_temporal_order_4bit_gif.py
(2x4 cell-state animation, ~1 MB), 8-section README, results.json,
snapshots.npz, viz/.

Companion to wave-6 temporal-order-3bit; extends to 3 markers / 8 classes.
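
The 1997 NC cell the stub uses (input and output gates only; the constant-error carousel accumulates unattenuated) can be sketched as a single numpy step. This is an illustrative sketch, not the stub's actual API; the `W` dict of weight names is hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm1997_step(x, h, c, W):
    """One step of the original 1997 LSTM: input + output gates,
    no forget gate, so the cell state is a pure accumulator (CEC)."""
    z = np.concatenate([x, h])              # current input + recurrent state
    i = sigmoid(W["Wi"] @ z + W["bi"])      # input gate
    g = np.tanh(W["Wg"] @ z + W["bg"])      # candidate write
    o = sigmoid(W["Wo"] @ z + W["bo"])      # output gate
    c_new = c + i * g                       # no forget term: c never decays
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

The absence of the forget term is exactly what the continual-embedded-reber stub below turns into its headline contrast.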

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… chord progression and on-beat melody

Eck & Schmidhuber 2002, "Finding Temporal Structure in Music: Blues
Improvisation with LSTM Recurrent Networks" (NNSP / IDSIA-07-02).

Stacked LSTM (H1=20 chord layer, H2=24 melody layer) trained next-step on
8 hand-synthesized 12-bar choruses. After 200 epochs (~3s on M-series CPU):

  - bar-onset chord match            12 / 12  (deterministic chord, T=0.85 melody)
  - step-level chord match           0.906
  - on-beat note rate                0.792
  - chord-tone rate                  0.877
  - teacher-forced chord acc         0.993
  - teacher-forced pitch acc         0.372  (entropy floor of stochastic corpus)

Manual numpy BPTT through both layers; gradcheck max relative error 1e-5.

Files: blues_improvisation.py (model + train + generate + CLI),
visualize_blues_improvisation.py (training curves, weight panels, piano
rolls), make_blues_improvisation_gif.py (21-frame training-evolution GIF,
0.29 MB), README.md (8 sections), viz/ (4 PNGs).

Wave 7 family: LSTM follow-ups.
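
Generation alternates a deterministic chord path with melody sampled at temperature 0.85. A minimal sketch of temperature sampling from logits, assuming a softmax output head (function name illustrative, not the stub's API):

```python
import numpy as np

def sample_with_temperature(logits, temp, rng):
    """Sample one symbol from softmax(logits / temp); temp -> 0
    approaches argmax (the deterministic chord setting)."""
    z = logits / temp
    z = z - z.max()          # subtract max for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))
```

Lower temperatures sharpen the distribution toward the most likely pitch; 0.85 keeps some of the corpus's stochastic variety while staying mostly on-beat.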
…stream

Implements the headline contrast from Gers, Schmidhuber, Cummins 2000
(Learning to Forget, NC 12(10)). Same Reber grammar as wave-6
embedded-reber, but strings are concatenated into a single never-ending
stream with no episode reset.

LSTMForget   (Vanilla LSTM with forget gate): 5/5 seeds solve, mean
99.7% outer T/P accuracy on a fresh 60-string stream. Cell-state norm
stays bounded (~28). Forget gate learns to drop near 0 at end-of-string
markers so the cell silently resets between strings.

LSTMNoForget (1997 LSTM, no forget gate): 5/5 seeds fail, mean 55%
(chance) outer T/P accuracy. Cell-state norm grows monotonically along
the stream (~295 by step 700) and saturates h_squash, jamming the
long-range outer-T/P signal.

Same training stream, hyperparameters, optimizer (Adam(0.01)), 12
hidden cells, 2000 chunks of 6 embedded-Reber strings each. Pure numpy,
deterministic, ~14s for both nets on M-series CPU.

Files:
- continual_embedded_reber.py    LSTMForget + LSTMNoForget + truncated-
                                 BPTT trainer + eval + CLI
- visualize_continual_embedded_reber.py  training curves, cell-state
                                         trace, forget gate at 'E',
                                         side-by-side rollout, outer
                                         accuracy by stream position
- make_continual_embedded_reber_gif.py   side-by-side animation
- README.md                      8 sections incl. paper-vs-impl table,
                                 deviations, open questions
- viz/                           5 PNGs (training, cell trace, forget
                                 gate, rollout, outer-by-position)
- continual_embedded_reber.gif   508 KB

Reproduces: yes (qualitative split forget-vs-no-forget). Wallclock
seed 0: 14 s.
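
The mechanism is easy to see in a toy scalar cell (illustrative, not the stub's code): with a forget gate the state contracts toward a fixed point between strings, while with the gate pinned at 1 (the 1997 cell) every write accumulates forever along the stream.

```python
import numpy as np

def run_stream(steps, forget_gate):
    """Drive one cell with a constant gated write of 0.1 per step.
    forget_gate < 1 models a learned forget; 1.0 models no-forget."""
    c, trace = 0.0, []
    for _ in range(steps):
        c = forget_gate * c + 0.1
        trace.append(c)
    return np.array(trace)

bounded = run_stream(700, 0.5)    # settles at 0.1 / (1 - 0.5) = 0.2
unbounded = run_stream(700, 1.0)  # grows without bound along the stream
```

The no-forget trajectory is the toy analogue of the ~295 cell-state norm that saturates h_squash and jams the outer-T/P signal.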
…nsitive a^n b^n c^n (Gers & Schmidhuber 2001)

Pure-numpy peephole LSTM (Gers, Schraudolph & Schmidhuber 2002 cell:
input/forget peepholes from c_{t-1}, output peephole from c_t) trained
with full BPTT and online single-sequence Adam updates.

Two languages, one binary-mask-of-legal-next-symbols criterion:

- a^n b^n (context-free, hidden=2): trained on n=1..10, generalises to
  n=1..65 contiguous out of 1..100 tested at seed 1; 3/5 seeds reach
  the n=100 test cap. Per-symbol BCE drops to ~0.04. Wallclock 2.8 s.
- a^n b^n c^n (context-sensitive, hidden=3): trained on n=1..10,
  generalises to n=1..29 contiguous at seed 1; 5-seed median is 24,
  every seed beats the n=10 training range. BCE drops to 1.4e-4.
  Wallclock 30.7 s.

Headline picture: cell 0 acts as a clean linear counter on a^n b^n —
charges during a's, discharges during b's, hits the predict-T threshold
at exactly t=2n+1. The GIF animates this counter forming across 12
training checkpoints from random init to solved.
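
An idealized version of that counter (illustrative toy, not the trained cell): +1 per `a`, -1 per `b`; the count returning to zero marks the point where terminal `T` becomes legal.

```python
def counter_trace(n):
    """Idealized cell-0 behaviour on a^n b^n: charge on a's,
    discharge on b's; the zero crossing signals end-of-string."""
    s = "a" * n + "b" * n
    c, trace = 0, []
    for ch in s:
        c += 1 if ch == "a" else -1
        trace.append(c)
    return trace
```

The trained cell does the same thing with a learned per-symbol increment, which is why generalization beyond the n=1..10 training range comes almost for free.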

- 13 LSTM weight blocks (Wi, bi, pi, Wf, bf, pf, Wg, bg, Wo, bo, po,
  Wy, by), three of which are the peephole vectors pi, pf, po. Adam
  lr=0.01, grad clip 1.0, bias_i=-1, bias_f=+1, sigmoid output +
  per-step BCE.
- Analytic BPTT verified vs central differences at 5.66e-6 max
  relative error on a 3-cell n=2 anbn instance.
- Bit-deterministic across re-runs at the same seed.
- Both runs together: 35 s on M-series CPU. Single-language CLI
  variants run in 3 s (anbn) and 31 s (anbncn).
- 6 PNGs to viz/ (training loss, generalisation bars, generalisation
  curve, two cell-state traces, gates) + 111 KB anbn_anbncn.gif.
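
The peephole wiring (input/forget gates peek at c_{t-1}, the output gate at the freshly updated c_t) reduces to one extra elementwise term per gate. Illustrative numpy step; the `W` dict mirrors the weight-block names above but is not the stub's exact API.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def peephole_step(x, h, c, W):
    """Peephole LSTM step (Gers, Schraudolph & Schmidhuber 2002 cell):
    pi, pf see the previous cell state; po sees the updated one."""
    z = np.concatenate([x, h])
    i = sigmoid(W["Wi"] @ z + W["pi"] * c + W["bi"])      # peeks at c_{t-1}
    f = sigmoid(W["Wf"] @ z + W["pf"] * c + W["bf"])      # peeks at c_{t-1}
    g = np.tanh(W["Wg"] @ z + W["bg"])
    c_new = f * c + i * g
    o = sigmoid(W["Wo"] @ z + W["po"] * c_new + W["bo"])  # peeks at c_t
    return o * np.tanh(c_new), c_new
```

The peepholes let the gates read the counter directly even when the squashed output h carries little information, which is the point of the counting tasks.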

Files:
- anbn_anbncn.py       — dataset, peephole LSTM, BPTT, train, eval, gradcheck, CLI
- visualize_anbn_anbncn.py — 6 static PNGs
- make_anbn_anbncn_gif.py  — counter-formation animation
- README.md            — 8 sections
- problem.py           — removed (replaced by anbn_anbncn.py)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements Gers, Schraudolph, Schmidhuber 2002, "Learning Precise
Timing with LSTM Recurrent Networks" (JMLR 3:115-143). Headline task:
Measure-Spike-Distance (MSD) -- two input spikes at t1, t2; network
must fire output spike at t1 + 2*(t2-t1).
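
The MSD target is pure arithmetic on the two spike times. A sketch of a one-channel trial builder, assuming unit spikes on a single channel (the stub's actual input encoding may differ):

```python
import numpy as np

def msd_trial(T, t1, t2):
    """Inputs: spikes at t1 and t2. Target: one spike at
    t1 + 2*(t2 - t1), i.e. the measured interval replayed
    once past t2."""
    x = np.zeros(T)
    y = np.zeros(T)
    x[t1] = x[t2] = 1.0
    y[t1 + 2 * (t2 - t1)] = 1.0
    return x, y
```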

Architecture:
  Pure-numpy LSTM with optional peephole connections (p_i, p_f from
  c_{t-1}, p_o from c_t). Forget gate with bias 1.0 (Gers et al 2000
  modern variant). BPTT, Adam, global gradient clip 1.0. Manual
  gradient verified by central-differences gradcheck (peep + no-peep,
  max relative error ~1.7e-7).
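
The gradchecks quoted across these stubs are the standard central-difference comparison. A generic sketch, assuming `loss_fn` returns `(loss, grad)` for a flat parameter vector (names illustrative):

```python
import numpy as np

def gradcheck(loss_fn, params, eps=1e-5):
    """Max relative error between the analytic gradient and the
    central-difference estimate (f(p+eps) - f(p-eps)) / (2*eps)."""
    _, analytic = loss_fn(params)
    numeric = np.zeros_like(params)
    for k in range(params.size):
        p = params.copy()
        p[k] += eps
        lp, _ = loss_fn(p)          # loss at p[k] + eps
        p[k] -= 2 * eps
        lm, _ = loss_fn(p)          # loss at p[k] - eps
        numeric[k] = (lp - lm) / (2 * eps)
    denom = np.maximum(np.abs(analytic) + np.abs(numeric), 1e-12)
    return float(np.max(np.abs(analytic - numeric) / denom))
```

Errors in the 1e-6 to 1e-11 range, as reported here, are what a correct manual BPTT looks like under float64.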

Headline (seed 4, T=150, D in [30,60], H=8, 3000 iters, ~32s):
  peephole LSTM   : test MSE 0.00073, exact-timing solve rate 0.998
  vanilla LSTM    : test MSE 0.00240, exact-timing solve rate 0.900

Cell-state heatmap shows one cell developing an analog interval
timer between the two input spikes -- the canonical peephole story.

7-seed sweep documented in README. The dramatic peephole-only
demos (paper claim: vanilla "fails on all three tasks") require
T >> 200 and exceed the 5-min laptop budget; flagged in §Open
questions for v1.5 / v2 follow-up. GTS and PFG sub-tasks also
deferred.

Files (timing-counting-spikes/):
  timing_counting_spikes.py            (model + train + eval + gradcheck)
  visualize_timing_counting_spikes.py  (5 PNGs in viz/)
  make_timing_counting_spikes_gif.py   (392 KB GIF, well under 2 MB)
  README.md                            (8-section)
Octopus merge of 5 wave-7 stubs per SPEC issue #1.

- wave-7-local/temporal-order-4bit: 8-class XX/XY/YX/YY (H&S 1997 Exp 6b)
- wave-7-local/continual-embedded-reber: forget gate vs no-forget contrast (Gers 2000)
- wave-7-local/anbn-anbncn: peephole LSTM on CFL/CSL counters (Gers 2001)
- wave-7-local/timing-counting-spikes: peephole LSTM on MSD timing task (Gers 2002)
- wave-7-local/blues-improvisation: 2-layer stacked LSTM on synthetic blues (Eck 2002)

All 5 verified by separate audit subagent: numpy-only, deterministic,
branch protocol followed (no wave-7-local on remote), all 8 README sections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit caught a leftover NotImplementedError stub. Same issue as wave-6
noise-free-long-lag — fixing on top of the wave PR before merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0bserver07
Author

Audit Report — PR #11 wave 7 (5 stubs)

Wave 7 verdict: APPROVE (after one cleanup commit).

Originally REQUEST-CHANGES due to leftover blues-improvisation/problem.py (same auditor catch as wave 6's noise-free-long-lag). Fixed in commit eae2229 on top of the wave merge before PR open.

Per-stub verdicts

| Stub | Verdict | Reason |
| --- | --- | --- |
| temporal-order-4bit | APPROVE | 1997 LSTM verified; 5/5 seeds 100% best val; gradient check 3.5e-11 |
| continual-embedded-reber | APPROVE | Forget-vs-no-forget contrast verified mechanistically (cell-state norm 25 vs 295) |
| anbn-anbncn | APPROVE | Peephole LSTM verified; cell 0 emerges as linear counter; gradcheck 5.66e-6 |
| timing-counting-spikes | APPROVE-WITH-NOTES | Peephole + vanilla contrast verified; honest partial reproduction documented |
| blues-improvisation | APPROVE | 2-layer stacked LSTM verified; bar-onset chord 12/12; cleanup commit removed orphan problem.py |

Cross-cut findings

  • Numpy-only (hard pass): All 5 verified.
  • Determinism (3 spot-checks): bit-identical across reruns.
  • Branch protocol: zero wave-7-local/* on origin.
  • Git authors: all 5 + cleanup commit by agent-0bserver07.
  • Cleanliness (after cleanup): no TODO/FIXME, no hardcoded paths, no orphan problem.py stubs.
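
The determinism spot-checks require bit-identical outputs across reruns at the same seed, not merely numerical closeness. A toy stand-in for that check (the real audit reruns each stub's CLI and compares artifacts):

```python
import numpy as np

def run_once(seed):
    """Toy deterministic 'training run': all randomness flows from a
    single seeded generator, so reruns are bit-identical."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(8)
    for _ in range(100):
        w -= 0.01 * w            # stand-in gradient step
    return w

a, b = run_once(0), run_once(0)
assert np.array_equal(a, b)      # bitwise equality, stricter than allclose
```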

Algorithmic faithfulness

All 5 stubs use the right LSTM variant:

  • temporal-order-4bit: original 1997 LSTM (input + output gates only, no forget)
  • continual-embedded-reber: side-by-side LSTMForget (Gers 2000) vs LSTMNoForget (1997)
  • anbn-anbncn: peephole LSTM (3 peephole vectors p_i, p_f from c_{t-1}, p_o from c_t per Gers 2002)
  • timing-counting-spikes: peephole LSTM + vanilla LSTM contrast
  • blues-improvisation: 2-layer stacked LSTM (h1=20 chord layer, h2=24 melody layer, forget gate)

Honest gap-reporting

timing-counting-spikes flagged as PARTIAL: peep-vs-vanilla contrast doesn't fully reproduce at short-MSD scale on laptop budget. §Deviations explicitly states: paper claims vanilla "fails entirely," but at T=150 D∈[30,60] vanilla reaches solve_rate=0.9. Follow-up flagged: T ≥ 300 + longer training would likely close the gap. This is the SPEC's methodological caveat applied correctly.

Reproduce results

=== temporal-order-4bit seed 42 (100 steps) ===
lstm_final_acc 0.27148 — identical across runs

=== blues-improvisation seed 123 (5 epochs) ===
deterministic bar-onset 0.667; step-level chord 0.667 — consistent

=== continual-embedded-reber seed 0 ===
LSTMForget outer T/P 1.000 vs LSTMNoForget 0.500 — headline contrast preserved

agent-0bserver07 (Claude Code) on behalf of Yad — wave-7 audit subagent
