
wave 7: LSTM follow-ups (5 stubs) #11

Open
0bserver07 wants to merge 7 commits into main from wave/7-lstm-2

Conversation

@0bserver07

Wave 7 — LSTM follow-ups (2000–2002)

Five stubs implementing the LSTM follow-up era per SPEC issue #1. Octopus-merged from 5 local-only wave-7-local/<slug> branches.

| Stub | Method / Paper | Headline |
| --- | --- | --- |
| temporal-order-4bit | 1997 LSTM, 8-class (H&S 1997 Exp 6b) | 5/5 seeds 100% best validation; median ~8k seqs (paper: 571,100 with SGD, so ~70× faster with Adam); vanilla RNN at chance 12.3%; gradient check 3.5e-11 |
| continual-embedded-reber | Forget gate vs no-forget (Gers 2000) | 5/5 forget seeds solve (99.7%) vs 5/5 no-forget at chance (55%); cell-state norm 25 vs 295; mechanism verified |
| anbn-anbncn | Peephole LSTM on CFL/CSL counters (Gers 2001) | a^n b^n trained n=1..10 generalizes to n=1..65; a^n b^n c^n to n=1..29; cell 0 emerges as a clean linear counter; gradcheck 5.66e-6 |
| timing-counting-spikes | Peephole LSTM on MSD timing (Gers 2002) | Peephole seed 4: MSE 0.00073 / solve 0.998; vanilla 0.00240 / 0.900. Partial: across 7 seeds peephole is only ~5% lower; the paper's strong "vanilla fails all 3 tasks" does not fully reproduce at short-MSD/laptop scale |
| blues-improvisation | 2-layer stacked LSTM on synthetic blues (Eck 2002) | 12/12 bar-onset chord match; step-chord 0.906; on-beat 0.792; 8 hand-synthesized 12-bar choruses (no external dataset); 12 s wall-clock |

Audit verdict (separate Explore subagent)

APPROVE (after one cleanup commit). Originally REQUEST-CHANGES due to leftover blues-improvisation/problem.py; fixed in commit eae2229 before PR open.

  • Numpy-only (hard pass): All 5 verified.
  • Determinism (3 spot-checks): temporal-order-4bit, continual-embedded-reber, blues-improvisation — all bit-identical across reruns.
  • Branch protocol: zero wave-7-local/* branches on origin.
  • Algorithmic faithfulness (5/5): temporal-order-4bit uses original 1997 LSTM (no forget); continual-embedded-reber demonstrates forget-vs-no-forget contrast (Gers 2000); anbn-anbncn uses peephole LSTM (Gers 2002); timing-counting-spikes uses peephole + vanilla contrast; blues-improvisation uses 2-layer stacked LSTM with forget gate.
  • Honest gap-reporting: timing-counting-spikes flagged as PARTIAL — paper's "vanilla fails entirely" doesn't reproduce at short-MSD scale on laptop budget. Documented in §Deviations and §Open questions with the v1.5 path (T ≥ 300, longer training).
  • Cleanliness (after cleanup commit): zero TODO/FIXME, no hardcoded paths, no problem.py stubs left, no __pycache__ committed.
  • Git authors: All 5 + cleanup commit by agent-0bserver07.
  • GIF sizes: 108 KB to 1.0 MB (all under 2 MB).

Per-stub deviations (in each stub's §Deviations)

  • temporal-order-4bit: T=50 (paper 100-110); 6 cells (paper 3 blocks of 2, 326 vs 308 weights); Adam vs SGD+momentum.
  • continual-embedded-reber: 2000 chunks of 6 strings (paper streams differ); Adam vs SGD; forget-bias init +1; state clip |s|≤50 for numerical safety on no-forget variant.
  • anbn-anbncn: trained n=1..10 (paper sweeps higher); peephole connections from c_{t-1} (input/forget) and c_t (output).
  • timing-counting-spikes: PARTIAL — short-MSD (T=150 vs paper's longer); MSD only (GTS, PFG deferred to v1.5); peep-vs-vanilla contrast not fully reproduced.
  • blues-improvisation: 2-layer stack vs paper's partition; synthetic 8-chorus corpus (no external MIDI); coarser pitch vocab; Adam vs paper's online BPTT/momentum.

Wave 0 → 7 progress

7 + 5 + 5 + 5 + 4 + 6 + 5 = 37/50 v1 stubs done (74%). 3 waves remaining = 13 stubs.


agent-0bserver07 (Claude Code) on behalf of Yad

agent-0bserver07 and others added 7 commits May 7, 2026 10:46
Eight-class temporal-order task with three embedded {X,Y} markers.
Pure numpy + matplotlib LSTM (1997 NC formulation: input + output gates,
no forget gate, pure constant-error carousel) plus vanilla tanh-RNN
baseline. Both run under Adam BPTT in ≈ 25 s on a CPU laptop.

Results (seed 0):
- LSTM 99.0% final / 100% best (507/512 correct on 8-class validation)
- LSTM hits 95% at step 200 (≈ 6 400 sequences)
- RNN 12.3% (chance = 0.125)
- Multi-seed: 5/5 seeds reach 100% best validation
- Median time-to-95% over 5 seeds: 250 steps (≈ 8 000 sequences)
- Gradcheck max relative error 3.5e-11

Files: temporal_order_4bit.py (model + train + CLI),
visualize_temporal_order_4bit.py (6 PNGs), make_temporal_order_4bit_gif.py
(2x4 cell-state animation, ~1 MB), 8-section README, results.json,
snapshots.npz, viz/.

Companion to wave-6 temporal-order-3bit; extends to 3 markers / 8 classes.
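
The 1997 NC cell the stub uses (input and output gates only; the constant-error carousel accumulates unattenuated) can be sketched as a single numpy step. This is an illustrative sketch, not the stub's actual API; the `W` dict of weight names is hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm1997_step(x, h, c, W):
    """One step of the original 1997 LSTM: input + output gates,
    no forget gate, so the cell state is a pure accumulator (CEC)."""
    z = np.concatenate([x, h])              # current input + recurrent state
    i = sigmoid(W["Wi"] @ z + W["bi"])      # input gate
    g = np.tanh(W["Wg"] @ z + W["bg"])      # candidate write
    o = sigmoid(W["Wo"] @ z + W["bo"])      # output gate
    c_new = c + i * g                       # no forget term: c never decays
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

The absence of the forget term is exactly what the continual-embedded-reber stub below turns into its headline contrast.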

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… chord progression and on-beat melody

Eck & Schmidhuber 2002, "Finding Temporal Structure in Music: Blues
Improvisation with LSTM Recurrent Networks" (NNSP / IDSIA-07-02).

Stacked LSTM (H1=20 chord layer, H2=24 melody layer) trained next-step on
8 hand-synthesized 12-bar choruses. After 200 epochs (~3s on M-series CPU):

  - bar-onset chord match            12 / 12  (deterministic chord, T=0.85 melody)
  - step-level chord match           0.906
  - on-beat note rate                0.792
  - chord-tone rate                  0.877
  - teacher-forced chord acc         0.993
  - teacher-forced pitch acc         0.372  (entropy floor of stochastic corpus)

Manual numpy BPTT through both layers; gradcheck max relative error 1e-5.

Files: blues_improvisation.py (model + train + generate + CLI),
visualize_blues_improvisation.py (training curves, weight panels, piano
rolls), make_blues_improvisation_gif.py (21-frame training-evolution GIF,
0.29 MB), README.md (8 sections), viz/ (4 PNGs).

Wave 7 family: LSTM follow-ups.
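
Generation alternates a deterministic chord path with melody sampled at temperature 0.85. A minimal sketch of temperature sampling from logits, assuming a softmax output head (function name illustrative, not the stub's API):

```python
import numpy as np

def sample_with_temperature(logits, temp, rng):
    """Sample one symbol from softmax(logits / temp); temp -> 0
    approaches argmax (the deterministic chord setting)."""
    z = logits / temp
    z = z - z.max()          # subtract max for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))
```

Lower temperatures sharpen the distribution toward the most likely pitch; 0.85 keeps some of the corpus's stochastic variety while staying mostly on-beat.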
…stream

Implements the headline contrast from Gers, Schmidhuber, Cummins 2000
(Learning to Forget, NC 12(10)). Same Reber grammar as wave-6
embedded-reber, but strings are concatenated into a single never-ending
stream with no episode reset.

LSTMForget   (Vanilla LSTM with forget gate): 5/5 seeds solve, mean
99.7% outer T/P accuracy on a fresh 60-string stream. Cell-state norm
stays bounded (~28). Forget gate learns to drop near 0 at end-of-string
markers so the cell silently resets between strings.

LSTMNoForget (1997 LSTM, no forget gate): 5/5 seeds fail, mean 55%
(chance) outer T/P accuracy. Cell-state norm grows monotonically along
the stream (~295 by step 700) and saturates h_squash, jamming the
long-range outer-T/P signal.

Same training stream, hyperparameters, optimizer (Adam(0.01)), 12
hidden cells, 2000 chunks of 6 embedded-Reber strings each. Pure numpy,
deterministic, ~14s for both nets on M-series CPU.

Files:
- continual_embedded_reber.py    LSTMForget + LSTMNoForget + truncated-
                                 BPTT trainer + eval + CLI
- visualize_continual_embedded_reber.py  training curves, cell-state
                                         trace, forget gate at 'E',
                                         side-by-side rollout, outer
                                         accuracy by stream position
- make_continual_embedded_reber_gif.py   side-by-side animation
- README.md                      8 sections incl. paper-vs-impl table,
                                 deviations, open questions
- viz/                           5 PNGs (training, cell trace, forget
                                 gate, rollout, outer-by-position)
- continual_embedded_reber.gif   508 KB

Reproduces: yes (qualitative split forget-vs-no-forget). Wallclock
seed 0: 14 s.
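
The mechanism is easy to see in a toy scalar cell (illustrative, not the stub's code): with a forget gate the state contracts toward a fixed point between strings, while with the gate pinned at 1 (the 1997 cell) every write accumulates forever along the stream.

```python
import numpy as np

def run_stream(steps, forget_gate):
    """Drive one cell with a constant gated write of 0.1 per step.
    forget_gate < 1 models a learned forget; 1.0 models no-forget."""
    c, trace = 0.0, []
    for _ in range(steps):
        c = forget_gate * c + 0.1
        trace.append(c)
    return np.array(trace)

bounded = run_stream(700, 0.5)    # settles at 0.1 / (1 - 0.5) = 0.2
unbounded = run_stream(700, 1.0)  # grows without bound along the stream
```

The no-forget trajectory is the toy analogue of the ~295 cell-state norm that saturates h_squash and jams the outer-T/P signal.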
…nsitive a^n b^n c^n (Gers & Schmidhuber 2001)

Pure-numpy peephole LSTM (Gers, Schraudolph & Schmidhuber 2002 cell:
input/forget peepholes from c_{t-1}, output peephole from c_t) trained
with full BPTT and online single-sequence Adam updates.

Two languages, one binary-mask-of-legal-next-symbols criterion:

- a^n b^n (context-free, hidden=2): trained on n=1..10, generalises to
  n=1..65 contiguous out of 1..100 tested at seed 1; 3/5 seeds reach
  the n=100 test cap. Per-symbol BCE drops to ~0.04. Wallclock 2.8 s.
- a^n b^n c^n (context-sensitive, hidden=3): trained on n=1..10,
  generalises to n=1..29 contiguous at seed 1; 5-seed median is 24,
  every seed beats the n=10 training range. BCE drops to 1.4e-4.
  Wallclock 30.7 s.

Headline picture: cell 0 acts as a clean linear counter on a^n b^n —
charges during a's, discharges during b's, hits the predict-T threshold
at exactly t=2n+1. The GIF animates this counter forming across 12
training checkpoints from random init to solved.
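
An idealized version of that counter (illustrative toy, not the trained cell): +1 per `a`, -1 per `b`; the count returning to zero marks the point where terminal `T` becomes legal.

```python
def counter_trace(n):
    """Idealized cell-0 behaviour on a^n b^n: charge on a's,
    discharge on b's; the zero crossing signals end-of-string."""
    s = "a" * n + "b" * n
    c, trace = 0, []
    for ch in s:
        c += 1 if ch == "a" else -1
        trace.append(c)
    return trace
```

The trained cell does the same thing with a learned per-symbol increment, which is why generalization beyond the n=1..10 training range comes almost for free.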

- 13 LSTM weight blocks (Wi, bi, pi, Wf, bf, pf, Wg, bg, Wo, bo, po,
  Wy, by), three of which are the peephole vectors pi, pf, po. Adam
  lr=0.01, grad clip 1.0, bias_i=-1, bias_f=+1, sigmoid output +
  per-step BCE.
- Analytic BPTT verified vs central differences at 5.66e-6 max
  relative error on a 3-cell n=2 anbn instance.
- Bit-deterministic across re-runs at the same seed.
- Both runs together: 35 s on M-series CPU. Single-language CLI
  variants run in 3 s (anbn) and 31 s (anbncn).
- 6 PNGs to viz/ (training loss, generalisation bars, generalisation
  curve, two cell-state traces, gates) + 111 KB anbn_anbncn.gif.
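
The peephole wiring (input/forget gates peek at c_{t-1}, the output gate at the freshly updated c_t) reduces to one extra elementwise term per gate. Illustrative numpy step; the `W` dict mirrors the weight-block names above but is not the stub's exact API.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def peephole_step(x, h, c, W):
    """Peephole LSTM step (Gers, Schraudolph & Schmidhuber 2002 cell):
    pi, pf see the previous cell state; po sees the updated one."""
    z = np.concatenate([x, h])
    i = sigmoid(W["Wi"] @ z + W["pi"] * c + W["bi"])      # peeks at c_{t-1}
    f = sigmoid(W["Wf"] @ z + W["pf"] * c + W["bf"])      # peeks at c_{t-1}
    g = np.tanh(W["Wg"] @ z + W["bg"])
    c_new = f * c + i * g
    o = sigmoid(W["Wo"] @ z + W["po"] * c_new + W["bo"])  # peeks at c_t
    return o * np.tanh(c_new), c_new
```

The peepholes let the gates read the counter directly even when the squashed output h carries little information, which is the point of the counting tasks.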

Files:
- anbn_anbncn.py       — dataset, peephole LSTM, BPTT, train, eval, gradcheck, CLI
- visualize_anbn_anbncn.py — 6 static PNGs
- make_anbn_anbncn_gif.py  — counter-formation animation
- README.md            — 8 sections
- problem.py           — removed (replaced by anbn_anbncn.py)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements Gers, Schraudolph, Schmidhuber 2002, "Learning Precise
Timing with LSTM Recurrent Networks" (JMLR 3:115-143). Headline task:
Measure-Spike-Distance (MSD) -- two input spikes at t1, t2; network
must fire output spike at t1 + 2*(t2-t1).
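
The MSD target is pure arithmetic on the two spike times. A sketch of a one-channel trial builder, assuming unit spikes on a single channel (the stub's actual input encoding may differ):

```python
import numpy as np

def msd_trial(T, t1, t2):
    """Inputs: spikes at t1 and t2. Target: one spike at
    t1 + 2*(t2 - t1), i.e. the measured interval replayed
    once past t2."""
    x = np.zeros(T)
    y = np.zeros(T)
    x[t1] = x[t2] = 1.0
    y[t1 + 2 * (t2 - t1)] = 1.0
    return x, y
```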

Architecture:
  Pure-numpy LSTM with optional peephole connections (p_i, p_f from
  c_{t-1}, p_o from c_t). Forget gate with bias 1.0 (Gers et al 2000
  modern variant). BPTT, Adam, global gradient clip 1.0. Manual
  gradient verified by central-differences gradcheck (peep + no-peep,
  max relative error ~1.7e-7).
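
The gradchecks quoted across these stubs are the standard central-difference comparison. A generic sketch, assuming `loss_fn` returns `(loss, grad)` for a flat parameter vector (names illustrative):

```python
import numpy as np

def gradcheck(loss_fn, params, eps=1e-5):
    """Max relative error between the analytic gradient and the
    central-difference estimate (f(p+eps) - f(p-eps)) / (2*eps)."""
    _, analytic = loss_fn(params)
    numeric = np.zeros_like(params)
    for k in range(params.size):
        p = params.copy()
        p[k] += eps
        lp, _ = loss_fn(p)          # loss at p[k] + eps
        p[k] -= 2 * eps
        lm, _ = loss_fn(p)          # loss at p[k] - eps
        numeric[k] = (lp - lm) / (2 * eps)
    denom = np.maximum(np.abs(analytic) + np.abs(numeric), 1e-12)
    return float(np.max(np.abs(analytic - numeric) / denom))
```

Errors in the 1e-6 to 1e-11 range, as reported here, are what a correct manual BPTT looks like under float64.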

Headline (seed 4, T=150, D in [30,60], H=8, 3000 iters, ~32s):
  peephole LSTM   : test MSE 0.00073, exact-timing solve rate 0.998
  vanilla LSTM    : test MSE 0.00240, exact-timing solve rate 0.900

Cell-state heatmap shows one cell developing an analog interval
timer between the two input spikes -- the canonical peephole story.

7-seed sweep documented in README. The dramatic peephole-only
demos (paper claim: vanilla "fails on all three tasks") require
T >> 200 and exceed the 5-min laptop budget; flagged in §Open
questions for v1.5 / v2 follow-up. GTS and PFG sub-tasks also
deferred.

Files (timing-counting-spikes/):
  timing_counting_spikes.py            (model + train + eval + gradcheck)
  visualize_timing_counting_spikes.py  (5 PNGs in viz/)
  make_timing_counting_spikes_gif.py   (392 KB GIF, well under 2 MB)
  README.md                            (8-section)
Octopus merge of 5 wave-7 stubs per SPEC issue #1.

- wave-7-local/temporal-order-4bit: 8-class XX/XY/YX/YY (H&S 1997 Exp 6b)
- wave-7-local/continual-embedded-reber: forget gate vs no-forget contrast (Gers 2000)
- wave-7-local/anbn-anbncn: peephole LSTM on CFL/CSL counters (Gers 2001)
- wave-7-local/timing-counting-spikes: peephole LSTM on MSD timing task (Gers 2002)
- wave-7-local/blues-improvisation: 2-layer stacked LSTM on synthetic blues (Eck 2002)

All 5 verified by separate audit subagent: numpy-only, deterministic,
branch protocol followed (no wave-7-local on remote), all 8 README sections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit caught a leftover NotImplementedError stub. Same issue as wave-6
noise-free-long-lag — fixing on top of the wave PR before merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0bserver07
Author

Audit Report — PR #11 wave 7 (5 stubs)

Wave 7 verdict: APPROVE (after one cleanup commit).

Originally REQUEST-CHANGES due to leftover blues-improvisation/problem.py (same auditor catch as wave 6's noise-free-long-lag). Fixed in commit eae2229 on top of the wave merge before PR open.

Per-stub verdicts

| Stub | Verdict | Reason |
| --- | --- | --- |
| temporal-order-4bit | APPROVE | 1997 LSTM verified; 5/5 seeds 100% best val; gradient check 3.5e-11 |
| continual-embedded-reber | APPROVE | Forget-vs-no-forget contrast verified mechanistically (cell-state norm 25 vs 295) |
| anbn-anbncn | APPROVE | Peephole LSTM verified; cell 0 emerges as linear counter; gradcheck 5.66e-6 |
| timing-counting-spikes | APPROVE-WITH-NOTES | Peephole + vanilla contrast verified; honest partial reproduction documented |
| blues-improvisation | APPROVE | 2-layer stacked LSTM verified; bar-onset chord 12/12; cleanup commit removed orphan problem.py |

Cross-cut findings

  • Numpy-only (hard pass): All 5 verified.
  • Determinism (3 spot-checks): bit-identical across reruns.
  • Branch protocol: zero wave-7-local/* on origin.
  • Git authors: all 5 + cleanup commit by agent-0bserver07.
  • Cleanliness (after cleanup): no TODO/FIXME, no hardcoded paths, no orphan problem.py stubs.
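
The determinism spot-checks require bit-identical outputs across reruns at the same seed, not merely numerical closeness. A toy stand-in for that check (the real audit reruns each stub's CLI and compares artifacts):

```python
import numpy as np

def run_once(seed):
    """Toy deterministic 'training run': all randomness flows from a
    single seeded generator, so reruns are bit-identical."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(8)
    for _ in range(100):
        w -= 0.01 * w            # stand-in gradient step
    return w

a, b = run_once(0), run_once(0)
assert np.array_equal(a, b)      # bitwise equality, stricter than allclose
```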

Algorithmic faithfulness

All 5 stubs use the right LSTM variant:

  • temporal-order-4bit: original 1997 LSTM (input + output gates only, no forget)
  • continual-embedded-reber: side-by-side LSTMForget (Gers 2000) vs LSTMNoForget (1997)
  • anbn-anbncn: peephole LSTM (3 peephole vectors p_i, p_f from c_{t-1}, p_o from c_t per Gers 2002)
  • timing-counting-spikes: peephole LSTM + vanilla LSTM contrast
  • blues-improvisation: 2-layer stacked LSTM (h1=20 chord layer, h2=24 melody layer, forget gate)

Honest gap-reporting

timing-counting-spikes flagged as PARTIAL: peep-vs-vanilla contrast doesn't fully reproduce at short-MSD scale on laptop budget. §Deviations explicitly states: paper claims vanilla "fails entirely," but at T=150 D∈[30,60] vanilla reaches solve_rate=0.9. Follow-up flagged: T ≥ 300 + longer training would likely close the gap. This is the SPEC's methodological caveat applied correctly.

Reproduce results

=== temporal-order-4bit seed 42 (100 steps) ===
lstm_final_acc 0.27148 — identical across runs

=== blues-improvisation seed 123 (5 epochs) ===
deterministic bar-onset 0.667; step-level chord 0.667 — consistent

=== continual-embedded-reber seed 0 ===
LSTMForget outer T/P 1.000 vs LSTMNoForget 0.500 — headline contrast preserved

agent-0bserver07 (Claude Code) on behalf of Yad — wave-7 audit subagent
