wave 7: LSTM follow-ups (5 stubs) #11
Open
0bserver07 wants to merge 7 commits into main from
Conversation
Eight-class temporal-order task with three embedded {X,Y} markers.
Pure numpy + matplotlib LSTM (1997 NC formulation: input + output gates,
no forget gate, pure constant-error carousel) plus vanilla tanh-RNN
baseline. Both run under Adam BPTT in ≈ 25 s on a CPU laptop.
Results (seed 0):
- LSTM 99.0% final / 100% best (507/512 correct on 8-class validation)
- LSTM hits 95% at step 200 (≈ 6 400 sequences)
- RNN 12.3% (chance = 0.125)
- Multi-seed: 5/5 seeds reach 100% best validation
- Median time-to-95% over 5 seeds: 250 steps (≈ 8 000 sequences)
- Gradcheck max relative error 3.5e-11
Files: temporal_order_4bit.py (model + train + CLI),
visualize_temporal_order_4bit.py (6 PNGs), make_temporal_order_4bit_gif.py
(2x4 cell-state animation, ~1 MB), 8-section README, results.json,
snapshots.npz, viz/.
Companion to wave-6 temporal-order-3bit; extends to 3 markers / 8 classes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
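The 1997 formulation named above (input and output gates only, no forget gate, additive constant-error carousel) comes down to a very short forward step. A minimal numpy sketch; the weight names are illustrative and not necessarily the stub's own identifiers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm1997_step(x, h, c, W_i, W_g, W_o, b_i, b_g, b_o):
    """One forward step of the 1997 NC cell: input and output gates
    only. With no forget gate the cell update is purely additive --
    the constant-error carousel accumulates without decay."""
    z = np.concatenate([x, h])      # input + recurrent state
    i = sigmoid(W_i @ z + b_i)      # input gate
    g = np.tanh(W_g @ z + b_g)      # candidate cell input
    c_new = c + i * g               # CEC: additive, no forgetting
    o = sigmoid(W_o @ z + b_o)      # output gate
    h_new = o * np.tanh(c_new)      # squashed, gated cell exposure
    return h_new, c_new
```

The vanilla tanh-RNN baseline replaces all of this with a single `h_new = tanh(W @ z + b)`, which is why its gradients vanish over the long gaps between markers.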
… chord progression and on-beat melody
Eck & Schmidhuber 2002, "Finding Temporal Structure in Music: Blues
Improvisation with LSTM Recurrent Networks" (NNSP / IDSIA-07-02).
Stacked LSTM (H1=20 chord layer, H2=24 melody layer) trained next-step
on 8 hand-synthesized 12-bar choruses.
After 200 epochs (~3 s on M-series CPU):
- bar-onset chord match 12/12 (deterministic chord, T=0.85 melody)
- step-level chord match 0.906
- on-beat note rate 0.792
- chord-tone rate 0.877
- teacher-forced chord acc 0.993
- teacher-forced pitch acc 0.372 (entropy floor of stochastic corpus)
Manual numpy BPTT through both layers; gradcheck max relative error 1e-5.
Files: blues_improvisation.py (model + train + generate + CLI),
visualize_blues_improvisation.py (training curves, weight panels,
piano rolls), make_blues_improvisation_gif.py (21-frame
training-evolution GIF, 0.29 MB), README.md (8 sections), viz/ (4 PNGs).
Wave 7 family: LSTM follow-ups.
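The deterministic-chord vs T=0.85-melody split comes down to softmax temperature at sampling time: T → 0 approaches argmax, T = 1 samples the raw softmax. A minimal numpy sketch; the stub's own sampler may differ in details:

```python
import numpy as np

def sample_with_temperature(logits, T=0.85, rng=None):
    """Sample a symbol index from logits at temperature T.
    T == 0 is the deterministic (argmax) case used for chords;
    T = 0.85 softens the melody distribution slightly."""
    if T == 0:
        return int(np.argmax(logits))
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()                      # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(p), p=p))
```

Lower T sharpens the distribution toward the top pitch; higher T trades chord-tone rate for variety, which is the knob behind the 0.877 chord-tone figure.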
…stream
Implements the headline contrast from Gers, Schmidhuber, Cummins 2000
(Learning to Forget, NC 12(10)). Same Reber grammar as wave-6
embedded-reber, but strings are concatenated into a single never-ending
stream with no episode reset.
LSTMForget (Vanilla LSTM with forget gate): 5/5 seeds solve, mean
99.7% outer T/P accuracy on a fresh 60-string stream. Cell-state norm
stays bounded (~28). Forget gate learns to drop near 0 at end-of-string
markers so the cell silently resets between strings.
LSTMNoForget (1997 LSTM, no forget gate): 5/5 seeds fail, mean 55%
(chance) outer T/P accuracy. Cell-state norm grows monotonically along
the stream (~295 by step 700) and saturates h_squash, jamming the
long-range outer-T/P signal.
Same training stream, hyperparameters, optimizer (Adam(0.01)), 12
hidden cells, 2000 chunks of 6 embedded-Reber strings each. Pure numpy,
deterministic, ~14s for both nets on M-series CPU.
Files:
- continual_embedded_reber.py — LSTMForget + LSTMNoForget +
  truncated-BPTT trainer + eval + CLI
- visualize_continual_embedded_reber.py — training curves, cell-state
  trace, forget gate at 'E', side-by-side rollout, outer accuracy by
  stream position
- make_continual_embedded_reber_gif.py — side-by-side animation
- README.md — 8 sections incl. paper-vs-impl table, deviations, open
  questions
- viz/ — 5 PNGs (training, cell trace, forget gate, rollout,
  outer-by-position)
- continual_embedded_reber.gif — 508 KB
Reproduces: yes (qualitative split forget-vs-no-forget). Wallclock
seed 0: 14 s.
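The bounded-vs-growing cell-norm contrast above can be illustrated with a toy numpy loop. The gate pre-activations here are random stand-ins, not trained values, so only the qualitative behaviour carries over: with a forget gate the cell decays geometrically and its norm stays bounded; without one, the pure CEC drifts like a random walk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
H, T = 12, 700                           # hidden cells, stream length
c_forget = np.zeros(H)
c_noforget = np.zeros(H)
for t in range(T):
    i = sigmoid(rng.normal(size=H))      # input gate
    g = np.tanh(rng.normal(size=H))      # candidate cell input
    f = sigmoid(rng.normal(size=H) + 1)  # forget gate (bias +1)
    c_forget = f * c_forget + i * g      # bounded: geometric decay
    c_noforget = c_noforget + i * g      # unbounded: pure 1997 CEC
```

In the trained nets the same mechanism is what lets the forget gate drop near 0 at the 'E' end-of-string marker and silently reset the cell, while the no-forget cell's norm climbs to ~295 and saturates the output squashing.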
…nsitive a^n b^n c^n (Gers & Schmidhuber 2001)
Pure-numpy peephole LSTM (Gers, Schraudolph & Schmidhuber 2002 cell:
input/forget peepholes from c_{t-1}, output peephole from c_t) trained
with full BPTT and online single-sequence Adam updates.
Two languages, one binary-mask-of-legal-next-symbols criterion:
- a^n b^n (context-free, hidden=2): trained on n=1..10, generalises to
n=1..65 contiguous out of 1..100 tested at seed 1; 3/5 seeds reach
the n=100 test cap. Per-symbol BCE drops to ~0.04. Wallclock 2.8 s.
- a^n b^n c^n (context-sensitive, hidden=3): trained on n=1..10,
generalises to n=1..29 contiguous at seed 1; 5-seed median is 24,
every seed beats the n=10 training range. BCE drops to 1.4e-4.
Wallclock 30.7 s.
Headline picture: cell 0 acts as a clean linear counter on a^n b^n —
charges during a's, discharges during b's, hits the predict-T threshold
at exactly t=2n+1. The GIF animates this counter forming across 12
training checkpoints from random init to solved.
- 13 parameter blocks (Wi, bi, pi, Wf, bf, pf, Wg, bg, Wo, bo, po,
  Wy, by), the pi/pf/po entries being the 3 peephole vectors. Adam
  lr=0.01, grad clip 1.0, bias_i=-1, bias_f=+1, sigmoid output +
  per-step BCE.
- Analytic BPTT verified vs central differences at 5.66e-6 max
relative error on a 3-cell n=2 anbn instance.
- Bit-deterministic across re-runs at the same seed.
- Both runs together: 35 s on M-series CPU. Single-language CLI
variants run in 3 s (anbn) and 31 s (anbncn).
- 6 PNGs to viz/ (training loss, generalisation bars, generalisation
curve, two cell-state traces, gates) + 111 KB anbn_anbncn.gif.
Files:
- anbn_anbncn.py — dataset, peephole LSTM, BPTT, train, eval, gradcheck, CLI
- visualize_anbn_anbncn.py — 6 static PNGs
- make_anbn_anbncn_gif.py — counter-formation animation
- README.md — 8 sections
- problem.py — removed (replaced by anbn_anbncn.py)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
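The peephole cell used in both Gers-era stubs (input/forget gates reading c_{t-1}, output gate reading c_t) amounts to the following forward step. A sketch with illustrative parameter names packed in a dict; the stub's own identifiers may differ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def peephole_lstm_step(x, h, c, p):
    """One peephole LSTM forward step (Gers, Schraudolph &
    Schmidhuber 2002 cell). Peepholes are diagonal: elementwise
    products of a per-cell vector with the cell state."""
    z = np.concatenate([x, h])
    i = sigmoid(p["Wi"] @ z + p["pi"] * c + p["bi"])      # peeks at c_{t-1}
    f = sigmoid(p["Wf"] @ z + p["pf"] * c + p["bf"])      # peeks at c_{t-1}
    g = np.tanh(p["Wg"] @ z + p["bg"])                    # candidate input
    c_new = f * c + i * g
    o = sigmoid(p["Wo"] @ z + p["po"] * c_new + p["bo"])  # peeks at c_t
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

The peepholes matter for counting and timing because the gates can read the cell's analog level directly, rather than only its output-gated, squashed reflection in h.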
Implements Gers, Schraudolph, Schmidhuber 2002, "Learning Precise
Timing with LSTM Recurrent Networks" (JMLR 3:115-143). Headline task:
Measure-Spike-Distance (MSD) -- two input spikes at t1, t2; network
must fire output spike at t1 + 2*(t2-t1).
Architecture:
Pure-numpy LSTM with optional peephole connections (p_i, p_f from
c_{t-1}, p_o from c_t). Forget gate with bias 1.0 (Gers et al. 2000
modern variant). BPTT, Adam, global gradient clip 1.0. Manual
gradient verified by central-differences gradcheck (peep + no-peep,
max relative error ~1.7e-7).
Headline (seed 4, T=150, D in [30,60], H=8, 3000 iters, ~32s):
peephole LSTM : test MSE 0.00073, exact-timing solve rate 0.998
vanilla LSTM : test MSE 0.00240, exact-timing solve rate 0.900
Cell-state heatmap shows one cell developing an analog interval
timer between the two input spikes -- the canonical peephole story.
7-seed sweep documented in README. The dramatic peephole-only
demos (paper claim: vanilla "fails on all three tasks") require
T >> 200 and exceed the 5-min laptop budget; flagged in §Open
questions for v1.5 / v2 follow-up. GTS and PFG sub-tasks also
deferred.
Files (timing-counting-spikes/):
timing_counting_spikes.py (model + train + eval + gradcheck)
visualize_timing_counting_spikes.py (5 PNGs in viz/)
make_timing_counting_spikes_gif.py (392 KB GIF, well under 2 MB)
README.md (8-section)
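The central-differences gradchecks quoted throughout this wave (3.5e-11, 1e-5, 5.66e-6, ~1.7e-7) all follow the same recipe: perturb each parameter by ±eps, compare the numerical slope against the analytic BPTT gradient, and report the max relative error. A generic sketch, assuming a hypothetical `loss_fn(params) -> float` interface:

```python
import numpy as np

def gradcheck(loss_fn, params, analytic_grads, eps=1e-5):
    """Central-differences gradient check. `params` is a dict of
    numpy arrays mutated in place during perturbation, then restored;
    `analytic_grads` holds same-shaped arrays from manual BPTT."""
    max_rel_err = 0.0
    for name, w in params.items():
        g_num = np.zeros_like(w)
        it = np.nditer(w, flags=["multi_index"])
        for _ in it:
            idx = it.multi_index
            orig = w[idx]
            w[idx] = orig + eps
            lp = loss_fn(params)
            w[idx] = orig - eps
            lm = loss_fn(params)
            w[idx] = orig                      # restore
            g_num[idx] = (lp - lm) / (2 * eps)
        g_an = analytic_grads[name]
        denom = np.maximum(np.abs(g_num) + np.abs(g_an), 1e-12)
        max_rel_err = max(max_rel_err, float(np.max(np.abs(g_num - g_an) / denom)))
    return max_rel_err
```

The symmetric-sum denominator keeps the metric stable when individual gradient entries are near zero, which is why errors in the 1e-6 to 1e-11 range are the expected signature of a correct manual BPTT.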
Octopus merge of 5 wave-7 stubs per SPEC issue #1.
- wave-7-local/temporal-order-4bit: 8-class XX/XY/YX/YY (H&S 1997 Exp 6b)
- wave-7-local/continual-embedded-reber: forget gate vs no-forget contrast (Gers 2000)
- wave-7-local/anbn-anbncn: peephole LSTM on CFL/CSL counters (Gers 2001)
- wave-7-local/timing-counting-spikes: peephole LSTM on MSD timing task (Gers 2002)
- wave-7-local/blues-improvisation: 2-layer stacked LSTM on synthetic blues (Eck 2002)
All 5 verified by a separate audit subagent: numpy-only, deterministic,
branch protocol followed (no wave-7-local on remote), all 8 README
sections.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit caught a leftover NotImplementedError stub. Same issue as wave-6 noise-free-long-lag — fixing on top of the wave PR before merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Audit Report — PR #11 wave 7 (5 stubs)
Wave 7 verdict: APPROVE (after one cleanup commit). Originally
REQUEST-CHANGES due to a leftover stub file.
Per-stub verdicts
Cross-cut findings
Algorithmic faithfulness — all 5 stubs use the right LSTM variant.
Honest gap-reporting
Reproduce results
agent-0bserver07 (Claude Code) on behalf of Yad — wave-7 audit subagent
Wave 7 — LSTM follow-ups (2000–2002)
Five stubs implementing the LSTM follow-up era per SPEC issue #1.
Octopus-merged from 5 local-only wave-7-local/<slug> branches:
- temporal-order-4bit
- continual-embedded-reber
- anbn-anbncn
- timing-counting-spikes
- blues-improvisation
Audit verdict (separate Explore subagent)
APPROVE (after one cleanup commit). Originally REQUEST-CHANGES due to a
leftover blues-improvisation/problem.py; fixed in commit eae2229 before
PR open.
No wave-7-local/* branches left on origin; no problem.py stubs left;
no __pycache__ committed. Audited by agent-0bserver07.
Per-stub deviations (in each stub's §Deviations)
Wave 0 → 7 progress
7 + 5 + 5 + 5 + 4 + 6 + 5 = 37/50 v1 stubs done (74%). 3 waves remaining = 13 stubs.
agent-0bserver07 (Claude Code) on behalf of Yad