
wave 5: predictability min/max + unsupervised features (4 stubs)#9

Merged
0bserver07 merged 5 commits into main from wave/5-predictability on May 8, 2026

wave 5: predictability min/max + unsupervised features (4 stubs)#9
0bserver07 merged 5 commits intomainfrom
wave/5-predictability

Conversation

@0bserver07
Contributor

Wave 5 — predictability min/max + unsupervised feature extraction

Four stubs implementing Schmidhuber's 1992-1999 unsupervised-coding lineage per SPEC issue #1. Octopus-merged from 4 local-only wave-5-local/<slug> branches.

| Stub | Method | Paper | Headline |
| --- | --- | --- | --- |
| predictability-min-binary-factors | PM (proto-GAN) on synthetic factorial binary | Schmidhuber 1992 (NC 4(6)) | L_pred = 0.2500 (exact chance for sigmoid binary); pairwise MI 9.6×10⁻⁵ nats; 100% bit-recovery; 8/8 seeds at 2000 steps |
| predictable-stereo | IMAX (Becker-Hinton) on synthetic stereo | Schmidhuber & Prelinger 1993 (NC 5(4)) | I(yL; yR) = 7.598 nats; held-out depth recovery 1.000 (seed 0); 8/8 seeds, 0.997 mean; shuffled control at chance (0.513) |
| semilinear-pm-image-patches | Stiefel-manifold PM on natural patches | Schmidhuber, Eldracher & Foltin 1996 (NC 8(4)) | 12/16 filters with FFT orientation concentration > 0.5 (V1-style oriented bars); kurtosis 19.96 vs random 2.95 |
| lococode-ica | Tied AE + L1 sparsity on whitened input | Hochreiter & Schmidhuber 1999 (NC 11) | Amari 0.093 (4× better than PCA's 0.388, within 5× of FastICA's 0.022); 10-seed mean 0.117 ± 0.021 |

Audit verdict (separate Explore subagent)

APPROVE across all 4 stubs.

  • Numpy-only (hard pass): Imports = numpy/matplotlib/PIL/argparse/json/os/sys/subprocess/time/platform/itertools. Zero forbidden imports.
  • Determinism (3 spot-checks): predictability-min, predictable-stereo, lococode-ica each ran twice with seed 0 — bit-identical metrics.
  • Branch protocol verified: zero wave-5-local/* branches on origin.
  • Algorithmic faithfulness (2 deep dives):
    • predictability-min-binary-factors: encoder + K per-component predictors; predictors hit chance MSE = 0.25 (sigmoid balanced binary); pairwise MI collapses to ~10⁻⁴ nats; 100% bit-recovery on held-out.
    • lococode-ica: tied AE (W_enc = W, W_dec = W.T); loss = MSE + λ·|H|₁ + λ_w·||W||²; L1 gradient correctly applied; recovers ICA components on Laplacian sources.
  • Cleanliness: zero TODO/FIXME, no hardcoded paths, no __pycache__ committed.
  • Git authors: all 4 commits authored by agent-0bserver07 <agent-0bserver07@users.noreply.github.com>.
  • GIF sizes: 528 KB to 1.1 MB (all under 2 MB).
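For context on the pairwise-MI figures above: the MI between two binarized code units can be estimated from their 2×2 joint histogram. A minimal numpy sketch (hypothetical helper, not the repo's code):

```python
import numpy as np

def pairwise_mi(b1, b2, eps=1e-12):
    """Plug-in MI estimate (nats) between two binary code units,
    computed from their empirical 2x2 joint histogram."""
    joint = np.histogram2d(b1, b2, bins=2)[0] / len(b1)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    return float(np.sum(joint * np.log((joint + eps) / (px @ py + eps))))

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 100_000)
b = rng.integers(0, 2, 100_000)
print(pairwise_mi(a, b))  # ~1e-5 nats: independent units
print(pairwise_mi(a, a))  # ~0.693 nats (ln 2): fully redundant units
```

Near-factorial codes drive every pairwise estimate toward zero, which is what the ~10⁻⁴-nat figures indicate.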

Per-stub deviations (in each stub's §Deviations)

  • predictability-min-binary-factors: Adam (paper used SGD); MSE reconstruction as info-preservation term; linear λ warm-up over 400 steps (without it, encoder collapses); 3:1 predictor:encoder update ratio.
  • predictable-stereo: synthetic 16-dim binary stereo with shared 8-bit template (paper used different distributions); IMAX with closed-form gradient; 8/8 seeds solve.
  • semilinear-pm-image-patches: synthetic 1/f² pink-noise + oriented bars (paper used real natural images); Stiefel-manifold polar projection after every step; squared-feature predictor (the "semilinear" nonlinearity); analytic-vs-numerical gradient max error 5e-10.
  • lococode-ica: L1 sparsity surrogate vs paper's flat-minimum-search Hessian penalty (largest deviation, documented); whitening preprocessing (essential, confirmed empirically); tied autoencoder with orthogonality regularizer.

Citation gaps

All 4 source papers are retrievable. Some implementation details (exact dataset compositions, optimizer hyperparams) are reconstructed from secondary sources where the originals don't pin them down — flagged in §Open questions per SPEC's methodological caveat.

Wave 0 → 1 → 2 → 3 → 4 → 5 progress

7 + 5 + 5 + 5 + 4 = 26/50 v1 stubs done (52%). 4 waves remaining = 24 stubs.


agent-0bserver07 (Claude Code) on behalf of Yad

agent-0bserver07 and others added 5 commits May 7, 2026 09:02
… codes

Implementation of the LOCOCODE / flat-minimum-search proxy from
Hochreiter & Schmidhuber (1999, NC 11). Tied k×k autoencoder on whitened
sparse Laplacian mixtures, MSE + L1 activity penalty + weight decay.

Headline (seed 0, k=8, n=2000, 200 epochs, 0.18 s training):
  LOCOCODE Amari = 0.093  (kurtosis 2.61)
  PCA      Amari = 0.388  (kurtosis 1.08)
  FastICA  Amari = 0.022  (kurtosis 3.22)

LOCOCODE crosses cleanly from PCA-quality to ICA-family quality:
4× lower Amari than PCA, super-Gaussian codes, recovered demixer is
near-permutation up to small off-diagonal cross-talk. Plateau at
~0.10 Amari is the L1-saturation gap to higher-order-moment ICA;
documented as Open Question.
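The Amari index quoted throughout is the standard permutation-invariant demixing error; a sketch of one common form (our reconstruction, not necessarily the stub's exact normalization):

```python
import numpy as np

def amari_index(P):
    """Amari error of P = W_demix @ A_mix: 0 iff P is a scaled
    permutation matrix; grows with off-diagonal cross-talk."""
    P = np.abs(P)
    n = P.shape[0]
    row = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    col = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return float((row.sum() + col.sum()) / (2 * n * (n - 1)))

perm = np.eye(4)[[2, 0, 3, 1]] * 3.0   # scaled permutation
print(amari_index(perm))               # 0.0 -- perfect recovery
print(amari_index(np.ones((4, 4))))    # 1.0 -- total cross-talk
```

On this scale the reported 0.093 sits between PCA's 0.388 and FastICA's 0.022, matching the "near-permutation up to small off-diagonal cross-talk" description.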

Files:
- lococode_ica.py: data, model, train, PCA + FastICA baselines, Amari
- visualize_lococode_ica.py: 5 PNGs in viz/
- make_lococode_ica_gif.py: 528 KB GIF, 41 frames
- README.md: 8 sections including paper-vs-our deviations

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…l inputs

Encoder + decoder + K per-unit predictors trained adversarially in pure numpy
(Adam, manual gradients).  K=4, D=8 with linear Gaussian mixing of independent
+/-1 factors converges in ~3 s on an M-series laptop:

  L_recon  = 0.0026
  L_pred   = 0.2500  (= chance for sigmoid against balanced binary target)
  pMI      = 9.6e-05 nats
  bit_acc  = 100% modulo permutation+sign on 4096 held-out samples
  seeds    = 8/8 reach 100% bit accuracy at 2000 steps
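One-line sanity check on the L_pred figure: against a balanced 0/1 target, a predictor with no usable information can do no better than the constant 0.5, which gives MSE exactly 0.25:

```python
import numpy as np

# balanced binary targets; every sample is 0 or 1
y = np.random.default_rng(0).integers(0, 2, 100_000).astype(float)
# (y - 0.5)^2 equals 0.25 for each sample, so the chance MSE is exactly 0.25
print(np.mean((y - 0.5) ** 2))  # 0.25
```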

Files:
  predictability_min_binary_factors.py
  make_predictability_min_binary_factors_gif.py
  visualize_predictability_min_binary_factors.py
  predictability_min_binary_factors.gif (567 KB, well below 2 MB target)
  viz/{training_curves,pairwise_mi_init_vs_final,code_vs_factor_mi,code_distribution}.png
  results.json
  README.md (8 sections)

Removed problem.py stub.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tability max

Two MLPs each see one view of synthetic binary stereo (16 dims = 8 shared
+ 8 view-specific distractors per view) and train cooperatively under the
Becker-Hinton 1992 IMAX objective I(yL;yR) = 0.5 log(var(yL+yR)/var(yL-yR))
to recover a hidden binary depth bit.
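The IMAX quantity above is the Becker-Hinton Gaussian lower bound on mutual information between the two outputs; a minimal numpy sketch on our own toy signals (not the stub's model):

```python
import numpy as np

def imax_info(yL, yR, eps=1e-12):
    """Becker-Hinton IMAX bound: 0.5 * log(var(yL+yR) / var(yL-yR)).
    Large when the two outputs carry a common signal, ~0 when independent."""
    return 0.5 * np.log((np.var(yL + yR) + eps) / (np.var(yL - yR) + eps))

rng = np.random.default_rng(0)
shared = rng.normal(size=10_000)               # common "depth" signal
yL = shared + 0.1 * rng.normal(size=10_000)    # left view + private noise
yR = shared + 0.1 * rng.normal(size=10_000)    # right view + private noise
print(imax_info(yL, yR) > 2.0)                 # True: shared signal dominates
print(abs(imax_info(rng.normal(size=10_000),
                    rng.normal(size=10_000))) < 0.1)  # True: independent views
```

Maximizing this quantity pushes the two MLPs to agree on exactly the information their views share, which here is the depth bit.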

Headline (seed 0, 200 epochs, ~0.1 s on M-series CPU): held-out depth
recovery 1.000, IMAX I = 7.598 nats. 8-seed mean held-out recovery 0.997
(min 0.994). Shuffled negative control (no shared depth): 0.513 (chance).

Files: predictable_stereo.py (model + IMAX loss + closed-form gradient +
training + held-out eval + multi-seed sweep + --shuffled control),
visualize_predictable_stereo.py (5 PNGs to viz/), make_predictable_stereo_gif.py
(51 frames, 844 KB), run.json, README.md (8 sections).

Pure numpy + matplotlib. Deterministic under --seed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…al-image patches

Wave 5 stub for Schmidhuber, Eldracher, Foltin (1996) "Semilinear
predictability minimization produces well-known feature detectors".

Implementation: linear encoder W (M=16, orthonormal rows) + linear
predictor on standardised squared codes z = (y^2 - mu)/sigma. The
squaring is the one nonlinearity ("semilinear"); encoder ascends
L_pred, predictor descends. With Stiefel constraint + z-standardisation
the PM minimax stays bounded and converges in 2500 steps / 1.2 s.
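The Stiefel constraint mentioned above (orthonormal rows of W after every step) is typically enforced with an SVD-based polar retraction; a sketch (our reconstruction, not the stub's code):

```python
import numpy as np

def polar_retract(W):
    """Snap W (M x D, M <= D) back onto the Stiefel manifold of
    matrices with orthonormal rows: W = U S Vt  ->  U Vt."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
W = polar_retract(rng.normal(size=(16, 64)))   # e.g. right after a gradient step
print(np.allclose(W @ W.T, np.eye(16)))        # True: rows are orthonormal
```

The polar projection is the nearest orthonormal-row matrix in Frobenius norm, so it perturbs the gradient step as little as possible.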

Synthetic dataset: 1/f^2 pink-noise images + random oriented bars,
ZCA-whitened 8x8 patches.
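ZCA whitening, for reference, decorrelates the patch dimensions while staying close to the pixel basis; a minimal sketch (hypothetical helper, not the stub's code):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """Zero-mean, identity-covariance transform that stays as close
    as possible to the original coordinates (W = C^{-1/2})."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8)) @ rng.normal(size=(8, 8))  # correlated "patches"
Xw = zca_whiten(X)
print(np.allclose(Xw.T @ Xw / len(Xw), np.eye(8), atol=1e-2))  # True
```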

Headline (seed 0, 1.2 s wallclock):
- 12/16 filters with FFT orientation concentration > 0.5 (oriented bars)
- 16/16 filters with concentration > 0.4
- mean code excess kurtosis 19.96 (random projection: 2.95)
- bit-identical across two runs of the same seed
- 12-15/16 oriented across seeds 0..4 (median 14/16)

Visual signature reproduces the V1 simple-cell template; PCA baseline on
the same data gives global Fourier modes (not oriented), as expected.

Files: model + train + eval (semilinear_pm_image_patches.py),
8 static PNGs (visualize_*.py), 1.1 MB GIF of filter evolution.
Pure numpy + matplotlib, --grad-check matches numerical to <1e-9.
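A --grad-check style comparison against numerical gradients can be sketched with a central-difference helper (generic illustration, not the flag's actual implementation):

```python
import numpy as np

def num_grad(f, W, h=1e-6):
    """Central-difference numerical gradient of a scalar function f at W."""
    g = np.zeros_like(W)
    for i in np.ndindex(*W.shape):
        old = W[i]
        W[i] = old + h; fp = f(W)
        W[i] = old - h; fm = f(W)
        W[i] = old                  # restore the entry exactly
        g[i] = (fp - fm) / (2 * h)
    return g

# analytic gradient of sum(W^2) is 2W; the two should agree to ~1e-9
W = np.arange(6.0).reshape(2, 3)
g = num_grad(lambda M: float(np.sum(M ** 2)), W)
print(np.max(np.abs(g - 2 * W)) < 1e-6)  # True
```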

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Octopus merge of 4 wave-5 stubs per SPEC issue #1.

- wave-5-local/predictability-min-binary-factors: predictability minimization on synthetic factorial binary patterns (1992)
- wave-5-local/predictable-stereo: predictability maximization (Becker-Hinton-style IMAX) on synthetic binary stereo (1993)
- wave-5-local/semilinear-pm-image-patches: semilinear PM on synthetic natural-image patches (1996)
- wave-5-local/lococode-ica: tied autoencoder + L1 sparsity on synthetic sparse data (1999)

All 4 verified by separate audit subagent: numpy-only, deterministic,
branch protocol followed (no wave-5-local on remote), all 8 README sections,
algorithmic faithfulness confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0bserver07
Contributor Author

Audit Report — PR #9 wave 5 (4 stubs)

Wave 5 verdict: APPROVE.

Independent review by separate Explore subagent.

Per-stub verdicts

| Stub | Verdict | Reason |
| --- | --- | --- |
| predictability-min-binary-factors | APPROVE | Two-net PM verified; predictors hit chance MSE 0.25 (sigmoid binary); pairwise MI ~10⁻⁴ nats; bit-recovery 100% |
| predictable-stereo | APPROVE | IMAX gradient verified; recovers shared depth bit at 0.996 (8-seed); shuffled control at chance |
| semilinear-pm-image-patches | APPROVE | Stiefel-manifold PM verified; analytic-vs-numerical gradient max error 5e-10; V1-like oriented filters recovered |
| lococode-ica | APPROVE | Tied AE + L1 sparsity verified; recovers ICA components on Laplacian sources; 4× better than PCA |

Cross-cut findings

  • Numpy-only (hard pass): All 4 verified.
  • Determinism (3 spot-checks): predictability-min (bit_acc 100%, perm consistent), predictable-stereo (recovery 0.996), lococode-ica (Amari 0.093) — all bit-identical across reruns.
  • Branch protocol: All 4 on local-only wave-5-local/*; zero pushed.
  • Git authors: All 4 commits by agent-0bserver07. No drift.
  • Cleanliness: zero TODO/FIXME/XXX/HACK in any .py; no hardcoded paths; no __pycache__ committed.

Algorithmic faithfulness (2 deep dives)

  1. predictability-min-binary-factors: the encoder outputs K sigmoid units; K separate predictors each take the other K−1 units and predict the remaining one through a tanh hidden layer. The encoder maximizes L_pred (it enters the encoder loss with a minus sign). At equilibrium: pairwise MI ~10⁻⁴ nats, predictor MSE = 0.25 (chance for balanced binary), 100% bit-recovery on held-out data.
  2. lococode-ica: tied autoencoder (W_enc = W, W_dec = W.T) on whitened input. Loss = MSE + λ·|H|₁ + λ_w·||W||². L1 gradient sign(H)·Z correctly applied. On Laplacian sources: Amari 0.093 (vs FastICA 0.022 oracle, vs PCA 0.388 lower bound).
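The audited tied-AE loss and L1 gradient can be exercised end to end in a few lines (a toy reconstruction under the conventions stated above — W_enc = W, W_dec = W.T, loss = MSE + λ·|H|₁ + λ_w·||W||² — not the stub's actual training loop; all hyperparameters here are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.laplace(size=(256, 8))       # whitened sparse (Laplacian) batch
W = 0.1 * rng.normal(size=(8, 8))    # tied weights: H = Z W^T, Xhat = H W
lam, lam_w, lr = 0.1, 1e-3, 0.05

for _ in range(200):
    H = Z @ W.T                      # encoder (linear code)
    R = H @ W - Z                    # reconstruction residual
    # W appears in both encoder and decoder, so the MSE gradient has two
    # terms; the L1 penalty contributes lam * sign(H)^T Z, plus weight decay.
    gW = ((H.T @ R + (R @ W.T).T @ Z) / len(Z)
          + lam * np.sign(H).T @ Z / len(Z)
          + 2 * lam_w * W)
    W -= lr * gW

loss = np.mean((Z @ W.T @ W - Z) ** 2)
print(f"final reconstruction MSE: {loss:.4f}")  # small: W is near-orthogonal
```

The sign(H)·Z term is exactly the gradient check the audit performed; descending it shrinks code magnitudes toward a sparse solution.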

Reproduce results (3 spot-checks)

predictability-min-binary-factors --seed 0:
  L_recon=0.0026, L_pred=0.2500, bit_acc=100%, perm=(1,2,3,0)

predictable-stereo --seed 0:
  I(y_L;y_R)=7.5984 nats, recovery=0.996, agreement=0.996

lococode-ica --seed 0:
  Amari=0.0929, kurtosis=2.608, sparsity=0.228

All identical across reruns. All wallclocks well under 5-minute budget.


agent-0bserver07 (Claude Code) on behalf of Yad — wave-5 audit subagent

@0bserver07 0bserver07 merged commit ffabd9a into main May 8, 2026
@0bserver07 0bserver07 deleted the wave/5-predictability branch May 8, 2026 15:50
0bserver07 added a commit that referenced this pull request May 8, 2026
wave 5: predictability min/max + unsupervised features (4 stubs)