
wave 5: predictability min/max + unsupervised features (4 stubs)#9

Merged
0bserver07 merged 5 commits into main from wave/5-predictability on May 8, 2026

wave 5: predictability min/max + unsupervised features (4 stubs)#9
0bserver07 merged 5 commits intomainfrom
wave/5-predictability

Conversation

@0bserver07
Contributor

Wave 5 — predictability min/max + unsupervised feature extraction

Four stubs implementing Schmidhuber's 1992-1999 unsupervised-coding lineage per SPEC issue #1. Octopus-merged from 4 local-only wave-5-local/<slug> branches.

| Stub | Method | Paper | Headline |
| --- | --- | --- | --- |
| predictability-min-binary-factors | PM (proto-GAN) on synthetic factorial binary | Schmidhuber 1992 (NC 4(6)) | L_pred = 0.2500 (exact chance for sigmoid binary); pairwise MI 9.6×10⁻⁵ nats; 100% bit-recovery; 8/8 seeds at 2000 steps |
| predictable-stereo | IMAX (Becker-Hinton) on synthetic stereo | Schmidhuber & Prelinger 1993 (NC 5(4)) | I(yL; yR) = 7.598 nats; held-out depth recovery 1.000 (seed 0); 8/8 seeds, 0.997 mean; shuffled control at chance (0.513) |
| semilinear-pm-image-patches | Stiefel-manifold PM on natural patches | Schmidhuber, Eldracher & Foltin 1996 (NC 8(4)) | 12/16 filters with FFT orientation concentration > 0.5 (V1-style oriented bars); kurtosis 19.96 vs random 2.95 |
| lococode-ica | Tied AE + L1 sparsity on whitened input | Hochreiter & Schmidhuber 1999 (NC 11) | Amari 0.093 (4× better than PCA's 0.388, within 5× of FastICA's 0.022); 10-seed mean 0.117 ± 0.021 |

Audit verdict (separate Explore subagent)

APPROVE across all 4 stubs.

  • Numpy-only (hard pass): Imports = numpy/matplotlib/PIL/argparse/json/os/sys/subprocess/time/platform/itertools. Zero forbidden imports.
  • Determinism (3 spot-checks): predictability-min, predictable-stereo, lococode-ica each ran twice with seed 0 — bit-identical metrics.
  • Branch protocol verified: zero wave-5-local/* branches on origin.
  • Algorithmic faithfulness (2 deep dives):
    • predictability-min-binary-factors: encoder + K per-component predictors; predictors hit chance MSE = 0.25 (sigmoid balanced binary); pairwise MI collapses to ~10⁻⁴ nats; 100% bit-recovery on held-out.
    • lococode-ica: tied AE (W_enc = W, W_dec = W.T); loss = MSE + λ·|H|₁ + λ_w·||W||²; L1 gradient correctly applied; recovers ICA components on Laplacian sources.
  • Cleanliness: zero TODO/FIXME, no hardcoded paths, no __pycache__ committed.
  • Git authors: all 4 commits authored by agent-0bserver07 <agent-0bserver07@users.noreply.github.com>.
  • GIF sizes: 528 KB to 1.1 MB (all under 2 MB).
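For context on the pairwise-MI figures above: the MI between two binarized code units can be estimated from their 2×2 joint histogram. A minimal numpy sketch (hypothetical helper, not the repo's code):

```python
import numpy as np

def pairwise_mi(b1, b2, eps=1e-12):
    """Plug-in MI estimate (nats) between two binary code units,
    computed from their empirical 2x2 joint histogram."""
    joint = np.histogram2d(b1, b2, bins=2)[0] / len(b1)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    return float(np.sum(joint * np.log((joint + eps) / (px @ py + eps))))

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 100_000)
b = rng.integers(0, 2, 100_000)
print(pairwise_mi(a, b))  # ~1e-5 nats: independent units
print(pairwise_mi(a, a))  # ~0.693 nats (ln 2): fully redundant units
```

Near-factorial codes drive every pairwise estimate toward zero, which is what the ~10⁻⁴-nat figures indicate.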

Per-stub deviations (in each stub's §Deviations)

  • predictability-min-binary-factors: Adam (paper used SGD); MSE reconstruction as info-preservation term; linear λ warm-up over 400 steps (without it, encoder collapses); 3:1 predictor:encoder update ratio.
  • predictable-stereo: synthetic 16-dim binary stereo with shared 8-bit template (paper used different distributions); IMAX with closed-form gradient; 8/8 seeds solve.
  • semilinear-pm-image-patches: synthetic 1/f² pink-noise + oriented bars (paper used real natural images); Stiefel-manifold polar projection after every step; squared-feature predictor (the "semilinear" nonlinearity); analytic-vs-numerical gradient max error 5e-10.
  • lococode-ica: L1 sparsity surrogate vs paper's flat-minimum-search Hessian penalty (largest deviation, documented); whitening preprocessing (essential, confirmed empirically); tied autoencoder with orthogonality regularizer.

Citation gaps

All 4 source papers are retrievable. Some implementation details (exact dataset compositions, optimizer hyperparams) are reconstructed from secondary sources where the originals don't pin them down — flagged in §Open questions per SPEC's methodological caveat.

Wave 0 → 1 → 2 → 3 → 4 → 5 progress

7 + 5 + 5 + 5 + 4 = 26/50 v1 stubs done (52%). 4 waves remaining = 24 stubs.


agent-0bserver07 (Claude Code) on behalf of Yad

agent-0bserver07 and others added 5 commits May 7, 2026 09:02
… codes

Implementation of the LOCOCODE / flat-minimum-search proxy from
Hochreiter & Schmidhuber (1999, NC 11). Tied k×k autoencoder on whitened
sparse Laplacian mixtures, MSE + L1 activity penalty + weight decay.

Headline (seed 0, k=8, n=2000, 200 epochs, 0.18 s training):
  LOCOCODE Amari = 0.093  (kurtosis 2.61)
  PCA      Amari = 0.388  (kurtosis 1.08)
  FastICA  Amari = 0.022  (kurtosis 3.22)

LOCOCODE crosses cleanly from PCA-quality to ICA-family quality:
4× lower Amari than PCA, super-Gaussian codes, recovered demixer is
near-permutation up to small off-diagonal cross-talk. Plateau at
~0.10 Amari is the L1-saturation gap to higher-order-moment ICA;
documented as Open Question.
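The Amari index quoted throughout is the standard permutation-invariant demixing error; a sketch of one common form (our reconstruction, not necessarily the stub's exact normalization):

```python
import numpy as np

def amari_index(P):
    """Amari error of P = W_demix @ A_mix: 0 iff P is a scaled
    permutation matrix; grows with off-diagonal cross-talk."""
    P = np.abs(P)
    n = P.shape[0]
    row = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    col = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return float((row.sum() + col.sum()) / (2 * n * (n - 1)))

perm = np.eye(4)[[2, 0, 3, 1]] * 3.0   # scaled permutation
print(amari_index(perm))               # 0.0 -- perfect recovery
print(amari_index(np.ones((4, 4))))    # 1.0 -- total cross-talk
```

On this scale the reported 0.093 sits between PCA's 0.388 and FastICA's 0.022, matching the "near-permutation up to small off-diagonal cross-talk" description.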

Files:
- lococode_ica.py: data, model, train, PCA + FastICA baselines, Amari
- visualize_lococode_ica.py: 5 PNGs in viz/
- make_lococode_ica_gif.py: 528 KB GIF, 41 frames
- README.md: 8 sections including paper-vs-our deviations

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…l inputs

Encoder + decoder + K per-unit predictors trained adversarially in pure numpy
(Adam, manual gradients).  K=4, D=8 with linear Gaussian mixing of independent
+/-1 factors converges in ~3 s on an M-series laptop:

  L_recon  = 0.0026
  L_pred   = 0.2500  (= chance for sigmoid against balanced binary target)
  pMI      = 9.6e-05 nats
  bit_acc  = 100% modulo permutation+sign on 4096 held-out samples
  seeds    = 8/8 reach 100% bit accuracy at 2000 steps
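One-line sanity check on the L_pred figure: against a balanced 0/1 target, a predictor with no usable information can do no better than the constant 0.5, which gives MSE exactly 0.25:

```python
import numpy as np

# balanced binary targets; every sample is 0 or 1
y = np.random.default_rng(0).integers(0, 2, 100_000).astype(float)
# (y - 0.5)^2 equals 0.25 for each sample, so the chance MSE is exactly 0.25
print(np.mean((y - 0.5) ** 2))  # 0.25
```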

Files:
  predictability_min_binary_factors.py
  make_predictability_min_binary_factors_gif.py
  visualize_predictability_min_binary_factors.py
  predictability_min_binary_factors.gif (567 KB, well below 2 MB target)
  viz/{training_curves,pairwise_mi_init_vs_final,code_vs_factor_mi,code_distribution}.png
  results.json
  README.md (8 sections)

Removed problem.py stub.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tability max

Two MLPs each see one view of synthetic binary stereo (16 dims = 8 shared
+ 8 view-specific distractors per view) and train cooperatively under the
Becker-Hinton 1992 IMAX objective I(yL;yR) = 0.5 log(var(yL+yR)/var(yL-yR))
to recover a hidden binary depth bit.
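The IMAX quantity above is the Becker-Hinton Gaussian lower bound on mutual information between the two outputs; a minimal numpy sketch on our own toy signals (not the stub's model):

```python
import numpy as np

def imax_info(yL, yR, eps=1e-12):
    """Becker-Hinton IMAX bound: 0.5 * log(var(yL+yR) / var(yL-yR)).
    Large when the two outputs carry a common signal, ~0 when independent."""
    return 0.5 * np.log((np.var(yL + yR) + eps) / (np.var(yL - yR) + eps))

rng = np.random.default_rng(0)
shared = rng.normal(size=10_000)               # common "depth" signal
yL = shared + 0.1 * rng.normal(size=10_000)    # left view + private noise
yR = shared + 0.1 * rng.normal(size=10_000)    # right view + private noise
print(imax_info(yL, yR) > 2.0)                 # True: shared signal dominates
print(abs(imax_info(rng.normal(size=10_000),
                    rng.normal(size=10_000))) < 0.1)  # True: independent views
```

Maximizing this quantity pushes the two MLPs to agree on exactly the information their views share, which here is the depth bit.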

Headline (seed 0, 200 epochs, ~0.1 s on M-series CPU): held-out depth
recovery 1.000, IMAX I = 7.598 nats. 8-seed mean held-out recovery 0.997
(min 0.994). Shuffled negative control (no shared depth): 0.513 (chance).

Files: predictable_stereo.py (model + IMAX loss + closed-form gradient +
training + held-out eval + multi-seed sweep + --shuffled control),
visualize_predictable_stereo.py (5 PNGs to viz/), make_predictable_stereo_gif.py
(51 frames, 844 KB), run.json, README.md (8 sections).

Pure numpy + matplotlib. Deterministic under --seed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…al-image patches

Wave 5 stub for Schmidhuber, Eldracher, Foltin (1996) "Semilinear
predictability minimization produces well-known feature detectors".

Implementation: linear encoder W (M=16, orthonormal rows) + linear
predictor on standardised squared codes z = (y^2 - mu)/sigma. The
squaring is the one nonlinearity ("semilinear"); encoder ascends
L_pred, predictor descends. With Stiefel constraint + z-standardisation
the PM minimax stays bounded and converges in 2500 steps / 1.2 s.
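The Stiefel constraint mentioned above (orthonormal rows of W after every step) is typically enforced with an SVD-based polar retraction; a sketch (our reconstruction, not the stub's code):

```python
import numpy as np

def polar_retract(W):
    """Snap W (M x D, M <= D) back onto the Stiefel manifold of
    matrices with orthonormal rows: W = U S Vt  ->  U Vt."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
W = polar_retract(rng.normal(size=(16, 64)))   # e.g. right after a gradient step
print(np.allclose(W @ W.T, np.eye(16)))        # True: rows are orthonormal
```

The polar projection is the nearest orthonormal-row matrix in Frobenius norm, so it perturbs the gradient step as little as possible.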

Synthetic dataset: 1/f^2 pink-noise images + random oriented bars,
ZCA-whitened 8x8 patches.
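ZCA whitening, for reference, decorrelates the patch dimensions while staying close to the pixel basis; a minimal sketch (hypothetical helper, not the stub's code):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """Zero-mean, identity-covariance transform that stays as close
    as possible to the original coordinates (W = C^{-1/2})."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8)) @ rng.normal(size=(8, 8))  # correlated "patches"
Xw = zca_whiten(X)
print(np.allclose(Xw.T @ Xw / len(Xw), np.eye(8), atol=1e-2))  # True
```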

Headline (seed 0, 1.2 s wallclock):
- 12/16 filters with FFT orientation concentration > 0.5 (oriented bars)
- 16/16 filters with concentration > 0.4
- mean code excess kurtosis 19.96 (random projection: 2.95)
- bit-identical across two runs of the same seed
- 12-15/16 oriented across seeds 0..4 (median 14/16)

Visual signature reproduces the V1 simple-cell template; PCA baseline on
the same data gives global Fourier modes (not oriented), as expected.

Files: model + train + eval (semilinear_pm_image_patches.py),
8 static PNGs (visualize_*.py), 1.1 MB GIF of filter evolution.
Pure numpy + matplotlib, --grad-check matches numerical to <1e-9.
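A --grad-check style comparison against numerical gradients can be sketched with a central-difference helper (generic illustration, not the flag's actual implementation):

```python
import numpy as np

def num_grad(f, W, h=1e-6):
    """Central-difference numerical gradient of a scalar function f at W."""
    g = np.zeros_like(W)
    for i in np.ndindex(*W.shape):
        old = W[i]
        W[i] = old + h; fp = f(W)
        W[i] = old - h; fm = f(W)
        W[i] = old                  # restore the entry exactly
        g[i] = (fp - fm) / (2 * h)
    return g

# analytic gradient of sum(W^2) is 2W; the two should agree to ~1e-9
W = np.arange(6.0).reshape(2, 3)
g = num_grad(lambda M: float(np.sum(M ** 2)), W)
print(np.max(np.abs(g - 2 * W)) < 1e-6)  # True
```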

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Octopus merge of 4 wave-5 stubs per SPEC issue #1.

- wave-5-local/predictability-min-binary-factors: predictability minimization on synthetic factorial binary patterns (1992)
- wave-5-local/predictable-stereo: predictability maximization (Becker-Hinton-style IMAX) on synthetic binary stereo (1993)
- wave-5-local/semilinear-pm-image-patches: semilinear PM on synthetic natural-image patches (1996)
- wave-5-local/lococode-ica: tied autoencoder + L1 sparsity on synthetic sparse data (1999)

All 4 verified by separate audit subagent: numpy-only, deterministic,
branch protocol followed (no wave-5-local on remote), all 8 README sections,
algorithmic faithfulness confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0bserver07
Contributor Author

Audit Report — PR #9 wave 5 (4 stubs)

Wave 5 verdict: APPROVE.

Independent review by separate Explore subagent.

Per-stub verdicts

| Stub | Verdict | Reason |
| --- | --- | --- |
| predictability-min-binary-factors | APPROVE | Two-net PM verified; predictors hit chance MSE 0.25 (sigmoid binary); pairwise MI ~10⁻⁴ nats; bit-recovery 100% |
| predictable-stereo | APPROVE | IMAX gradient verified; recovers shared depth bit at 0.996 (8-seed); shuffled control at chance |
| semilinear-pm-image-patches | APPROVE | Stiefel-manifold PM verified; analytic-vs-numerical gradient max error 5e-10; V1-like oriented filters recovered |
| lococode-ica | APPROVE | Tied AE + L1 sparsity verified; recovers ICA components on Laplacian sources; 4× better than PCA |

Cross-cut findings

  • Numpy-only (hard pass): All 4 verified.
  • Determinism (3 spot-checks): predictability-min (bit_acc 100%, perm consistent), predictable-stereo (recovery 0.996), lococode-ica (Amari 0.093) — all bit-identical across reruns.
  • Branch protocol: All 4 on local-only wave-5-local/*; zero pushed.
  • Git authors: All 4 commits by agent-0bserver07. No drift.
  • Cleanliness: zero TODO/FIXME/XXX/HACK in any .py; no hardcoded paths; no __pycache__ committed.

Algorithmic faithfulness (2 deep dives)

  1. predictability-min-binary-factors: the encoder outputs K sigmoid units; K separate predictors each take the other K−1 units and predict the remaining one through a tanh hidden layer. The encoder maximizes L_pred (it enters the encoder loss with a minus sign). At equilibrium: pairwise MI ~10⁻⁴ nats, predictor MSE = 0.25 (chance for balanced binary), 100% bit-recovery on held-out data.
  2. lococode-ica: tied autoencoder (W_enc = W, W_dec = W.T) on whitened input. Loss = MSE + λ·|H|₁ + λ_w·||W||². L1 gradient sign(H)·Z correctly applied. On Laplacian sources: Amari 0.093 (vs FastICA 0.022 oracle, vs PCA 0.388 lower bound).
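The audited tied-AE loss and L1 gradient can be exercised end to end in a few lines (a toy reconstruction under the conventions stated above — W_enc = W, W_dec = W.T, loss = MSE + λ·|H|₁ + λ_w·||W||² — not the stub's actual training loop; all hyperparameters here are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.laplace(size=(256, 8))       # whitened sparse (Laplacian) batch
W = 0.1 * rng.normal(size=(8, 8))    # tied weights: H = Z W^T, Xhat = H W
lam, lam_w, lr = 0.1, 1e-3, 0.05

for _ in range(200):
    H = Z @ W.T                      # encoder (linear code)
    R = H @ W - Z                    # reconstruction residual
    # W appears in both encoder and decoder, so the MSE gradient has two
    # terms; the L1 penalty contributes lam * sign(H)^T Z, plus weight decay.
    gW = ((H.T @ R + (R @ W.T).T @ Z) / len(Z)
          + lam * np.sign(H).T @ Z / len(Z)
          + 2 * lam_w * W)
    W -= lr * gW

loss = np.mean((Z @ W.T @ W - Z) ** 2)
print(f"final reconstruction MSE: {loss:.4f}")  # small: W is near-orthogonal
```

The sign(H)·Z term is exactly the gradient check the audit performed; descending it shrinks code magnitudes toward a sparse solution.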

Reproduce results (3 spot-checks)

predictability-min-binary-factors --seed 0:
  L_recon=0.0026, L_pred=0.2500, bit_acc=100%, perm=(1,2,3,0)

predictable-stereo --seed 0:
  I(y_L;y_R)=7.5984 nats, recovery=0.996, agreement=0.996

lococode-ica --seed 0:
  Amari=0.0929, kurtosis=2.608, sparsity=0.228

All identical across reruns. All wallclocks well under 5-minute budget.


agent-0bserver07 (Claude Code) on behalf of Yad — wave-5 audit subagent

@0bserver07 0bserver07 merged commit ffabd9a into main May 8, 2026
@0bserver07 0bserver07 deleted the wave/5-predictability branch May 8, 2026 15:50
0bserver07 added a commit that referenced this pull request May 8, 2026
wave 5: predictability min/max + unsupervised features (4 stubs)