Add markdown golden-snapshot test harness for engine migration #467

Merged
jaredwray merged 3 commits into main from claude/markdown-test-harness-tcw9ix on Jun 14, 2026

@jaredwray jaredwray commented Jun 14, 2026

Owner

Why

We're preparing to migrate writr's markdown engine from JS (unified/remark/rehype) to native Rust. That migration is only safe if we can prove the new engine produces the same HTML as today's engine across a large, diverse body of real markdown. The repo previously had ~25 ad-hoc fixtures and no snapshot testing — not enough to catch regressions across the 12 markdown features writr supports.

This PR adds a golden-snapshot harness: the current JS engine's output is the source of truth. We fetch a diverse corpus, render every example with today's engine, and commit the HTML as goldens. A future Rust engine plugs in behind the same adapter interface and must reproduce the goldens (with a reviewed allowlist for intentional diffs).

What's included

  • Corpus fetcher (test/harness/fetch/) — reproducible, pinned public sources: CommonMark spec, GFM spec, markdown-it fixtures, every public github.com/jaredwray/* repo (they all consume writr), and permissively-licensed docs. Raw payloads are cached and committed for offline replay. 1000 unique documents (incl. 339 from all jaredwray repos), each with provenance (source, license, attribution, sha256) in corpus/manifest.json.
  • Render profiles (profiles.ts) — default, commonmark, gfm-only, no-highlight, no-math, rawhtml, mdx. Failures isolate to a feature, and no-highlight/no-math let the Rust engine pass core contracts before achieving highlight.js/KaTeX parity. CommonMark/GFM examples also render under rawhtml so raw-HTML inputs (correctly stripped to empty under the default rawHtml:false) still get a non-empty golden pinning the passthrough.
  • Per-feature diagnostic suites (diagnostics/) — hand-authored tiny examples for every plugin (gfm tables/alerts/tasklists/strikethrough/autolinks, emoji, toc, slug, highlight, math, mdx, raw html, frontmatter, commonmark core) so a failure pinpoints the exact feature.
  • Pluggable RenderAdapter — current engine is WritrJsAdapter; a future WritrRustAdapter is selected via HARNESS_ENGINE and checked against the same JS-generated goldens. allowlist.json (engine-keyed) records reviewed intentional divergences.
  • Golden generator — forces caching:false, asserts renderSync == render, surfaces genuine engine errors via Writr's emitted error event (a legitimately-empty render is a valid golden), and records resolved plugin versions in versions.json for drift auditing.
  • Vitest runner — reports first-divergence index/line/col with context instead of a giant diff.
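
The adapter and allowlist items above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: the `RenderAdapter` shape, the `classify` helper, the `Allowlist` layout, and the `writr-js` default name are all assumptions made for the example. Only the profile names and the `writr-rust` engine value come from this PR.

```typescript
// Hypothetical shapes; the real harness types may differ.
type Profile =
  | "default" | "commonmark" | "gfm-only"
  | "no-highlight" | "no-math" | "rawhtml" | "mdx";

interface RenderAdapter {
  readonly engine: string;
  render(markdown: string, profile: Profile): Promise<string>;
}

// Engine-keyed allowlist: engine name -> golden ids with reviewed, intentional diffs.
type Allowlist = Record<string, string[]>;

type Verdict = "match" | "allowed-diff" | "regression";

// Pick an adapter by name. The caller would pass process.env.HARNESS_ENGINE;
// the "writr-js" fallback is an assumed default.
function selectAdapter(
  adapters: RenderAdapter[],
  engineName: string | undefined,
): RenderAdapter {
  const wanted = engineName ?? "writr-js";
  const found = adapters.find((a) => a.engine === wanted);
  if (!found) throw new Error(`Unknown engine: ${wanted}`);
  return found;
}

// Compare rendered HTML to the committed golden. A mismatch is tolerated
// only if this engine's allowlist entry names the golden id.
function classify(
  html: string,
  golden: string,
  engine: string,
  id: string,
  allowlist: Allowlist,
): Verdict {
  if (html === golden) return "match";
  return (allowlist[engine] ?? []).includes(id) ? "allowed-diff" : "regression";
}
```

Keying the allowlist by engine means a divergence accepted for the Rust engine can never silently excuse a regression in the JS engine.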

Results

  • 2041 goldens generated, 2041 harness tests pass, zero render errors across the full corpus.
  • golden:check is clean (no drift); pnpm build and pnpm test are unaffected (100% coverage) — the harness is excluded from the default run via vitest.harness.config.ts + a vitest.config.ts exclude, mirroring the existing integration-test pattern.

New scripts

pnpm corpus:fetch          # fetch/refresh the corpus (online; caches raw payloads)
pnpm corpus:fetch:offline  # rebuild from committed cache only (no network)
pnpm golden:generate       # render corpus + diagnostics, write goldens + versions.json
pnpm golden:check          # CI gate: verify goldens still match the current engine
pnpm test:harness          # run the Vitest runner against committed goldens
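
As a rough sketch of what the `golden:generate` step checks per document, assuming a simplified engine surface: the `Engine` interface, `assertParity`, and `generateGolden` below are hypothetical names for illustration, not Writr's real API, and the error-event wiring described above is left out for brevity.

```typescript
// Hypothetical engine surface; Writr's real API is not reproduced here.
interface Engine {
  render(markdown: string, options: { caching: boolean }): Promise<string>;
  renderSync(markdown: string, options: { caching: boolean }): string;
}

interface Golden {
  id: string;
  html: string;
}

// Sync and async renders must agree byte for byte before a golden is written.
// An empty string passes: a legitimately empty render is still a valid golden.
function assertParity(id: string, syncHtml: string, asyncHtml: string): string {
  if (syncHtml !== asyncHtml) {
    throw new Error(`sync/async mismatch for ${id}`);
  }
  return syncHtml;
}

async function generateGolden(
  engine: Engine,
  id: string,
  markdown: string,
): Promise<Golden> {
  const options = { caching: false }; // never let a cache mask real output
  const syncHtml = engine.renderSync(markdown, options);
  const asyncHtml = await engine.render(markdown, options);
  return { id, html: assertParity(id, syncHtml, asyncHtml) };
}
```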

Notes

  • The corpus is exactly 1000 unique documents (the cap). The stratified round-robin prioritises the jaredwray consumer markdown; CommonMark/GFM/markdown-it are overlapping spec suites, so global dedupe collapses byte-identical examples rather than padding with near-duplicates.
  • No credentials are committed: cached API payloads are public repo metadata and the manifest stores only raw CDN URLs (verified). The ghp_ strings present in the corpus are ghp_yourtoken placeholders inside jaredwray's own token documentation.
  • See test/harness/README.md for the full workflow and how to wire the Rust adapter (HARNESS_ENGINE=writr-rust).
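
One plausible reading of the capped, deduplicated round-robin from the first note is sketched below. The `Doc` shape and `buildCorpus` function are hypothetical, and priority is modelled simply as source order. The manifest records a sha256 per document; this sketch keys the dedupe on the raw content string, which plays the same role without needing a hash import.

```typescript
interface Doc {
  source: string;
  content: string;
}

// `sources` is listed in priority order; each inner array holds one
// source's candidate documents. Takes one document per source per pass,
// skipping content already selected from any source, until the cap is hit.
function buildCorpus(sources: Doc[][], cap: number): Doc[] {
  const seen = new Set<string>();
  const picked: Doc[] = [];
  const cursors = sources.map(() => 0);
  let progressed = true;

  while (picked.length < cap && progressed) {
    progressed = false;
    for (let s = 0; s < sources.length && picked.length < cap; s++) {
      const docs = sources[s];
      // Advance past byte-identical duplicates (global dedupe).
      while (cursors[s] < docs.length && seen.has(docs[cursors[s]].content)) {
        cursors[s]++;
      }
      if (cursors[s] < docs.length) {
        const doc = docs[cursors[s]++];
        seen.add(doc.content);
        picked.push(doc);
        progressed = true;
      }
    }
  }
  return picked;
}
```

Because earlier sources are visited first in every pass, a tight cap trims the later sources, which matches the described effect of the spec suites shrinking to fit.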

https://claude.ai/code/session_01Rj1dsi6MW36qudnPKE2Zx9

Introduce a large-scale regression harness that pins the exact HTML output
of the current markdown engine across a diverse corpus, so the upcoming
JS -> native (Rust) engine migration can be validated for parity.

What it includes:
- A reproducible corpus fetcher (test/harness/fetch) pulling from pinned
  public sources: CommonMark spec, GFM spec, markdown-it fixtures, public
  jaredwray repos, and permissively-licensed docs. Raw payloads are cached
  and committed for offline replay. ~900 unique documents with per-doc
  provenance (source, license, attribution, sha256) in a manifest.
- Named render profiles (default, commonmark, gfm-only, no-highlight,
  no-math, rawhtml, mdx) so failures isolate to a feature. CommonMark/GFM
  examples also render under rawhtml so raw-HTML inputs (stripped to empty
  under the default rawHtml:false) still produce a non-empty golden.
- Hand-authored per-feature diagnostic suites covering every plugin
  (gfm tables/alerts/tasklists, emoji, toc, slug, highlight, math, mdx,
  raw html, frontmatter, commonmark core) for pinpoint failure reporting.
- A pluggable RenderAdapter interface: the current engine is WritrJsAdapter;
  a future WritrRustAdapter plugs in behind HARNESS_ENGINE and is checked
  against the same goldens, with an engine-keyed allowlist for intentional
  diffs.
- A golden generator (golden:generate / golden:check) that forces caching
  off, asserts sync/async parity, surfaces genuine engine errors via the
  emitted error event (legit empty output stays a valid golden), and
  records resolved plugin versions for drift auditing.
- A Vitest runner reporting first-divergence index/line/col with context.
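
A first-divergence report of the kind described in the last item could be computed along these lines. This is a sketch only: `firstDivergence`, the `Divergence` shape, and the default context width are assumptions for illustration, and positions are counted in UTF-16 code units.

```typescript
interface Divergence {
  index: number; // 0-based offset of the first differing character
  line: number;  // 1-based line of that offset
  col: number;   // 1-based column of that offset
  expected: string; // golden excerpt around the divergence
  actual: string;   // rendered excerpt around the divergence
}

function firstDivergence(
  expected: string,
  actual: string,
  context = 20,
): Divergence | null {
  if (expected === actual) return null;

  // Walk the shared prefix; if one string is a prefix of the other,
  // the divergence sits at the end of the shorter one.
  const limit = Math.min(expected.length, actual.length);
  let index = 0;
  while (index < limit && expected[index] === actual[index]) index++;

  // Derive line/col from the shared prefix.
  let line = 1;
  let col = 1;
  for (let i = 0; i < index; i++) {
    if (expected[i] === "\n") {
      line++;
      col = 1;
    } else {
      col++;
    }
  }

  const start = Math.max(0, index - context);
  return {
    index,
    line,
    col,
    expected: expected.slice(start, index + context),
    actual: actual.slice(start, index + context),
  };
}
```

Reporting a short excerpt around one position keeps a failure readable even when the two HTML documents differ across thousands of characters.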

The harness is excluded from the default `pnpm test`/coverage run via a
dedicated vitest.harness.config.ts and a vitest.config.ts exclude, so the
2000+ case suite never slows the normal dev loop. New scripts: corpus:fetch,
corpus:fetch:offline, golden:generate, golden:check, test:harness.
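
The two-config split described above might look roughly like this. It is a config sketch, not the PR's actual files: the glob patterns are assumptions, and both configs are shown as named constants in one block only so the example is a single valid module. In the repo each would be the default export of its own file.

```typescript
import { configDefaults, defineConfig } from "vitest/config";

// vitest.config.ts: the default run skips the harness directory.
// The "test/harness/**" glob is an assumed path pattern.
export const defaultConfig = defineConfig({
  test: {
    exclude: [...configDefaults.exclude, "test/harness/**"],
  },
});

// vitest.harness.config.ts: the harness run includes only harness tests.
// The include glob is likewise an assumption.
export const harnessConfig = defineConfig({
  test: {
    include: ["test/harness/**/*.test.ts"],
  },
});
```

A script such as `test:harness` would then point Vitest at the second file with its `--config` flag.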

https://claude.ai/code/session_01Rj1dsi6MW36qudnPKE2Zx9

codecov Bot commented Jun 14, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (b159168) to head (91d440d).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #467   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines          518       518           
  Branches       144       144           
=========================================
  Hits           518       518           


@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces a comprehensive Markdown Golden-Snapshot Harness designed to ensure HTML output consistency across different markdown engines (specifically for an upcoming JS to Rust migration). It adds new package scripts, a detailed README, allowlist management, corpus loading utilities, and a large set of CommonMark spec markdown files to serve as the regression test corpus. There are no review comments to address, so no feedback is provided.

Re-ran the fetcher with an authenticated GitHub token so the jaredwray
source is no longer throttled. It now enumerates all public jaredwray
repos and pulls their markdown (339 unique files after dedupe), bringing
the corpus to the full 1000-document cap. The stratified round-robin
prioritises the jaredwray consumer markdown, so CommonMark trims to fit.

Regenerated all goldens (2041, zero render errors); drift check clean and
the full harness suite passes. No credentials are committed — cached API
payloads are public repo metadata and the manifest stores only raw CDN URLs.

https://claude.ai/code/session_01Rj1dsi6MW36qudnPKE2Zx9
GOOGLE_CLOUD_PROJECT=contrib-dev
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_CLOUD_TASK_QUEUE=bids-notification
GOOGLE_CLOUD_TASK_API_TOKEN=12345678987654321


Exposed secret in test/harness/corpus/inputs/jaredwray/0148.md - low severity
Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

Reply @AikidoSec ignore: [REASON] to ignore this issue.

Owner Author


@AikidoSec ignore: False positive. This is a test corpus fixture — markdown faithfully fetched from a public jaredwray repo for the golden-snapshot harness. Line 126 is a documentation placeholder inside a fenced code block (GOOGLE_CLOUD_TASK_API_TOKEN=12345678987654321), not a real credential.


Generated by Claude Code


✅ Based on your feedback, we ignored this issue because of the following reason:

False positive. This is a test corpus fixture — markdown faithfully fetched from a public jaredwray repo for the golden-snapshot harness. Line 126 is a documentation placeholder inside a fenced code block (GOOGLE_CLOUD_TASK_API_TOKEN=12345678987654321), not a real credential.


Generated by Claude Code

Document that both corpus inputs and golden outputs are checked in (so a
fresh clone can run the harness with no network/generation step), and add a
per-source breakdown of the 1000-document corpus.

https://claude.ai/code/session_01Rj1dsi6MW36qudnPKE2Zx9
@jaredwray jaredwray merged commit fce1459 into main Jun 14, 2026
13 checks passed
@jaredwray jaredwray deleted the claude/markdown-test-harness-tcw9ix branch June 14, 2026 22:38
@jaredwray jaredwray mentioned this pull request Jun 15, 2026
4 tasks