Add markdown golden-snapshot test harness for engine migration #467
Introduce a large-scale regression harness that pins the exact HTML output of the current markdown engine across a diverse corpus, so the upcoming JS -> native (Rust) engine migration can be validated for parity.

What it includes:
- A reproducible corpus fetcher (test/harness/fetch) pulling from pinned public sources: CommonMark spec, GFM spec, markdown-it fixtures, public jaredwray repos, and permissively-licensed docs. Raw payloads are cached and committed for offline replay. ~900 unique documents with per-doc provenance (source, license, attribution, sha256) in a manifest.
- Named render profiles (default, commonmark, gfm-only, no-highlight, no-math, rawhtml, mdx) so failures isolate to a feature. CommonMark/GFM examples also render under rawhtml so raw-HTML inputs (stripped to empty under the default rawHtml:false) still produce a non-empty golden.
- Hand-authored per-feature diagnostic suites covering every plugin (gfm tables/alerts/tasklists, emoji, toc, slug, highlight, math, mdx, raw html, frontmatter, commonmark core) for pinpoint failure reporting.
- A pluggable RenderAdapter interface: the current engine is WritrJsAdapter; a future WritrRustAdapter plugs in behind HARNESS_ENGINE and is checked against the same goldens, with an engine-keyed allowlist for intentional diffs.
- A golden generator (golden:generate / golden:check) that forces caching off, asserts sync/async parity, surfaces genuine engine errors via the emitted error event (legit empty output stays a valid golden), and records resolved plugin versions for drift auditing.
- A Vitest runner reporting first-divergence index/line/col with context.

The harness is excluded from the default `pnpm test`/coverage run via a dedicated vitest.harness.config.ts and a vitest.config.ts exclude, so the 2000+ case suite never slows the normal dev loop.

New scripts: corpus:fetch, corpus:fetch:offline, golden:generate, golden:check, test:harness.

https://claude.ai/code/session_01Rj1dsi6MW36qudnPKE2Zx9
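To illustrate the first-divergence reporting mentioned above, here is a minimal sketch of how an index/line/col report with context could be computed. The `Divergence` type, the `firstDivergence` name, and the default context width are illustrative assumptions, not the harness's actual code.

```typescript
// Hypothetical report shape; the real runner's fields may differ.
interface Divergence {
  index: number;    // 0-based offset of the first differing character
  line: number;     // 1-based line of that offset
  col: number;      // 1-based column of that offset
  expected: string; // slice of the golden around the divergence
  actual: string;   // slice of the engine output around the divergence
}

// Returns null when the strings are identical, otherwise the first point
// where they differ (including the case where one is a prefix of the other).
function firstDivergence(
  expected: string,
  actual: string,
  context = 20, // assumed context width in characters
): Divergence | null {
  const limit = Math.min(expected.length, actual.length);
  let index = 0;
  while (index < limit && expected.charAt(index) === actual.charAt(index)) {
    index++;
  }
  if (index === expected.length && index === actual.length) {
    return null;
  }
  // The common prefix is identical in both strings, so either one can be
  // used to derive the line and column of the divergence.
  let line = 1;
  let col = 1;
  for (let i = 0; i < index; i++) {
    if (expected.charAt(i) === "\n") {
      line++;
      col = 1;
    } else {
      col++;
    }
  }
  const start = Math.max(0, index - context);
  return {
    index,
    line,
    col,
    expected: expected.slice(start, index + context),
    actual: actual.slice(start, index + context),
  };
}
```

For example, comparing `"<p>a</p>\n<p>b</p>"` with `"<p>a</p>\n<p>c</p>"` reports index 12, line 2, col 4.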
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage diff (main vs. #467): coverage 100.00% -> 100.00%; files 5 -> 5; lines 518 -> 518; branches 144 -> 144; hits 518 -> 518.
Code Review
This pull request introduces a comprehensive Markdown Golden-Snapshot Harness designed to ensure HTML output consistency across different markdown engines (specifically for an upcoming JS to Rust migration). It adds new package scripts, a detailed README, allowlist management, corpus loading utilities, and a large set of CommonMark spec markdown files to serve as the regression test corpus. There are no review comments to address, so no feedback is provided.
Re-ran the fetcher with an authenticated GitHub token so the jaredwray source is no longer throttled. It now enumerates all public jaredwray repos and pulls their markdown (339 unique files after dedupe), bringing the corpus to the full 1000-document cap. The stratified round-robin prioritises the jaredwray consumer markdown, so CommonMark trims to fit. Regenerated all goldens (2041, zero render errors); drift check clean and the full harness suite passes. No credentials are committed — cached API payloads are public repo metadata and the manifest stores only raw CDN URLs. https://claude.ai/code/session_01Rj1dsi6MW36qudnPKE2Zx9
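As a rough illustration of why a capped, stratified round-robin trims the largest source first, here is a minimal sketch. It assumes each source is a list ordered by priority and that one document is taken from each source per round until the cap is reached; the fetcher's real selection logic may differ, and `stratifiedRoundRobin` is a hypothetical name.

```typescript
// Takes one item from each stratum per round, in the order the strata are
// listed, until `cap` items are picked or every stratum is exhausted.
// Small strata are kept whole; the largest ones absorb the trimming.
function stratifiedRoundRobin<T>(strata: T[][], cap: number): T[] {
  const queues = strata.map((stratum) => [...stratum]); // do not mutate inputs
  const picked: T[] = [];
  while (picked.length < cap && queues.some((q) => q.length > 0)) {
    for (const queue of queues) {
      if (picked.length >= cap) break;
      const next = queue.shift();
      if (next !== undefined) picked.push(next);
    }
  }
  return picked;
}

// Illustrative data only: two jaredwray docs, four CommonMark docs, cap of 5.
const sample = stratifiedRoundRobin(
  [
    ["jaredwray/1", "jaredwray/2"],
    ["commonmark/1", "commonmark/2", "commonmark/3", "commonmark/4"],
  ],
  5,
);
// sample keeps both jaredwray docs and trims CommonMark to three.
```

Listing the consumer source first means it also wins the last slot when the cap falls mid-round.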
GOOGLE_CLOUD_PROJECT=contrib-dev
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_CLOUD_TASK_QUEUE=bids-notification
GOOGLE_CLOUD_TASK_API_TOKEN=12345678987654321
Exposed secret in test/harness/corpus/inputs/jaredwray/0148.md - low severity
Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
@AikidoSec ignore: False positive. This is a test corpus fixture — markdown faithfully fetched from a public jaredwray repo for the golden-snapshot harness. Line 126 is a documentation placeholder inside a fenced code block (GOOGLE_CLOUD_TASK_API_TOKEN=12345678987654321), not a real credential.
Generated by Claude Code
✅ Based on your feedback, we ignored this issue because of the following reason:
False positive. This is a test corpus fixture — markdown faithfully fetched from a public jaredwray repo for the golden-snapshot harness. Line 126 is a documentation placeholder inside a fenced code block (GOOGLE_CLOUD_TASK_API_TOKEN=12345678987654321), not a real credential.
Generated by Claude Code
Document that both corpus inputs and golden outputs are checked in (so a fresh clone can run the harness with no network/generation step), and add a per-source breakdown of the 1000-document corpus. https://claude.ai/code/session_01Rj1dsi6MW36qudnPKE2Zx9
Why
We're preparing to migrate writr's markdown engine from JS (unified/remark/rehype) to native Rust. That migration is only safe if we can prove the new engine produces the same HTML as today's engine across a large, diverse body of real markdown. The repo previously had ~25 ad-hoc fixtures and no snapshot testing — not enough to catch regressions across the 12 markdown features writr supports.
This PR adds a golden-snapshot harness: the current JS engine's output is the source of truth. We fetch a diverse corpus, render every example with today's engine, and commit the HTML as goldens. A future Rust engine plugs in behind the same adapter interface and must reproduce the goldens (with a reviewed allowlist for intentional diffs).
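The adapter-plus-allowlist idea can be sketched as follows. This is an assumption-laden illustration: the `RenderAdapter` shape, the `checkCase` and `selectAdapter` helpers, the `Verdict` values, and the `writr-js` engine name are hypothetical, and the real interface in `test/harness` may look different (for instance, it may be async).

```typescript
// Hypothetical adapter shape: one render entry point per engine.
interface RenderAdapter {
  readonly name: string;
  renderSync(markdown: string, profile: string): string;
}

// Engine-keyed allowlist: engine name -> case ids with reviewed, intentional diffs.
type Allowlist = Record<string, string[]>;

type Verdict = "match" | "allowlisted" | "mismatch";

// Compares one engine's output against the committed golden for a case.
function checkCase(
  adapter: RenderAdapter,
  caseId: string,
  markdown: string,
  profile: string,
  golden: string,
  allowlist: Allowlist,
): Verdict {
  const html = adapter.renderSync(markdown, profile);
  if (html === golden) return "match";
  const allowed = allowlist[adapter.name] ?? [];
  return allowed.includes(caseId) ? "allowlisted" : "mismatch";
}

// Picks the adapter named by an engine setting (e.g. the value of
// HARNESS_ENGINE), falling back to a default when it is unset.
function selectAdapter(
  adapters: RenderAdapter[],
  engine: string | undefined,
  fallback: string,
): RenderAdapter {
  const wanted = engine ?? fallback;
  const found = adapters.find((a) => a.name === wanted);
  if (!found) throw new Error(`Unknown engine: ${wanted}`);
  return found;
}
```

Keying the allowlist by engine name keeps the goldens themselves untouched: a reviewed divergence for one engine never loosens the check for another.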
What's included
- Corpus fetcher (`test/harness/fetch/`): reproducible, pinned public sources: CommonMark spec, GFM spec, markdown-it fixtures, every public `github.com/jaredwray/*` repo (they all consume writr), and permissively-licensed docs. Raw payloads are cached and committed for offline replay. 1000 unique documents (incl. 339 from all jaredwray repos), each with provenance (source, license, attribution, sha256) in `corpus/manifest.json`.
- Named render profiles (`profiles.ts`): `default`, `commonmark`, `gfm-only`, `no-highlight`, `no-math`, `rawhtml`, `mdx`. Failures isolate to a feature, and `no-highlight`/`no-math` let the Rust engine pass core contracts before achieving highlight.js/KaTeX parity. CommonMark/GFM examples also render under `rawhtml` so raw-HTML inputs (correctly stripped to empty under the default `rawHtml:false`) still get a non-empty golden pinning the passthrough.
- Per-feature diagnostic suites (`diagnostics/`): hand-authored tiny examples for every plugin (gfm tables/alerts/tasklists/strikethrough/autolinks, emoji, toc, slug, highlight, math, mdx, raw html, frontmatter, commonmark core) so a failure pinpoints the exact feature.
- Pluggable `RenderAdapter`: the current engine is `WritrJsAdapter`; a future `WritrRustAdapter` is selected via `HARNESS_ENGINE` and checked against the same JS-generated goldens. `allowlist.json` (engine-keyed) records reviewed intentional divergences.
- Golden generator (`golden:generate` / `golden:check`): forces `caching:false`, asserts `renderSync` == `render`, surfaces genuine engine errors via Writr's emitted `error` event (a legitimately-empty render is a valid golden), and records resolved plugin versions in `versions.json` for drift auditing.

Results
- `golden:check` is clean (no drift); `pnpm build` and `pnpm test` are unaffected (100% coverage). The harness is excluded from the default run via `vitest.harness.config.ts` plus a `vitest.config.ts` exclude, mirroring the existing integration-test pattern.

New scripts
- `corpus:fetch`, `corpus:fetch:offline`, `golden:generate`, `golden:check`, `test:harness`

Notes
- `ghp_` strings present in the corpus are `ghp_yourtoken` placeholders inside jaredwray's own token documentation.
- See `test/harness/README.md` for the full workflow and how to wire the Rust adapter (`HARNESS_ENGINE=writr-rust`).

https://claude.ai/code/session_01Rj1dsi6MW36qudnPKE2Zx9