feat(core): hf-id write-back to disk + serve-time surfacing (R7, Tasks 1-2)#1289
Conversation
d3fd740 to
0886829
Compare
2cb29b1 to
fa6996c
Compare
0886829 to
03dcbaa
Compare
fa6996c to
c125ecc
Compare
03dcbaa to
6160c9e
Compare
6160c9e to
a067aa9
Compare
c125ecc to
7d1bf83
Compare
miguel-heygen
left a comment
There was a problem hiding this comment.
Solid Tasks 1-2 implementation. TOCTOU guard is clever and the attribute-count-based change detection (added in #1292) is the right call over string equality since linkedom normalises quote style. A couple of things worth eyeballing:
P2 — export { ensureHfIds } re-export in hfIdPersist.ts is an unnecessary indirection
preview.ts imports ensureHfIds from hfIdPersist.js but ensureHfIds lives in parsers/hfIds.js. Re-exporting it through hfIdPersist creates an invisible dependency chain: if hfIdPersist is tree-shaken or refactored, callers silently lose the function. Import it directly in preview.ts.
P2 — TOCTOU guard's correctness depends on an undocumented invariant
const current = readFileSync(filePath, "utf-8");
if (current === html) { writeFileSync(...) }This works correctly only if html is the raw on-disk content at the time of the last read (not transformed/constructed HTML). If a caller ever passes constructed HTML, current === html will always be false and writes will silently never happen. Not a bug today (all call sites pass diskMain.html), but the invariant should be in a JSDoc comment on the function signature — otherwise the next caller will be confused by spurious write-skips.
P3 — double ensureHfIds on the non-bundled path
When adapter.bundle() returns nothing, bundled = normalizedDisk (already tagged), then ensureHfIds(await transformPreviewHtml(bundled, ...)) runs again. Idempotent, but one of these can be dropped. Addressed implicitly in #1292's catch-path refactor but the try path still carries both.
Overall: ✅ on the feature.
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Solid write-back path. The split between normalizeHfIds (pure, returns {html, changed}) and persistHfIdsIfNeeded (side-effecting filesystem write) is the right shape — one pure function, one effecting function on top, both small and testable.
The wire-up in preview.ts is clean: persistHfIdsIfNeeded runs before adapter.bundle(), so by the time the bundler reads disk, the source file already carries data-hf-id. The bundle path then re-runs ensureHfIds after transformPreviewHtml as a safety net (idempotent — already-tagged elements are skipped, per hfIds.ts:51 if (el.getAttribute("data-hf-id")) continue). Good defense in depth.
Walked the two new hfIds.test.ts tests (L80-95):
pinned id survives text edit— write-back commitsdata-hf-id="hf-XXXX"to disk; subsequent text edit doesn't touch the attribute; re-parse keeps the same id becauseensureHfIdsskips pinned elements. ✅pinned id survives attribute edit— same shape, modifies class. ✅
The collision-tiebreak doc comment added to hfIds.ts:57-69 is the right kind of WHY — explains the positional-but-safe-post-persist behavior so a future reader doesn't try to "fix" the order-dependence and break the contract. (Side note: I see #1292 replaces the attribute-edit test with a cross-document-stability test — that's a smart substitution, the attr-edit and text-edit tests pinned the same guarantee.)
Blockers
[changed flag uses string equality on ensureHfIds output — spurious writes on every request] At hfIdPersist.ts:9:
export function normalizeHfIds(html: string): { html: string; changed: boolean } {
const normalized = ensureHfIds(html);
return { html: normalized, changed: normalized !== html };
}ensureHfIds parses with linkedom and re-serializes. The serialization can normalize attribute quoting (single → double), whitespace inside tags, attribute ordering, self-closing slash, and entity escaping — even when no ids were minted. So normalized !== html returns true on the very first served HTML that uses non-default formatting, even when the file is already fully tagged.
Concrete trigger: an HTML file the user hand-authored with <div data-hf-id='hf-ab12'> (single quotes). linkedom serializes as <div data-hf-id="hf-ab12">. changed=true. writeFileSync overwrites the user's file with normalized formatting every request. That's a silent disk write on every preview load, plus it stomps the user's stylistic choices.
This bug IS fixed in #1292 — switches to attribute-count comparison:
const idsBefore = (html.match(/\bdata-hf-id=/g) ?? []).length;
const idsAfter = (normalized.match(/\bdata-hf-id=/g) ?? []).length;
if (idsAfter > idsBefore) { ... }So if the stack lands atomically (Graphite MQ picking up all 4 together), this resolves itself. But if #1289 lands ahead of #1292, prod sees the spurious-write bug. Two paths to close:
- Cherry-pick #1292's count-based detection into #1289. Smallest diff; this PR ships correct.
- Add a "DO NOT LAND BEFORE #1292" banner to the PR description so the MQ doesn't pick it up alone.
(1) is cleaner. The fix in #1292 is 4 lines.
[No concurrent-modification guard on the write] Same area. The current writeFileSync(filePath, normalized, "utf-8") blindly stomps the file. If the user is concurrently saving (their editor and the preview-server racing), the server can overwrite the user's just-saved content with a stale (pre-edit) version that's been hf-id-tagged.
Repro window: user types in their editor → editor save fires at T → preview server already read the file at T-50ms → server's persistHfIdsIfNeeded writes back the T-50ms snapshot at T+10ms. User's edit is lost.
#1292 ALSO fixes this — re-reads disk and only writes if disk content equals the input html:
const current = readFileSync(filePath, "utf-8");
if (current === html) {
writeFileSync(filePath, normalized, "utf-8");
}Same merge-coupling concern as the first blocker — should land in this PR (or with explicit MQ ordering guarantee). The TOCTOU guard is 4 more lines.
Concerns
[Bundle-vs-disk id divergence isn't tested] The preview route runs persistHfIdsIfNeeded on the source file, then calls adapter.bundle(project.dir). If adapter.bundle():
- Reads disk freshly → it sees the just-written hf-ids → bundle output preserves them → serve-time
ensureHfIdsis idempotent → ids on the wire match ids on disk. ✅ - Reads from a cached snapshot taken before the write → bundle output has NO ids → serve-time
ensureHfIdsmints NEW ids on the bundle output → ids on the wire DON'T match disk. ✗
The second case is plausible if the bundler does any caching (Vite/Rollup-style). When the studio drag-to-edit feature later sends a moveElement patch keyed by a wire-time id, the patch fails to apply because the disk file has different ids.
I don't know enough about adapter.bundle() internals to say which case fires. The two new preview.test.ts integration tests (L323-355) exercise the "no bundle, falls back to disk" path — the bundle-success path's id-stability is not tested.
Adding one test that injects a stub adapter.bundle() returning untagged HTML and verifies the served ids match the disk-write ids (or — more defensible — fails loudly when they would diverge) closes this. Otherwise it's a latent footgun for the R7 drag-to-edit feature.
[persistHfIdsIfNeeded is in studio-api/helpers but operates on disk] Architectural — the rest of studio-api/helpers/* is request-shape and bundling code, all in-memory string transforms. This is the only one that does fs.writeFileSync. A reader looking for "what touches the filesystem in studio-api" might miss it.
Two paths:
- Keep it where it is and add a comment in the file header noting "this module performs disk writes — usage requires per-project mutex if concurrent serves are possible."
- Move it to
studio-api/disk/or similar disk-side module location.
Not blocking; just worth picking before this convention takes root.
[Catch block in preview.ts re-uses normalizedDisk from before the try] L237-256:
const normalizedDisk = diskMain ? persistHfIdsIfNeeded(...) : null;
try {
let bundled = await adapter.bundle(...);
...
} catch {
if (diskMain && normalizedDisk !== null) {
return c.html(
injectStudioPreviewAugmentations(
ensureHfIds(await transformPreviewHtml(normalizedDisk, ...)),
...
),
...
);
}
}If adapter.bundle() failed because, say, the user is mid-save and the file is being rewritten by their editor, normalizedDisk is a SNAPSHOT from before the bundle attempt — potentially stale by the time we fall back. Serving stale content with hf-ids isn't catastrophic (the next request reads fresh), but the fallback should probably re-read disk.
#1292 fixes this too — the catch block in #1292 does resolveProjectMainHtml(project.dir, project.id) fresh and then re-runs persistHfIdsIfNeeded on the fresh content. Worth pulling in.
Nits
[Silent catch {} on writeFileSync failure] L17:
try {
writeFileSync(filePath, normalized, "utf-8");
} catch {
// non-fatal — serve with ids even if persist fails
}The comment explains the WHY (non-fatal) but a console.warn or whatever logger is used in this code path would help debug "why isn't write-back happening for this user." Silent failure of disk writes is the kind of thing that surfaces as a confusing support ticket weeks later. Cheap to add a log line. (Or hand a logger adapter through the function signature if there's a project convention for that.)
[Bracket the bytes you compare] Both idsBefore and idsAfter regex matches in #1292 use \bdata-hf-id= — that's the right shape, no need to change. Mentioning for completeness — \b ensures we don't double-count something like before-data-hf-id=.
Questions
[Why persistHfIdsIfNeeded returns the normalized HTML, not the changed bool?] Current signature:
export function persistHfIdsIfNeeded(filePath: string, html: string): stringCaller (preview.ts) always uses the return value as the served HTML. Fine, but the function name implies "this persists conditionally" — a more descriptive name like normalizeAndPersist or tagAndServeFromDisk would tell callers "I serve you the normalized HTML even when I don't write back." Minor.
What I didn't verify
- That
adapter.bundle()reads disk freshly (rather than from a cached snapshot) — this determines whether the bundle-vs-disk id divergence Concern is real or theoretical. Worth a quick check of the bundler implementation. - That existing
preview-regressionand integration tests still pass with the newensureHfIds-on-every-transform path. The "preview-regression" check in CI is currently green on this PR, so probably OK, but I didn't trace the test fixtures. - That
data-hf-rootelements correctly get hf-id minted byensureHfIds(so the stage root, if it's tagged, gets a stable id). Probably true given the impl walksbody.querySelectorAll("*"), but I didn't verify against andata-hf-rootfixture.
Verdict
Right shape, blocked on two correctness gaps that #1292 already addresses. If the stack lands atomically through MQ, this is ready. If individual PRs can land alone, fold #1292's hfIdPersist.ts fixes into this PR — both are ~4-line patches. The bundle-vs-disk id divergence Concern is the only thing I'd add testing for either way.
— Review by Rames D Jusso
miguel-heygen
left a comment
There was a problem hiding this comment.
Building on Rames's review.
Rames identified two blockers I independently confirm: the string-equality change detection triggers spurious writes on linkedom reserialization, and there's no TOCTOU guard on concurrent user saves. Both are fixed in #1292. The question is merge-ordering.
Amplifying Rames's blocker: the spurious-write bug has a concrete user-visible symptom. If the user hand-authored any single-quoted attribute (<div class='wrapper'>) in a file served by the preview route, the server overwrites their formatting on every request — including replacing their single-quoting convention project-wide. Linkedom normalizes to double quotes. This isn't cosmetic: if the file is version-controlled and the user is committing, they'll see dirty diffs on every preview load. That's the kind of thing that erodes trust in the tool fast.
If the stack is guaranteed to land atomically through Graphite MQ: no action needed here.
If not: fold #1292's 8-line count-based detection + TOCTOU guard into this PR before it can land alone. Those two patches are isolated enough to cherry-pick cleanly.
Escalating Rames's "bundle-vs-disk id-stability" concern. The preview route calls persistHfIdsIfNeeded then adapter.bundle(). If the bundler's Vite/Rollup layer has any module cache (it almost certainly does for re-serves), it reads a snapshot from before the write and produces bundle output with no data-hf-id. Serve-time ensureHfIds then mints fresh ids on that un-tagged bundle HTML. Those freshly-minted ids won't match the disk-pinned ids. The first drag-to-edit gesture from studio keyed off the wire-time id will fail to find the element on disk. This is the most impactful latent failure in the stack — the drag-to-edit round-trip is R7's whole point. The two new integration tests in preview.test.ts test the "no-bundle fallback" path, not the bundle-success path with a warm bundler cache. Add one test for that path.
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Round 2 — HEAD is unchanged at a067aa9 since my round-1 review. The fixes for the two correctness gaps I flagged (linkedom-reserialization spurious writes; concurrent-save TOCTOU race) continue to live only in #1292's 2af2228 fix(core): address R7 code-review findings (C1–C14, P6–P7) commit:
- Count-based change detection (
idsAfter > idsBefore). - Re-read-and-compare TOCTOU guard.
- Plus the bundle-vs-disk id-stability regression test I had flagged as the untested architectural risk — added in
#1292'sd916901atpreview.test.ts:363-391. Pins the content-hash invariant by stubbingadapter.bundle()to return untagged source HTML and assertingservedIds === diskIds. Closes the latent footgun.
So the round-1 blockers on this PR are still merge-coupled to #1292. Stack must land atomically through Graphite MQ for prod to be safe — same posture as round 1.
(One CI nit: the preview-regression FAILURE showing in gh pr checks is from workflow run 27183049880 at 04:03:57 and was superseded by a SUCCESS run at 04:06:04. Same workflow, concurrency-cancellation hangover. Not a real failure.)
My round-1 review at this SHA still stands.
— Review by Rames D Jusso
jrusso1020
left a comment
There was a problem hiding this comment.
Approved on behalf of James.
…review) Addresses Rames' review on #1270: clarifies that a pre-R1 clip authored with id="my-title" round-trips as data-hf-id="my-title" (non-hf-shaped but stable, exact-match) by design — targeting uses exact [data-hf-id="…"] match and does not require the hf- shape; legacy values re-mint only at the R7 write-back. Not a bug. Comment-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The original comment said legacy data-hf-id values "are re-minted only once the R7 write-back persists freshly-minted ids to source" — which is incorrect. ensureHfIds skips elements that already carry data-hf-id, so legacy values (e.g. data-hf-id="my-title") persist indefinitely and are NOT automatically re-minted. Exact-match targeting still works correctly. Update comment to reflect actual behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eview) Addresses Rames' review on #1271: execDataAttrPattern returned the first regex match without checking for a second. A duplicate id/data-hf-id in source (id drift) would silently patch one element and leave the other stale. Now warns when more than one element matches. By the mint contract it should never fire. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…3 review) Adds test: "hfId match is authoritative — selector is not used as a narrowing filter". When hfId matches element A and selector points at element B, findTagByTarget returns A without consulting selector as a narrowing filter. Pins the intended behaviour so a future refactor cannot silently start narrowing by selector. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Elements now get data-hf-id minted by ensureHfIds; parser reads data-hf-id as model id, so HTML id attrs are no longer the model id. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Locks the preservation guarantee the write-back design depends on: a Studio edit targeting by id or selector (it never sends hfId) must not strip an existing data-hf-id, or the stable handle is destroyed by the next edit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…review) Addresses review on #1272 (Miguel P3 + Rames): findTargetElement interpolated target.hfId raw into a [data-hf-id="..."] selector. Escape it (CSS attr-value injection guard) and warn when a hfId matches more than one element instead of silently patching an arbitrary one. Adds an injection-guard test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
7d1bf83 to
4fceb69
Compare
… for persistHfIdsIfNeeded Prior tests only exercised normalizeHfIds (pure function) and the existing pin guard in ensureHfIds — both pass on the parent commit without any Task 1 code. Replace with three tests that exercise the actual disk write-back: - writes data-hf-id to disk when source is untagged - does not rewrite disk when source is already tagged (idempotent) - returned id matches id written to disk (serve-time == persist-time invariant) These fail on the parent commit (persistHfIdsIfNeeded doesn't exist) and green after Task 1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-back (R7, Task 1-2) Two integration tests against the preview route (via Hono test harness): - served HTML carries data-hf-id on body elements (>= 2 matches for div+p) - disk file contains data-hf-id after first GET (write-back verified via readFileSync) These fail on the parent commit (no hfIdPersist wiring in preview.ts) and green after Task 1. Closes the verification gap flagged in review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
a067aa9 to
4b1a36d
Compare

Summary
persistHfIdsIfNeededtohfIdPersist.ts— runsensureHfIdsand writes the result back to the source file on first serve (idempotent; skips write if already tagged)persistHfIdsIfNeededbefore bundling sodata-hf-idis persisted to disk and present in the served HTML in one requestpersistHfIdsIfNeededdata-hf-idand the source file on disk is updated after the first GETTest plan
bunx vitest run src/studio-api/helpers/hfIdPersist.test.ts— 5 passbunx vitest run src/studio-api/routes/preview.test.ts— 15 pass (includes 2 new hf-id surfacing tests)