feat(studio): sourcePatcher data-hf-id targeting (R1, T3)#1271
Conversation
miguel-heygen
left a comment
There was a problem hiding this comment.
Clean refactor of findTagByTarget — the extracted execDataAttrPattern and findTagByClass helpers are an improvement over the inlined patterns, and hfId correctly takes priority over id and selector. The negative-lookbehind fix for id= is good catch.
Two notes:
P2 — execDataAttrPattern used for id= attr, but old behavior differed:
The old id= pattern used \\bid= (word-boundary anchor). execDataAttrPattern builds \\b${attr}= dynamically, so attr = "id" produces \\bid= — same result. But the original code did escapeRegex(target.id) on the value; execDataAttrPattern also does escapeRegex(value). No regression here, just confirming the refactor is equivalent.
P3 — id branch no longer has its own negative-lookbehind fix:
The PR description mentions adding (?<!\\w)id= to prevent data-id= false matches. The diff shows the pattern is now \\bid= (via execDataAttrPattern). \\b achieves the same thing as (?<!\\w) here — both anchor to a word boundary before id. The PR description is slightly misleading but the implementation is correct.
Tests cover the new path well. ✅
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Studio-side patcher targeting. Refactor is nice: findTagByTarget's three duplicated regex-pattern-build blocks collapse into execDataAttrPattern, and the class-selector branch lifts into findTagByClass. New hfId lookup goes first, then id, then composition-id (from selector), then class — order is right (most specific signal first).
Concerns
[execDataAttrPattern returns the FIRST match without de-dup check] If two elements somehow share data-hf-id="hf-x7k2" (shouldn't happen per the #1269 mint contract, but the briefing explicitly asked about defense in depth), the regex's first match wins silently. The patcher then mutates one element while the other is left stale — and the caller has no signal that their target was ambiguous.
Two options:
- Cheap version: after
pattern.exec, run it again to check ifpattern.exec(html.slice(match.end))finds a second match. If yes, log a console warning or throw. ~3 lines. - Stronger version: change the regex to global (
gi) likefindTagByClassalready does, collect all matches, and fail/warn if count > 1.
Either way, the "hfId match is ambiguous" condition should not be silent — once R2+ wires the patcher up to real edit flows, an ambiguous id is a data-corruption vector (you patch the wrong element, the user's edit lands on the wrong node).
[The findTagByClass rewrite changes the early-return semantics slightly] Before the refactor, findTagByTarget fell off the bottom with return null after the class branch exhausted. The new shape:
return findTagByClass(html, target);…returns whatever findTagByClass returns, which is null if the class-pattern misses. Functionally identical, but: if target.selector is non-class (e.g. a tag selector "div", an attribute selector other than the composition-id one, a descendant combinator), findTagByClass returns null immediately — same as before. No regression, but the early-return chain in the new shape is slightly less explicit than the previous fallthrough. Cosmetic.
Nits
[Tests are appropriately tight; one gap] All four T3 tests pin hfId targeting. The fallthrough test (hfId lookup falls through to selector when hfId not found) is the right shape. Missing test: hfId match found but element doesn't satisfy the rest of the target (e.g. selector also passed and points elsewhere) — the current code returns the hfId match without checking. Is that intentional (hfId is the authoritative signal) or a bug (selector should narrow further)? Worth one test pinning whichever behavior you intend.
[hfId?: string vs id?: string | null] The new field is string | undefined; the existing id field is string | null | undefined (per the type). Slight inconsistency. Not worth fixing if existing callers depend on the null shape — just noting it.
Question
[Is the index-based class-selector targeting still load-bearing post-R1?] Once R2+ wires hfId as the primary signal, do callers still pass selector + selectorIndex as a fallback? If not, the class-targeting path can be deprecated; if yes, the lookup order (hfId → id → composition-id → class) is permanent. Knowing the long-term plan helps reason about whether to invest in the multi-match defense (above) or just trust hfId to be unique.
Verdict
Solid refactor. The multi-match defense in execDataAttrPattern is the one thing worth adding before merge — silent picks-the-first behavior on a duplicated id is a data-corruption shape, and the cost of adding a check is tiny. Rest is clean. Leaving as a comment.
— Review by Rames D Jusso
…eview) Addresses Rames' review on #1271: execDataAttrPattern returned the first regex match without checking for a second. A duplicate id/data-hf-id in source (id drift) would silently patch one element and leave the other stale. Now warns when more than one element matches. By the mint contract it should never fire. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed review (01470a0): |
01470a0 to
ccae778
Compare
f391075 to
4adc9b1
Compare
ccae778 to
aa85090
Compare
beb8b9f to
fde1c2d
Compare
aa85090 to
e02f3f9
Compare
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Round 2 (delta since 7195a2c — one new commit e02f3f9 fix(studio): warn on duplicate match in execDataAttrPattern (R1, T3 review)).
Addresses my round-1 multi-match defense concern with Option 1 — the cheap version I outlined. New shape at sourcePatcher.ts:236-251:
const all = html.match(new RegExp(`<[^>]*\\b${attr}=(["'])${escapeRegex(value)}\\1[^>]*>`, "gi"));
if (all && all.length > 1) {
console.warn(`sourcePatcher: ${attr}="${value}" matched ${all.length} elements; patching the first. ids/attrs must be unique per document.`);
}Two notes:
- The
\\1backreference in the second regex (vs\\2in the first) is correct — the first regex has the outer(<[^>]*...)capture group bumping the quote group to 2; the second regex drops the outer wrap so the quote group is 1. Easy thing to miss, easy to misread later. Worth a one-line comment if you ever change the inner pattern. - "By the mint contract this should never fire" (your commit-msg framing + the warn text "ids/attrs must be unique per document") is the right framing — surfaces the upstream contract violation rather than silently corrupting. ✅
My round-1 nit about a missing "hfId match + selector pointing elsewhere" test gap is unchanged — was a clarification question more than an ask. Whichever behavior you intend (hfId is authoritative vs selector should narrow), it'll be more important to pin in a later iteration once R2+ wires hfId end-to-end. Leaving it for now.
Clean execution; leaving as a comment.
— Review by Rames D Jusso
miguel-heygen
left a comment
There was a problem hiding this comment.
Building on Rames's review.
Escalating: execDataAttrPattern multi-match warning is now implemented — confirming it landed. The diff shows the defensive warn path is there (lines after all && all.length > 1). Rames flagged the concern; it's addressed. Good.
Escalating: missing test for "hfId match found, selector also passed, which wins?" Rames flagged this and it's still absent. The current code returns the hfId match and never checks target.selector as a narrowing filter. This behavior is intentional (hfId is authoritative), but without a pinned test, a future refactor that accidentally starts narrowing by selector would silently break callers that rely on hfId being the full stop. One test:
it("hfId match is returned even when selector does not overlap", () => {
const html = `<h1 data-hf-id="hf-x7k2" class="a">A</h1><h1 class="b">B</h1>`;
const result = applyPatchByTarget(html, { hfId: "hf-x7k2", selector: ".b" }, { type: "inline-style", property: "color", value: "blue" });
expect(result).toContain('data-hf-id="hf-x7k2"');
expect(result).toContain("color: blue");
// .b element must NOT be patched
expect(result.indexOf("color: blue")).toBeLessThan(result.indexOf('class="b"'));
});New: execDataAttrPattern regex uses \b word-boundary but the old id= pattern used \bid= too — confirmed equivalent. My round-1 note stands (no regression). But the new data-hf-id branch uses the same execDataAttrPattern helper with attr = "data-hf-id". The \b anchor before data-hf-id is \bdata-hf-id= — the - chars in the attribute name ARE word-boundary positions in regex, so \b fires at the d in data. This correctly prevents a match on e.g. xdata-hf-id=. Fine.
P3: Is selector as a class-only fallback still the intended contract for all future R2+ callers? Rames asked; worth an explicit answer in the PR description or a comment. If selector will only ever be a .class-name or [data-composition-id="…"] pattern, the current regex approach is fine indefinitely. If structured queries like :nth-child or attribute-other-than-composition-id land in R2+, findTagByClass will silently return null and callers will get no error.
…eview) Addresses Rames' review on #1271: execDataAttrPattern returned the first regex match without checking for a second. A duplicate id/data-hf-id in source (id drift) would silently patch one element and leave the other stale. Now warns when more than one element matches. By the mint contract it should never fire. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
e02f3f9 to
02ad0bc
Compare
jrusso1020
left a comment
There was a problem hiding this comment.
Approved on behalf of James.
…review) Addresses Rames' review on #1270: clarifies that a pre-R1 clip authored with id="my-title" round-trips as data-hf-id="my-title" (non-hf-shaped but stable, exact-match) by design — targeting uses exact [data-hf-id="…"] match and does not require the hf- shape; legacy values re-mint only at the R7 write-back. Not a bug. Comment-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
31d9074 to
abad19f
Compare
The original comment said legacy data-hf-id values "are re-minted only once the R7 write-back persists freshly-minted ids to source" — which is incorrect. ensureHfIds skips elements that already carry data-hf-id, so legacy values (e.g. data-hf-id="my-title") persist indefinitely and are NOT automatically re-minted. Exact-match targeting still works correctly. Update comment to reflect actual behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
abad19f to
9ddd145
Compare
…eview) Addresses Rames' review on #1271: execDataAttrPattern returned the first regex match without checking for a second. A duplicate id/data-hf-id in source (id drift) would silently patch one element and leave the other stale. Now warns when more than one element matches. By the mint contract it should never fire. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…3 review) Adds test: "hfId match is authoritative — selector is not used as a narrowing filter". When hfId matches element A and selector points at element B, findTagByTarget returns A without consulting selector as a narrowing filter. Pins the intended behaviour so a future refactor cannot silently start narrowing by selector. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
02ad0bc to
dfee196
Compare
* feat(core): clip-model hf- ids minted at parse, emitted as data-hf-id (R1) * docs(core): document legacy-id round-trip in clip-model readback (R1 review) Addresses Rames' review on #1270: clarifies that a pre-R1 clip authored with id="my-title" round-trips as data-hf-id="my-title" (non-hf-shaped but stable, exact-match) by design — targeting uses exact [data-hf-id="…"] match and does not require the hf- shape; legacy values re-mint only at the R7 write-back. Not a bug. Comment-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(core): fix misleading legacy-id migration comment in htmlParser.ts The original comment said legacy data-hf-id values "are re-minted only once the R7 write-back persists freshly-minted ids to source" — which is incorrect. ensureHfIds skips elements that already carry data-hf-id, so legacy values (e.g. data-hf-id="my-title") persist indefinitely and are NOT automatically re-minted. Exact-match targeting still works correctly. Update comment to reflect actual behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(studio): sourcePatcher data-hf-id targeting (R1, T3) * fix(studio): warn on duplicate match in execDataAttrPattern (R1, T3 review) Addresses Rames' review on #1271: execDataAttrPattern returned the first regex match without checking for a second. A duplicate id/data-hf-id in source (id drift) would silently patch one element and leave the other stale. Now warns when more than one element matches. By the mint contract it should never fire. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(studio): pin hfId-is-authoritative-over-selector contract (R1, T3 review) Adds test: "hfId match is authoritative — selector is not used as a narrowing filter". When hfId matches element A and selector points at element B, findTagByTarget returns A without consulting selector as a narrowing filter. Pins the intended behaviour so a future refactor cannot silently start narrowing by selector. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(core): sourceMutation data-hf-id targeting (R1, T7) * test(core): update htmlParser baselines for R1 hf- id format Elements now get data-hf-id minted by ensureHfIds; parser reads data-hf-id as model id, so HTML id attrs are no longer the model id. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(core): data-hf-id survives id/selector patch (R1, T7) Locks the preservation guarantee the write-back design depends on: a Studio edit targeting by id or selector (it never sends hfId) must not strip an existing data-hf-id, or the stable handle is destroyed by the next edit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(core): escape hfId in selector + warn on duplicate match (R1, T7 review) Addresses review on #1272 (Miguel P3 + Rames): findTargetElement interpolated target.hfId raw into a [data-hf-id="..."] selector. Escape it (CSS attr-value injection guard) and warn when a hfId matches more than one element instead of silently patching an arbitrary one. Adds an injection-guard test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1286) * feat(core): clip-model hf- ids minted at parse, emitted as data-hf-id (R1) * docs(core): document legacy-id round-trip in clip-model readback (R1 review) Addresses Rames' review on #1270: clarifies that a pre-R1 clip authored with id="my-title" round-trips as data-hf-id="my-title" (non-hf-shaped but stable, exact-match) by design — targeting uses exact [data-hf-id="…"] match and does not require the hf- shape; legacy values re-mint only at the R7 write-back. Not a bug. Comment-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(core): fix misleading legacy-id migration comment in htmlParser.ts The original comment said legacy data-hf-id values "are re-minted only once the R7 write-back persists freshly-minted ids to source" — which is incorrect. ensureHfIds skips elements that already carry data-hf-id, so legacy values (e.g. data-hf-id="my-title") persist indefinitely and are NOT automatically re-minted. Exact-match targeting still works correctly. Update comment to reflect actual behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(studio): sourcePatcher data-hf-id targeting (R1, T3) * fix(studio): warn on duplicate match in execDataAttrPattern (R1, T3 review) Addresses Rames' review on #1271: execDataAttrPattern returned the first regex match without checking for a second. A duplicate id/data-hf-id in source (id drift) would silently patch one element and leave the other stale. Now warns when more than one element matches. By the mint contract it should never fire. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(studio): pin hfId-is-authoritative-over-selector contract (R1, T3 review) Adds test: "hfId match is authoritative — selector is not used as a narrowing filter". When hfId matches element A and selector points at element B, findTagByTarget returns A without consulting selector as a narrowing filter. Pins the intended behaviour so a future refactor cannot silently start narrowing by selector. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(core): sourceMutation data-hf-id targeting (R1, T7) * test(core): update htmlParser baselines for R1 hf- id format Elements now get data-hf-id minted by ensureHfIds; parser reads data-hf-id as model id, so HTML id attrs are no longer the model id. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(core): data-hf-id survives id/selector patch (R1, T7) Locks the preservation guarantee the write-back design depends on: a Studio edit targeting by id or selector (it never sends hfId) must not strip an existing data-hf-id, or the stable handle is destroyed by the next edit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(core): escape hfId in selector + warn on duplicate match (R1, T7 review) Addresses review on #1272 (Miguel P3 + Rames): findTargetElement interpolated target.hfId raw into a [data-hf-id="..."] selector. Escape it (CSS attr-value injection guard) and warn when a hfId matches more than one element instead of silently patching an arbitrary one. Adds an injection-guard test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(core): previewAdapter contract failing tests (T10 spec for R7) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s 1-2) (#1289) * feat(core): clip-model hf- ids minted at parse, emitted as data-hf-id (R1) * docs(core): document legacy-id round-trip in clip-model readback (R1 review) Addresses Rames' review on #1270: clarifies that a pre-R1 clip authored with id="my-title" round-trips as data-hf-id="my-title" (non-hf-shaped but stable, exact-match) by design — targeting uses exact [data-hf-id="…"] match and does not require the hf- shape; legacy values re-mint only at the R7 write-back. Not a bug. Comment-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(core): fix misleading legacy-id migration comment in htmlParser.ts The original comment said legacy data-hf-id values "are re-minted only once the R7 write-back persists freshly-minted ids to source" — which is incorrect. ensureHfIds skips elements that already carry data-hf-id, so legacy values (e.g. data-hf-id="my-title") persist indefinitely and are NOT automatically re-minted. Exact-match targeting still works correctly. Update comment to reflect actual behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(studio): sourcePatcher data-hf-id targeting (R1, T3) * fix(studio): warn on duplicate match in execDataAttrPattern (R1, T3 review) Addresses Rames' review on #1271: execDataAttrPattern returned the first regex match without checking for a second. A duplicate id/data-hf-id in source (id drift) would silently patch one element and leave the other stale. Now warns when more than one element matches. By the mint contract it should never fire. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(studio): pin hfId-is-authoritative-over-selector contract (R1, T3 review) Adds test: "hfId match is authoritative — selector is not used as a narrowing filter". When hfId matches element A and selector points at element B, findTagByTarget returns A without consulting selector as a narrowing filter. Pins the intended behaviour so a future refactor cannot silently start narrowing by selector. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(core): sourceMutation data-hf-id targeting (R1, T7) * test(core): update htmlParser baselines for R1 hf- id format Elements now get data-hf-id minted by ensureHfIds; parser reads data-hf-id as model id, so HTML id attrs are no longer the model id. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(core): data-hf-id survives id/selector patch (R1, T7) Locks the preservation guarantee the write-back design depends on: a Studio edit targeting by id or selector (it never sends hfId) must not strip an existing data-hf-id, or the stable handle is destroyed by the next edit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(core): escape hfId in selector + warn on duplicate match (R1, T7 review) Addresses review on #1272 (Miguel P3 + Rames): findTargetElement interpolated target.hfId raw into a [data-hf-id="..."] selector. Escape it (CSS attr-value injection guard) and warn when a hfId matches more than one element instead of silently patching an arbitrary one. Adds an injection-guard test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(core): previewAdapter contract failing tests (T10 spec for R7) * feat(core): hf-id write-back to disk + serve-time surfacing (R7, Task 1-2) * test(core): replace tautological stability tests with real disk tests for persistHfIdsIfNeeded Prior tests only exercised normalizeHfIds (pure function) and the existing pin guard in ensureHfIds — both pass on the parent commit without any Task 1 code. Replace with three tests that exercise the actual disk write-back: - writes data-hf-id to disk when source is untagged - does not rewrite disk when source is already tagged (idempotent) - returned id matches id written to disk (serve-time == persist-time invariant) These fail on the parent commit (persistHfIdsIfNeeded doesn't exist) and green after Task 1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(core): route-level tests for data-hf-id surfacing and disk write-back (R7, Task 1-2) Two integration tests against the preview route (via Hono test harness): - served HTML carries data-hf-id on body elements (>= 2 matches for div+p) - disk file contains data-hf-id after first GET (write-back verified via readFileSync) These fail on the parent commit (no hfIdPersist wiring in preview.ts) and green after Task 1. Closes the verification gap flagged in review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

What
Adds
hfIdas a first-class target field tosourcePatcher(packages/studio/src/utils/sourcePatcher.ts). Whentarget.hfIdis set, the patcher finds the element via[data-hf-id="${hfId}"]before falling back toidor CSS selector.Also tightens the existing
id=regex patterns with a negative lookbehind to prevent false matches on attributes likedata-id=orcomposition-id=.Why
T3 spec — the sourcePatcher path handles the studio's in-browser rich-text editing. It needs to accept
hfIdtargets so the studio can address elements by stable content-based id (R1) rather than a positional CSS selector that can drift when siblings are inserted. Depends on PR #1270.How
hfId?: stringadded to the patch target typehfId→id→selector[data-hf-id]lookup uses the same attribute-selector pattern assourceMutation(?<!\w)id=preventsdata-id=from matching id-based patternsTest plan
sourcePatcher.test.ts— hfId targeting cases pass