Summary
@react-pdf/textkit@6.3.0 appears to drop glyph/source-character mapping when a font layout engine returns a glyph with codePoints: [].
This can happen with fontkit@2.0.4: if a glyph is first loaded by glyph id via font.getGlyph(gid), a later font.layout(...) call can return that same glyph object with empty codePoints.
When textkit receives that glyph run, it drops the empty-codepoint glyph from glyphs, stringIndices, and glyphIndices.
Reproduction
Repro repo:
https://github.com/brandon-julio-t/react-pdf-textkit-codepoints-repro
Steps:
bun install
bun run repro
The repro uses:
- @react-pdf/textkit@6.3.0
- fontkit@2.0.4
- a real font file from @fontsource/roboto
It first opens the font with fontkit, primes one glyph by glyph id, then lays out "AB".
Actual Output
fontkit glyph.codePoints after getGlyph(gid): [ [], [ 66 ] ]
Text: AB
Expected codePoints: [ 65, 66 ]
Actual textkit glyph.codePoints: [ [ 66 ] ]
Actual textkit stringIndices: [ 0 ]
Actual textkit glyphIndices: [ 0 ]
AssertionError [ERR_ASSERTION]: textkit should preserve source character mapping when fontkit returns a glyph with empty codePoints
The first character, A / code point 65, is lost from the final textkit glyph mapping.
Expected Behavior
Textkit should preserve source-character mapping when font.layout(...) returns a glyph with missing/empty codePoints, at least when the glyph sequence can still be aligned unambiguously against the original run string.
Expected output for the repro:
Actual textkit glyph.codePoints: [ [ 65 ], [ 66 ] ]
Actual textkit stringIndices: [ 0, 1 ]
Actual textkit glyphIndices: [ 0, 1 ]
Notes
This repro does not involve PDF rendering. It isolates the behavior to textkit's handling of a fontkit glyph run.
End-to-end PDF reproduction is difficult because the symptom depends on prior fontkit glyph-cache state in a long-running process. The repro makes that state explicit by calling font.getGlyph(gid) before font.layout(...), so the textkit behavior is deterministic.
Production Context
We observed this in a long-running Node/Bun service that generates PDFs with @react-pdf/renderer@4.5.1, @react-pdf/textkit@6.3.0, and fontkit@2.0.4.
Our PDFs use NotoSansSC-Regular.ttf for Chinese text. The visible symptom was intermittent missing leading Chinese characters in rendered PDFs, for example the first character of an item name such as 洗水... or 六... disappearing from the PDF text mapping/rendered output.
The source data and React PDF input strings were intact. Regenerating after a process restart restored the missing characters, which made the issue appear random at the PDF level. The isolated repro above makes the underlying fontkit glyph-cache state deterministic.
A local patch that normalizes missing glyph codePoints from the original run string fixes the repro and avoids mutating fontkit's cached glyph objects.
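For illustration only, the normalization idea can be sketched as below. This is not the actual local patch and not textkit's code: `GlyphLike` and `normalizeCodePoints` are hypothetical names, the glyphs are plain objects rather than real fontkit glyph instances, and the sketch assumes each empty-codePoints glyph corresponds to exactly one source code point. It returns copies rather than writing to the input glyphs, and returns null when the alignment against the run string is not unambiguous.

```typescript
// Hypothetical minimal glyph shape; real fontkit glyphs carry many more fields.
interface GlyphLike {
  id: number;
  codePoints: number[];
}

// Returns copies of `glyphs` with empty codePoints filled in from `text`,
// or null when the glyphs cannot be aligned with the source string.
// Assumption: a glyph with empty codePoints maps to exactly one code point.
function normalizeCodePoints(
  text: string,
  glyphs: GlyphLike[],
): GlyphLike[] | null {
  // Array.from iterates by code point, so surrogate pairs stay together.
  const source = Array.from(text, (ch) => ch.codePointAt(0)!);
  const result: GlyphLike[] = [];
  let cursor = 0;

  for (const glyph of glyphs) {
    if (glyph.codePoints.length === 0) {
      // Missing mapping: take the next unconsumed source code point.
      if (cursor >= source.length) return null;
      result.push({ ...glyph, codePoints: [source[cursor]] });
      cursor += 1;
    } else {
      // Existing mapping: it must match the source at the current position.
      for (const cp of glyph.codePoints) {
        if (source[cursor] !== cp) return null;
        cursor += 1;
      }
      result.push({ ...glyph, codePoints: [...glyph.codePoints] });
    }
  }

  // Every source code point must be accounted for, otherwise give up.
  return cursor === source.length ? result : null;
}

// Mirrors the repro: "AB" where the glyph for "A" lost its code points.
// The glyph ids 36 and 37 are made up for the example.
const fixed = normalizeCodePoints("AB", [
  { id: 36, codePoints: [] },
  { id: 37, codePoints: [66] },
]);
console.log(fixed?.map((g) => g.codePoints)); // [ [ 65 ], [ 66 ] ]
```

Because the function builds new objects, the cached glyph objects that fontkit hands out are left untouched, which is the property the local patch relies on.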