Arabic ligatures can be dropped during textkit bidi reordering #3406

@e-Naeim

Summary

Arabic ligatures can be dropped or assigned to the wrong visual run during @react-pdf/textkit bidi reordering.

This shows up in Arabic PDFs generated through @react-pdf/renderer when a shaped Arabic ligature appears more than once in the same line, especially near mixed LTR spans such as company names or percentages.

Reproduction text

توجد شراكة استراتيجية مع شركات متعددة. تتكون هيكلية ملكية الشركة من ثلاثة مساهمين، حيث يمتلك المساهم الأول (saudi lend gate) نسبة 50% من الأسهم.

The important cases are:

  • شراكة and شركات both use a Cairo Arabic شر ligature glyph.
  • The same line also contains LTR text, (saudi lend gate), and a percentage, 50%.

Actual result

reorderLine deduplicates ligatures by glyph.id. In the Cairo font, multiple logical شر clusters can resolve to the same glyph id, so one logical occurrence is treated as a duplicate and omitted.
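The failure mode can be sketched in isolation. This is a simplified model, not the actual textkit code: `SketchGlyph`, `dedupeByGlyphId`, the `isLigature` flag, and the per-code-point `glyphIndices` array are assumptions made for illustration.

```typescript
// Hypothetical, simplified glyph shape (not the real textkit types).
interface SketchGlyph {
  id: number; // font glyph id; identical for every use of the same ligature
  isLigature: boolean;
  label: string; // only used to make the output readable
}

// glyphIndices has one entry per logical code point and points at the shaped
// glyph covering that code point. A ligature covers two or more code points.
function dedupeByGlyphId(
  glyphs: SketchGlyph[],
  glyphIndices: number[],
): string[] {
  const addedGlyphs = new Set<number>();
  const out: string[] = [];
  for (const glyphIndex of glyphIndices) {
    const glyph = glyphs[glyphIndex];
    if (!glyph) continue;
    // Keyed on glyph.id only: cannot tell "continuation of the same ligature"
    // apart from "same ligature glyph used again in another word".
    if (addedGlyphs.has(glyph.id)) continue;
    out.push(glyph.label);
    if (glyph.isLigature) addedGlyphs.add(glyph.id);
  }
  return out;
}

// Two words that both start with the same ligature glyph (id 7).
const sampleGlyphs: SketchGlyph[] = [
  { id: 7, isLigature: true, label: "lig-1" },
  { id: 1, isLigature: false, label: "a" },
  { id: 7, isLigature: true, label: "lig-2" },
  { id: 2, isLigature: false, label: "b" },
];
// Each ligature covers two code points, so its glyph index appears twice.
const sampleGlyphIndices = [0, 0, 1, 2, 2, 3];

const emitted = dedupeByGlyphId(sampleGlyphs, sampleGlyphIndices);
console.log(emitted); // ["lig-1", "a", "b"]: "lig-2" is dropped
```

Four glyphs go in and only three come out, which mirrors the missing ligature in the rendered PDF.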

Before: Arabic ligature dropped by glyph-id dedupe

Expected result

Each logical ligature cluster should be emitted once, even if another word uses the same ligature glyph id. Ligature continuation codepoints within the same source glyph should still be deduped.

After: repeated ligatures preserved

Root cause

In packages/textkit/src/layout/bidiReordering.ts, the current implementation:

  1. Builds output runs by slicing reordered visual indices with the original logical run boundaries.
  2. Dedupes ligatures with addedGlyphs.has(glyph.id).

glyph.id is too broad a dedupe key: it also removes a second occurrence of the same ligature glyph that comes from a different source cluster. In mixed bidi text, the original logical run boundaries may also no longer describe which run owns each visual position after reordering.

The safer key is the source run plus the source glyph-array index. That identifies true ligature continuations while allowing the same glyph id to appear in another word.
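A minimal sketch of that key, assuming a simplified run model. `SourceGlyph`, `SourceRun`, and `collectGlyphs` are hypothetical names, and the loop walks runs in logical order for brevity rather than in reordered visual order.

```typescript
// Hypothetical, simplified shapes (not the real textkit types).
interface SourceGlyph {
  id: number;
  isLigature: boolean;
}

interface SourceRun {
  glyphs: SourceGlyph[];
  // One entry per code point in the run: index into `glyphs`.
  glyphIndices: number[];
}

function collectGlyphs(runs: SourceRun[]): SourceGlyph[] {
  const seen = new Set<string>();
  const out: SourceGlyph[] = [];
  runs.forEach((run, runIndex) => {
    for (const glyphIndex of run.glyphIndices) {
      const glyph = run.glyphs[glyphIndex];
      if (!glyph) continue;
      // Key on (source run, source glyph index) rather than glyph.id.
      // Continuation code points of one ligature share this key;
      // another word reusing the same glyph id does not.
      const key = `${runIndex}:${glyphIndex}`;
      if (seen.has(key)) continue;
      seen.add(key);
      out.push(glyph);
    }
  });
  return out;
}

// Two runs that each start with the same ligature glyph (id 7) at index 0.
const ligature: SourceGlyph = { id: 7, isLigature: true };
const sampleRuns: SourceRun[] = [
  { glyphs: [ligature, { id: 1, isLigature: false }], glyphIndices: [0, 0, 1] },
  { glyphs: [ligature, { id: 2, isLigature: false }], glyphIndices: [0, 0, 1] },
];

const keptIds = collectGlyphs(sampleRuns).map((g) => g.id);
console.log(keptIds); // [7, 1, 7, 2]: both ligatures survive
```

Including the run in the key matters here because both ligatures sit at glyph index 0 of their own run; the glyph index alone would collide.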

Proposed fix

I opened a PR that:

  • resolves each visual index back to its owning source run and source glyph index
  • emits output runs by contiguous source-run ownership after bidi reordering
  • dedupes ligature continuations by source glyph index, not by glyph.id
  • updates stringIndices and glyphIndices for the reordered runs
  • adds regression tests for repeated ligature glyph ids and Arabic + LTR mixed bidi text
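The first two points can be illustrated with a small sketch. It is not the code from the PR: `SketchRun`, `OwnedSegment`, `findOwningRun`, and `groupByOwnership` are hypothetical, and the visual order is hard-coded instead of being computed by a bidi algorithm.

```typescript
// Hypothetical logical run: half-open range [start, end) of code point indices.
interface SketchRun {
  start: number;
  end: number;
}

// A stretch of visual positions that all belong to the same source run.
interface OwnedSegment {
  runIndex: number;
  logicalIndices: number[];
}

// Resolve a logical index back to the run that contains it (-1 if none).
function findOwningRun(runs: SketchRun[], logicalIndex: number): number {
  return runs.findIndex(
    (run) => logicalIndex >= run.start && logicalIndex < run.end,
  );
}

// Walk the visual order and start a new segment whenever ownership changes.
function groupByOwnership(
  runs: SketchRun[],
  visualOrder: number[],
): OwnedSegment[] {
  const segments: OwnedSegment[] = [];
  for (const logicalIndex of visualOrder) {
    const runIndex = findOwningRun(runs, logicalIndex);
    if (runIndex === -1) continue; // index not covered by any run
    const last = segments[segments.length - 1];
    if (last && last.runIndex === runIndex) {
      last.logicalIndices.push(logicalIndex);
    } else {
      segments.push({ runIndex, logicalIndices: [logicalIndex] });
    }
  }
  return segments;
}

// Arabic [0, 3), Latin [3, 6), Arabic [6, 8) in an RTL paragraph.
const logicalRuns: SketchRun[] = [
  { start: 0, end: 3 },
  { start: 3, end: 6 },
  { start: 6, end: 8 },
];
// Assumed visual order, left to right: the last Arabic run reversed,
// the Latin run in order, then the first Arabic run reversed.
const visualOrder = [7, 6, 3, 4, 5, 2, 1, 0];

const segments = groupByOwnership(logicalRuns, visualOrder);
console.log(segments.map((s) => s.runIndex)); // [2, 1, 0]
```

For comparison, slicing `visualOrder` with the first run's original boundaries, `visualOrder.slice(0, 3)`, gives `[7, 6, 3]`, which mixes code points from two different source runs.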
