
Document ICU regex policy and track upstream fix #3126

@maphew


Summary

Over the past two months we've had a stream of PRs and bug reports related to ICU regex linkage (#2986, #2991, #2948, #2965, #3013, #3066, #3112). Multiple agents and contributors have made overlapping, sometimes conflicting fixes because there was no written policy on how ICU should be handled.

This issue tracks two things:

1. Document the policy (in progress)

Branch docs/icu-policy adds docs/ICU-POLICY.md stating the rule and rationale:

  • All release binaries use -tags gms_pure_go (Go stdlib regexp, no ICU)
  • CGO stays enabled (needed for embedded Dolt, independent of ICU)
  • ICU is a test-only dependency (exercised via test-cgo.sh and CI test matrix)
  • Post-build verification catches ICU linkage regressions

Also updates CONTRIBUTING.md and docs/INSTALLING.md to stop telling contributors to install ICU.

2. Track upstream fix

The root cause is that go-mysql-server makes ICU opt-out (you must pass the gms_pure_go tag to disable it) rather than opt-in. We've opened an upstream issue proposing they invert this default.

Until upstream changes the default, we maintain:

  • A fork of go-mysql-server with a !windows build constraint (pulled in via a replace directive in go.mod)
  • gms_pure_go tag in every build path
  • Post-build binary scanning (scripts/verify-cgo.sh)
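For readers unfamiliar with what post-build scanning can look like, here is a minimal Go sketch that lists the shared libraries an ELF binary depends on and flags anything that looks like ICU. It is not the contents of scripts/verify-cgo.sh, which may work differently. The helper name icuLibs and the "libicu" substring match are assumptions, and the sketch covers ELF (Linux) binaries only.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// icuLibs returns the entries of libs that look like ICU shared libraries,
// for example "libicuuc.so.74". Matching on the "libicu" substring is an
// assumption made for this sketch.
func icuLibs(libs []string) []string {
	var found []string
	for _, l := range libs {
		if strings.Contains(strings.ToLower(l), "libicu") {
			found = append(found, l)
		}
	}
	return found
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: verify-icu <elf-binary>")
		return
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot open ELF file:", err)
		os.Exit(2)
	}
	defer f.Close()

	// ImportedLibraries reads the DT_NEEDED entries of a dynamic binary.
	libs, err := f.ImportedLibraries()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read imported libraries:", err)
		os.Exit(2)
	}
	if bad := icuLibs(libs); len(bad) > 0 {
		fmt.Printf("FAIL: binary links ICU: %v\n", bad)
		os.Exit(1)
	}
	fmt.Println("ok: no ICU libraries among imported libraries")
}
```

Because CGO stays enabled for embedded Dolt, other imported libraries such as libc are expected. Only ICU entries are treated as a regression here.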

Related issues/PRs
