Go: idna-ip-literal-smuggle (UTS-46 NFKC digit-fold SSRF) #3841
astrogilda wants to merge 2 commits into semgrep:main from
Detect untrusted hostnames that flow through golang.org/x/net/idna UTS-46 ToASCII mapping and reach a network sink without a post-mapping IP-literal recheck. UTS-46 NFKC mapping folds 100 non-ASCII codepoints across 8 classes to ASCII digits 0-9, allowing inputs like "0.<superscript-1>.0.0" to pass a pre-IDNA net.ParseIP check, map to "0.1.0.0", and be dialed as if they were DNS names.

Taint mode with two labels (PRE_IDNA, POST_IDNA). Sanitizer requires a trailing-dot trim followed by netip.ParseAddr or net.ParseIP, in that order; without the trim the recheck silently passes the multi-trailing-dot variant.

Ships with one Go fixture: 21 ruleid markers covering all 8 fold classes plus 6 ok markers covering both compliant trim variants, the Punycode out-of-scope case, and a Devanagari Valid-but-not-mapped negative control. Validates on Semgrep 1.161.0; finding count on the fixture matches the marker count exactly.

Signed-off-by: Sankalp Gilda <sankalp.gilda@gmail.com>
The library runs UTS-46 mapping in validateAndMap before the
encode-vs-decode branch, so (*Profile).ToUnicode produces the same
digit-folded ASCII output as (*Profile).ToASCII for in-scope
codepoints. Empirically verified against golang.org/x/net/idna v0.53.0
on all three digit-folding profiles. Without the ToUnicode patterns
the rule would miss any caller that uses ToUnicode for hostname
canonicalisation and then passes the result to a network sink.
Rule changes:
- Add ToUnicode patterns alongside ToASCII for each named profile
(idna.Lookup, idna.Display, idna.Registration, idna.New(...))
and for the var-bound *idna.Profile receiver fallback.
- Drop the redundant "Pattern: ..." paragraph that restated the
  opening sentence almost verbatim.
- Update the family-count claim ("8 classes") to the seven
Unicode-block ranges that account for the 100 fold codepoints, and
add an explicit note that Devanagari digits (U+0966-096F) are not
in scope: empirical testing confirms they pass through Punycode
rather than fold to ASCII.
Test fixture: add Class 8 (P8a, P8b) covering Lookup.ToUnicode and
Display.ToUnicode. Verified against semgrep 1.161.0: 23 findings on
the rule, up from 21 before this commit.
Author comment:
Pushed a follow-up extending the rule scope and tightening the message (… The library runs UTS-46 mapping in … What changed: …).
PR description updated to reflect the new scope.
Summary
- go/lang/security/audit/idna-ip-literal-smuggle.yaml, detecting an SSRF anti-pattern where an untrusted hostname is mapped through golang.org/x/net/idna UTS-46 mapping ((*Profile).ToASCII or (*Profile).ToUnicode on a digit-folding profile) and then reaches a network sink with no post-mapping IP-literal recheck.
- Taint mode with two labels (PRE_IDNA, POST_IDNA), label transformation at the IDNA call site, intra-procedural. Sanitizer requires a trailing-dot trim plus a netip.ParseAddr or net.ParseIP recheck, in that order.
- Test fixture (idna-ip-literal-smuggle.go) carrying 23 // ruleid: markers covering all seven in-scope Unicode-block fold ranges plus the ToUnicode entry-point fold, plus 6 // ok: negatives covering both compliant trim variants, the Punycode out-of-scope case, and a Devanagari Valid-but-not-mapped negative control.

Motivation
The rule complements an inter-procedural CodeQL companion query in the CodeQL community pack (PR https://github.com/github/codeql/pull/21784). The split is deliberate: Semgrep OSS provides a high-precision direct-call sweep that runs on free-tier CI, while CodeQL provides the inter-procedural recall vehicle for shapes where the IDNA call and the sink are wrapped behind an intermediate function (the canonical real-world canonicalAddr-style wrapper). The two engines hit different points on the precision/recall frontier; both are needed for a complete caller-side audit. This design is documented in docs/research/v0.1-detection-strategy.md of the upstream rule repository.

Threat model
idna.Lookup, idna.Display, idna.Registration, and any idna.New(idna.MapForLookup(), ...) profile run NFKC compatibility decomposition before producing ASCII output. The fold runs in both (*Profile).ToASCII and (*Profile).ToUnicode because the library executes validateAndMap before the encode-vs-decode branch. Enumerating the Unicode 16 table shipped with golang.org/x/text v0.21.0 yields 100 codepoints partitioned into seven Unicode-block ranges that fold to ASCII digits 0-9: Latin-1 superscripts (3), mathematical superscripts (7), mathematical subscripts (10), circled digits (10), fullwidth digits (10), the Mathematical Alphanumeric Symbols block (50), and segmented digits (10). Devanagari digits (U+0966..U+096F) are not in scope: empirically verified against golang.org/x/net/idna v0.53.0, they pass through Punycode rather than fold to ASCII.

An attacker-controlled hostname containing one of these codepoints passes a pre-IDNA net.ParseIP check (the raw string contains a non-ASCII codepoint, so net.ParseIP returns nil and the input is treated as a hostname), maps to an ASCII IP literal, and reaches the sink as if it were a DNS name. The result is SSRF against loopback, RFC 1918, link-local, or cloud metadata ranges.

The trailing-dot trim is required for the post-IDNA recheck to work.
idna.Lookup.ToASCII("0.<superscript-1>.0.0.") returns "0.1.0.0.", which net.ParseIP does not recognize as an IP literal on its own (it returns nil). Without the trim, the recheck therefore passes silently and the smuggle survives. The sanitizer pattern requires both predicates, in order.

Verification
- semgrep --validate --config go/lang/security/audit/idna-ip-literal-smuggle.yaml on Semgrep 1.161.0: clean.
- semgrep --config <rule> idna-ip-literal-smuggle.go: 23 findings, exactly matching the 23 // ruleid: markers.
- Sweep across golang/go, kubernetes/kubernetes, and prometheus/prometheus (660 MB of Go source, three projects with a high incidence of host-string handling): zero findings outside the fixture. The canonical real-world hit shape, where the IDNA call and the sink are split across a wrapper function, is intentionally out of scope for the OSS tier and is covered by the companion CodeQL query.

Cross-language CVE precedent
- … (idna crate): the same UTS-46 NFKC digit-fold surface, demonstrated cross-runtime.
- … (kjd/idna, Python): related Unicode-normalization preprocessing surface in a sibling IDNA implementation.

The pattern is not Go-specific; the rule is. Each ecosystem needs its own static-analysis vehicle.
Upstream library disposition
The Go security team reviewed an advisory for this anti-pattern and declined to treat it as a library bug. The position is internally consistent:
idna.Lookup.ToASCII is documented to implement UTS-46 mapping, and the post-mapping IP-literal recheck is a caller responsibility that the spec does not require the library to perform. Caller-side static analysis is the right vehicle for the gap, which is what this rule provides.

Companion PR
The CodeQL community-pack PR carries the inter-procedural-recall companion query:
https://github.com/github/codeql/pull/21784. Reviewers may find that pack useful for understanding the full coverage envelope.

Pro and experimental variants
The OSS rule submitted here ships with intra-procedural taint only. An interfile (interfile: true) variant and an experimental field-name regex source-set widening (covering Endpoint, Server, Address, Addr, Target, Upstream, and Origin field names in addition to Host/Hostname) live in the upstream rule repository at https://github.com/astrogilda/idna-ip-literal-smuggle-rules. They are not bundled into this PR because the registry convention in go/lang/security/ is one rule per topic; operators who need cross-file recall can pull the variant directly from the upstream repository.

Severity calibration
WARNING, not ERROR. A v0.1.x sweep across golang/go, kubernetes/kubernetes, and prometheus/prometheus produced zero alerts at the OSS tier, because production Go callers wrap idna.Lookup.ToASCII in a one-level helper that intra-procedural taint cannot step through. The OSS tier is therefore a high-precision direct-call sweep with very low alert volume in practice; the recall vehicle is the inter-procedural CodeQL companion. Promoting to ERROR would not change the alert volume on real codebases but would change the failure mode on edge cases.

The metadata values confidence: MEDIUM, likelihood: MEDIUM, and impact: HIGH reflect the specific attack: the precondition (caller passes user input into a UTS-46 profile) is not universal but is well-defined, and the consequence (SSRF against metadata or internal services) is severe.

Out of scope (documented in the rule message)
- idna.ToASCII(x) package-level helper: Punycode profile, nil mapping, no fold surface.
- … (url.Parse only, no direct idna.*.ToASCII call): already validated post-decode.

CLA
Will sign on first PR comment via cla-assistant.
Canonical source for these artefacts: https://github.com/astrogilda/idna-ip-literal-smuggle-rules