perf(runtime): add typed-slice fast paths to `in` operator #960
Open
mingrammer wants to merge 1 commit into
`in` dispatches through `runtime.In`, which uses reflect to iterate the
right-hand side. The reflect path is correct for any slice type but pays
one heap allocation per element on every typed slice, because
`reflect.Value.Index(i).Interface()` must box the element when the slice's
element type is not already `interface{}`.
For `[]any` this boxing is a no-op (the cell is already an interface), so
the existing path is already zero-alloc-per-element. For `[]string`,
`[]float64`, `[]int64`, `[]int`, and `[]bool` it adds N heap allocations
per `in` evaluation, which is significant when `in` runs in a hot loop
(e.g. rule engines or expression-based filters over candidate lists).
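
The allocation difference described above can be observed with a small standalone program. This is a minimal sketch, not the project's benchmark: `reflectContains` and `allocsPerLookup` are hypothetical helpers written for illustration, and exact counts may vary across Go versions.

```go
package main

import (
	"fmt"
	"reflect"
	"testing"
)

// reflectContains is a hypothetical reflect-based membership scan.
// It boxes each element through Interface() before comparing.
func reflectContains(needle, haystack any) bool {
	v := reflect.ValueOf(haystack)
	if v.Kind() != reflect.Slice {
		return false
	}
	for i := 0; i < v.Len(); i++ {
		if v.Index(i).Interface() == needle {
			return true
		}
	}
	return false
}

// allocsPerLookup reports the average number of heap allocations for one
// scan. Both arguments are boxed by the caller, outside the measured closure.
func allocsPerLookup(needle, haystack any) float64 {
	return testing.AllocsPerRun(100, func() {
		reflectContains(needle, haystack)
	})
}

func main() {
	typed := []string{"a", "b", "c", "d"}
	boxed := []any{"a", "b", "c", "d"}

	// The needle is absent, so every element is visited.
	fmt.Println("[]string allocs per scan:", allocsPerLookup("zz", typed))
	fmt.Println("[]any    allocs per scan:", allocsPerLookup("zz", boxed))
}
```

The typed slice should report a nonzero allocation count that grows with the slice length, while the `[]any` scan should report none.
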
This patch adds a type-switch at the top of `In` for those five common
shapes. Each case uses a pure-Go `for ... range` loop, so no reflect, no
per-element boxing, no Equal() round-trip. On a needle/element type
mismatch the case falls through to the existing reflect path so Equal()'s
cross-type promotion semantics are preserved (e.g. an int needle against
a []float64 still matches).
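
The shape of the fast path can be sketched as follows. This is an illustration under stated assumptions, not the code in this patch: `inFast`, `inSlow`, `looseEqual`, and `toFloat` are hypothetical names, and `looseEqual` only approximates the cross-type promotion that the real `Equal()` performs.

```go
package main

import (
	"fmt"
	"reflect"
)

// toFloat converts the numeric types used in this sketch to float64.
func toFloat(v any) (float64, bool) {
	switch n := v.(type) {
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	case float64:
		return n, true
	}
	return 0, false
}

// looseEqual is a hypothetical stand-in for Equal(): numbers compare after
// promotion to float64, everything else compares with reflect.DeepEqual.
func looseEqual(a, b any) bool {
	af, aok := toFloat(a)
	bf, bok := toFloat(b)
	if aok && bok {
		return af == bf
	}
	return reflect.DeepEqual(a, b)
}

// inSlow is the generic reflect path: correct for any slice type.
func inSlow(needle, haystack any) bool {
	v := reflect.ValueOf(haystack)
	if v.Kind() != reflect.Slice {
		return false
	}
	for i := 0; i < v.Len(); i++ {
		if looseEqual(needle, v.Index(i).Interface()) {
			return true
		}
	}
	return false
}

// scan is the plain loop shared by every fast-path case.
func scan[T comparable](hs []T, n T) bool {
	for _, e := range hs {
		if e == n {
			return true
		}
	}
	return false
}

// inFast tries a typed fast path first. When the needle's type does not
// match the element type, the case ends without returning and control
// reaches the reflect path after the switch.
func inFast(needle, haystack any) bool {
	switch hs := haystack.(type) {
	case []string:
		if n, ok := needle.(string); ok {
			return scan(hs, n)
		}
	case []float64:
		if n, ok := needle.(float64); ok {
			return scan(hs, n)
		}
	case []int64:
		if n, ok := needle.(int64); ok {
			return scan(hs, n)
		}
	case []int:
		if n, ok := needle.(int); ok {
			return scan(hs, n)
		}
	case []bool:
		if n, ok := needle.(bool); ok {
			return scan(hs, n)
		}
	}
	return inSlow(needle, haystack)
}

func main() {
	fmt.Println(inFast("b", []string{"a", "b"})) // fast path
	fmt.Println(inFast(2, []float64{1, 2, 3}))   // int needle: reflect path
}
```

Go `switch` cases do not fall through implicitly, so "falls through" here means that the mismatched case simply completes and execution continues with the `inSlow` call after the switch.
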
Benchmarks (Apple M4 Pro, darwin/arm64, -benchtime=1s):

bench (N elements)    before                    after                    speedup
StringSlice/N=8       112.8 ns/op,   6 allocs   18.96 ns/op, 1 alloc      6.0x
StringSlice/N=64      659.8 ns/op,  34 allocs   31.69 ns/op, 1 alloc     20.8x
StringSlice/N=256     2240 ns/op,  130 allocs   60.28 ns/op, 1 alloc     37.2x
Float64Slice/N=8      85.1 ns/op,    6 allocs   14.99 ns/op, 1 alloc      5.7x
Float64Slice/N=64     442.1 ns/op,  34 allocs   23.66 ns/op, 1 alloc     18.7x
Float64Slice/N=256    1794 ns/op,  130 allocs   169.1 ns/op, 1 alloc     10.6x
Int64Slice/N=8        82.0 ns/op,    6 allocs   14.77 ns/op, 1 alloc      5.6x
Int64Slice/N=64       973.8 ns/op,  34 allocs   23.15 ns/op, 1 alloc     42.1x
Int64Slice/N=256      1610 ns/op,  130 allocs   166.0 ns/op, 1 alloc      9.7x
AnySliceOfString/N=*  unchanged (already uses zero-alloc reflect path)
The remaining 1 alloc/op is the call-site boxing the needle into `any`
when calling runtime.In; it lives outside the changed code.
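
The cost of boxing a needle can be illustrated in isolation. The sketch below is an assumption-laden demonstration rather than a measurement of the real call site: `sink`, `words`, and `boxNeedleAllocs` are hypothetical, and the package-level `sink` exists only to force the boxed value to escape. Whether an actual call site allocates depends on the compiler's escape analysis.

```go
package main

import (
	"fmt"
	"testing"
)

// sink forces the boxed value to escape to the heap.
var sink any

// words is a package-level variable, so its elements are not compile-time
// constants and cannot be boxed from static read-only data.
var words = []string{"alpha", "beta"}

// boxNeedleAllocs reports the average allocations for converting a
// non-constant string to `any`, as happens when passing it to a function
// that takes an `any` parameter and lets it escape.
func boxNeedleAllocs() float64 {
	return testing.AllocsPerRun(100, func() {
		sink = words[1] // implicit string -> any conversion
	})
}

func main() {
	fmt.Println("allocs per boxed needle:", boxNeedleAllocs())
}
```
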
Tests in `vm/runtime/runtime_test.go` cover hit/miss for each fast path,
empty typed slice, cross-type needle (must fall through to reflect), and
unchanged `[]any` semantics. The existing test suite is untouched and
still passes.
Signed-off-by: MinJae Kwon <mingrammer@gmail.com>
Summary

`runtime.In` (called by the `OpIn` opcode for the `in` operator) uses `reflect` to iterate its right-hand side. The reflect path is correct for any slice type, but it pays one heap allocation per element on every typed slice, because `reflect.Value.Index(i).Interface()` must box the element when the slice's element type is not already `interface{}`.

For `[]any` the boxing is a no-op (the cell is already an interface), so the existing path is already zero-alloc-per-element. For `[]string`, `[]float64`, `[]int64`, `[]int`, and `[]bool` it adds `N` heap allocations per `in` evaluation, which is a significant tax when `in` runs in a hot loop (e.g. rule engines or expression-based filters over candidate lists).

This PR adds a small type-switch at the top of `In` for those five common shapes. Each case uses a pure-Go `for ... range` loop, so no reflect, no per-element boxing, no `Equal()` round-trip.

Behavior preserved

On a needle/element type mismatch the case falls through to the existing reflect path, so `Equal()`'s cross-type promotion semantics are preserved. For example, an `int` needle against `[]float64` still matches via numeric promotion. The existing test suite is untouched and still passes; new tests in `vm/runtime/runtime_test.go` cover hit/miss for each fast path, an empty typed slice, cross-type needles, and unchanged `[]any` semantics.

Benchmarks

Apple M4 Pro, darwin/arm64, `go test -benchtime=1s ./vm/runtime/`:

- StringSlice/N=8
- StringSlice/N=64
- StringSlice/N=256
- Float64Slice/N=8
- Float64Slice/N=64
- Float64Slice/N=256
- Int64Slice/N=8
- Int64Slice/N=64
- Int64Slice/N=256
- AnySliceOfString/N=*

The remaining 1 alloc/op is the call-site itself boxing the needle into `any` when calling `runtime.In`; it lives outside the changed code.

Test plan

- `go test ./vm/runtime/...` (new + existing)
- `go test ./...` (full suite, all green)
- `go vet ./...`
- `go test -bench=BenchmarkIn -benchmem ./vm/runtime/` (numbers above)