perf(runtime): add typed-slice fast paths to `in` operator by mingrammer · Pull Request #960 · expr-lang/expr

mingrammer · 2026-05-21T02:43:49Z

Summary

`runtime.In` (called by the `OpIn` opcode for the `in` operator) uses `reflect` to iterate its right-hand side. The reflect path is correct for any slice type, but it pays one heap allocation per element on every typed slice, because `reflect.Value.Index(i).Interface()` must box the element when the slice's element type is not already `interface{}`.

For `[]any` the boxing is a no-op (the cell is already an interface), so the existing path already allocates nothing per element. For `[]string`, `[]float64`, `[]int64`, `[]int`, and `[]bool` it adds up to N heap allocations per `in` evaluation (one for each element scanned), which is a significant tax when `in` runs in a hot loop (e.g. rule engines or expression-based filters over candidate lists).
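
The cost described above can be observed outside of expr with a small standalone program. This is a minimal sketch, not code from this PR: `containsReflect`, `containsString`, `sampleStrings`, and the `allocs*` helpers are hypothetical names, and the comparison uses plain `==` rather than expr's `Equal()`. It relies only on `reflect` and `testing.AllocsPerRun` from the standard library.

```go
package main

import (
	"fmt"
	"reflect"
	"testing"
)

// containsReflect scans a slice through reflection. Each element is read with
// Index(i).Interface(), which has to box the element unless the element is
// already an interface value.
func containsReflect(needle any, haystack reflect.Value) bool {
	for i := 0; i < haystack.Len(); i++ {
		if haystack.Index(i).Interface() == needle {
			return true
		}
	}
	return false
}

// containsString is the direct, typed scan over []string.
func containsString(needle string, haystack []string) bool {
	for _, s := range haystack {
		if s == needle {
			return true
		}
	}
	return false
}

// sampleStrings returns the eight-element haystack used by the measurements.
func sampleStrings() []string {
	return []string{"a", "b", "c", "d", "e", "f", "g", "h"}
}

// allocsReflectTyped measures a full reflect scan over []string.
func allocsReflectTyped() float64 {
	v := reflect.ValueOf(sampleStrings())
	var needle any = "h" // last element, so the whole slice is scanned
	return testing.AllocsPerRun(1000, func() { containsReflect(needle, v) })
}

// allocsReflectAny measures the same scan over []any holding strings.
func allocsReflectAny() float64 {
	src := sampleStrings()
	boxed := make([]any, len(src))
	for i, s := range src {
		boxed[i] = s
	}
	v := reflect.ValueOf(boxed)
	var needle any = "h"
	return testing.AllocsPerRun(1000, func() { containsReflect(needle, v) })
}

// allocsDirect measures the typed for-range scan.
func allocsDirect() float64 {
	h := sampleStrings()
	return testing.AllocsPerRun(1000, func() { containsString("h", h) })
}

func main() {
	fmt.Println("reflect over []string, allocs/run:", allocsReflectTyped())
	fmt.Println("reflect over []any,    allocs/run:", allocsReflectAny())
	fmt.Println("direct  over []string, allocs/run:", allocsDirect())
}
```

The exact counts depend on the Go version and on where the needle sits in the slice, so no expected output is given here; the point is the gap between the first line and the other two.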

This PR adds a small type switch at the top of `In` for those five common shapes. Each case uses a pure-Go `for ... range` loop, so there is no reflection, no per-element boxing, and no `Equal()` round-trip.
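
The shape of such a type switch can be sketched as follows. This is an illustration under stated assumptions rather than the patch itself: `in`, `containsTyped`, and `inReflect` are hypothetical names, the generic helper is a shortcut of this sketch (the PR describes a plain loop per case), and `inReflect` compares with `reflect.DeepEqual` instead of expr's `Equal()`, so it does not reproduce expr's numeric promotion.

```go
package main

import (
	"fmt"
	"reflect"
)

// containsTyped is a plain for-range scan over a typed slice.
func containsTyped[T comparable](needle T, haystack []T) bool {
	for _, v := range haystack {
		if v == needle {
			return true
		}
	}
	return false
}

// inReflect stands in for the pre-existing reflect path (hypothetical name).
// It uses reflect.DeepEqual, NOT expr's Equal(), so unlike the real fallback
// it performs no cross-type numeric promotion.
func inReflect(needle, haystack any) bool {
	v := reflect.ValueOf(haystack)
	if v.Kind() != reflect.Slice && v.Kind() != reflect.Array {
		return false
	}
	for i := 0; i < v.Len(); i++ {
		if reflect.DeepEqual(v.Index(i).Interface(), needle) {
			return true
		}
	}
	return false
}

// in tries a typed fast path first. When the needle's type does not match the
// element type, the case body is skipped and control reaches the reflect path.
func in(needle, haystack any) bool {
	switch h := haystack.(type) {
	case []string:
		if n, ok := needle.(string); ok {
			return containsTyped(n, h)
		}
	case []float64:
		if n, ok := needle.(float64); ok {
			return containsTyped(n, h)
		}
	case []int64:
		if n, ok := needle.(int64); ok {
			return containsTyped(n, h)
		}
	case []int:
		if n, ok := needle.(int); ok {
			return containsTyped(n, h)
		}
	case []bool:
		if n, ok := needle.(bool); ok {
			return containsTyped(n, h)
		}
	}
	return inReflect(needle, haystack)
}

func main() {
	fmt.Println(in("b", []string{"a", "b"}))       // fast path, hit
	fmt.Println(in("z", []string{"a", "b"}))       // fast path, miss
	fmt.Println(in(int64(2), []int64{1, 2, 3}))    // fast path, hit
	fmt.Println(in("b", []any{"a", "b"}))          // no fast path, reflect fallback
}
```

Because a Go `switch` case does not fall through implicitly, the "fall through" here is simply the failed type assertion leaving the `switch` and reaching the final `return`.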

Behavior preserved

On a needle/element type mismatch the case falls through to the existing reflect path, so `Equal()`'s cross-type promotion semantics are preserved. For example, an `int` needle against `[]float64` still matches via numeric promotion. The existing test suite is untouched and still passes; new tests in `vm/runtime/runtime_test.go` cover hit/miss for each fast path, an empty typed slice, cross-type needles, and unchanged `[]any` semantics.
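
The cross-type case can be illustrated with a self-contained sketch for `[]float64` only. Everything here is an assumption made for illustration: `inFloat64` and `toFloat64` are hypothetical names, and `toFloat64` is a deliberately tiny stand-in for the promotion that expr's `Equal()` performs on the real reflect path. The table mirrors the kinds of cases listed above (hit, miss, empty slice, cross-type needle).

```go
package main

import "fmt"

// toFloat64 is a hypothetical, simplified stand-in for numeric promotion.
// It only understands int, int64, and float64.
func toFloat64(v any) (float64, bool) {
	switch n := v.(type) {
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	case float64:
		return n, true
	}
	return 0, false
}

// inFloat64 checks membership in a []float64.
func inFloat64(needle any, haystack []float64) bool {
	// Fast path: the needle already has the element type.
	if n, ok := needle.(float64); ok {
		for _, v := range haystack {
			if v == n {
				return true
			}
		}
		return false
	}
	// Slow path (stand-in for the reflect + Equal() fallback): promote first.
	n, ok := toFloat64(needle)
	if !ok {
		return false
	}
	for _, v := range haystack {
		if v == n {
			return true
		}
	}
	return false
}

func main() {
	haystack := []float64{1, 2, 3}
	cases := []struct {
		name     string
		needle   any
		haystack []float64
		want     bool
	}{
		{"hit", 2.0, haystack, true},
		{"miss", 9.0, haystack, false},
		{"empty slice", 2.0, []float64{}, false},
		{"cross-type int needle", 2, haystack, true},
		{"non-numeric needle", "2", haystack, false},
	}
	for _, c := range cases {
		got := inFloat64(c.needle, c.haystack)
		fmt.Printf("%-22s got=%v want=%v\n", c.name, got, c.want)
	}
}
```

The fourth row is the one the fast path must not break: an `int` needle fails the `float64` assertion, so the match has to come from the slower promoting comparison.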

Benchmarks

Apple M4 Pro, darwin/arm64, `go test -benchtime=1s ./vm/runtime/`:

| Bench | Before | After | Speedup |
| --- | --- | --- | --- |
| `StringSlice/N=8` | 112.8 ns, 6 allocs | 18.96 ns, 1 alloc | 6.0× |
| `StringSlice/N=64` | 659.8 ns, 34 allocs | 31.69 ns, 1 alloc | 20.8× |
| `StringSlice/N=256` | 2240 ns, 130 allocs | 60.28 ns, 1 alloc | 37.2× |
| `Float64Slice/N=8` | 85.1 ns, 6 allocs | 14.99 ns, 1 alloc | 5.7× |
| `Float64Slice/N=64` | 442.1 ns, 34 allocs | 23.66 ns, 1 alloc | 18.7× |
| `Float64Slice/N=256` | 1794 ns, 130 allocs | 169.1 ns, 1 alloc | 10.6× |
| `Int64Slice/N=8` | 82.0 ns, 6 allocs | 14.77 ns, 1 alloc | 5.6× |
| `Int64Slice/N=64` | 973.8 ns, 34 allocs | 23.15 ns, 1 alloc | 42.1× |
| `Int64Slice/N=256` | 1610 ns, 130 allocs | 166.0 ns, 1 alloc | 9.7× |
| `AnySliceOfString/N=*` | unchanged | unchanged | n/a |

The remaining 1 alloc/op comes from the call site boxing the needle into `any` when calling `runtime.In`; it lies outside the changed code.

Test plan

- `go test ./vm/runtime/...` (new + existing)
- `go test ./...` (full suite, all green)
- `go vet ./...`
- `go test -bench=BenchmarkIn -benchmem ./vm/runtime/` (numbers above)

`in` dispatches through `runtime.In`, which uses reflect to iterate the right-hand side. The reflect path is correct for any slice type but pays one heap allocation per element on every typed slice, because `reflect.Value.Index(i).Interface()` must box the element when the slice's element type is not already `interface{}`.

For `[]any` this boxing is a no-op (the cell is already an interface), so the existing path is already zero-alloc-per-element. For `[]string`, `[]float64`, `[]int64`, `[]int`, and `[]bool` it adds N heap allocations per `in` evaluation, which is significant when `in` runs in a hot loop (e.g. rule engines or expression-based filters over candidate lists).

This patch adds a type-switch at the top of `In` for those five common shapes. Each case uses a pure-Go `for ... range` loop, so no reflect, no per-element boxing, no Equal() round-trip. On a needle/element type mismatch the case falls through to the existing reflect path so Equal()'s cross-type promotion semantics are preserved (e.g. an int needle against a []float64 still matches).

Benchmarks (Apple M4 Pro, darwin/arm64, -benchtime=1s):

    bench (N elements)    before                   after                    speedup
    StringSlice/N=8       112.8 ns/op, 6 allocs    18.96 ns/op, 1 alloc     6.0x
    StringSlice/N=64      659.8 ns/op, 34 allocs   31.69 ns/op, 1 alloc     20.8x
    StringSlice/N=256     2240 ns/op, 130 allocs   60.28 ns/op, 1 alloc     37.2x
    Float64Slice/N=8      85.1 ns/op, 6 allocs     14.99 ns/op, 1 alloc     5.7x
    Float64Slice/N=64     442.1 ns/op, 34 allocs   23.66 ns/op, 1 alloc     18.7x
    Float64Slice/N=256    1794 ns/op, 130 allocs   169.1 ns/op, 1 alloc     10.6x
    Int64Slice/N=8        82.0 ns/op, 6 allocs     14.77 ns/op, 1 alloc     5.6x
    Int64Slice/N=64       973.8 ns/op, 34 allocs   23.15 ns/op, 1 alloc     42.1x
    Int64Slice/N=256      1610 ns/op, 130 allocs   166.0 ns/op, 1 alloc     9.7x
    AnySliceOfString/N=*  unchanged (already uses zero-alloc reflect path)

The remaining 1 alloc/op is the call-site boxing the needle into `any` when calling runtime.In; it lives outside the changed code.

Tests in `vm/runtime/runtime_test.go` cover hit/miss for each fast path, empty typed slice, cross-type needle (must fall through to reflect), and unchanged `[]any` semantics. The existing test suite is untouched and still passes.

Signed-off-by: MinJae Kwon <mingrammer@gmail.com>

mingrammer wants to merge 1 commit into expr-lang:master from mingrammer:perf/runtime-in-typed-slice-fast-path
