fix: array traversal drops streaming indices from variable binding#2655
fix: array traversal drops streaming indices from variable binding#2655lawrence3699 wants to merge 1 commit intomikefarah:masterfrom
Conversation
When streaming values (e.g. from keys[]) pipe into an array traversal like $o[.], only the first streaming index was used. The RHS collect expression produces a separate SequenceNode per streaming value when EvaluateTogether is false. The existing code at line 126 only took the first SequenceNode, discarding the rest. When the LHS is a single bound variable and the RHS has multiple index sets, traverse each set separately. Fixes mikefarah#2593
There was a problem hiding this comment.
Pull request overview
Fixes a streaming/traversal mismatch in traverseArrayOperator where array/map indexing via a bound variable (e.g. $o[.]) would only use the first streamed index set, dropping subsequent indices from the stream.
Changes:
- Update
traverseArrayOperatorto traverse the LHS against each per-stream RHS index set when the RHS yields multiple collected sequences. - Add regression tests covering arrays, maps, and longer sequences for the
$o[.]streaming-keys case.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/yqlib/operator_traverse_path.go | Adjusts traversal logic to handle multiple RHS “collected index” sequences without dropping streamed indices. |
| pkg/yqlib/operator_traverse_path_test.go | Adds regression scenarios ensuring streaming keys[] indices correctly traverse bound variables for arrays/maps. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // When streaming values produce per-value index sets in the RHS and the LHS | ||
| // is a single node (e.g. a bound variable), traverse the LHS with each | ||
| // index set separately. Without this, only the first set of indices is used | ||
| // and the rest are silently dropped. | ||
| if rhs.MatchingNodes.Len() > 1 && lhs.MatchingNodes.Len() < rhs.MatchingNodes.Len() { | ||
| var allResults = list.New() | ||
| for el := rhs.MatchingNodes.Front(); el != nil; el = el.Next() { |
There was a problem hiding this comment.
The new branch is intended to handle the case where the LHS is a single node (e.g. a bound variable), but the condition lhs.MatchingNodes.Len() < rhs.MatchingNodes.Len() can also be true when the LHS has multiple matches (e.g. 2 LHS nodes, 3 RHS sequences). In that case this code traverses each RHS index set against all LHS nodes, which can introduce an unintended cross-product of results. Consider tightening the condition to explicitly guard on lhs.MatchingNodes.Len() == 1 (or <= 1) so the multi-RHS traversal only applies to the intended bound-variable scenario.
Fixes #2593
Problem
When streaming values pipe into an array traversal with a bound variable, only the first index is used. For example:
The workaround of adding an intermediate variable binding (
. as $k |) works, but shouldn't be necessary:Root cause
In
traverseArrayOperator, the RHS collect expression is evaluated with the full streaming context. WhenEvaluateTogetheris false (the default for streaming values), the collect operator creates a separate SequenceNode per streaming value. Line 126 then only takes the first SequenceNode's content, discarding the indices from all other streaming values.Fix
When the RHS produces more SequenceNodes than the LHS has matching nodes (i.e., the LHS is a single bound variable but the index varies per streaming value), traverse each set of indices separately against the full LHS.
Before/After
Works for both arrays and maps:
Testing
Added three regression tests covering arrays, maps, and longer sequences. Full test suite passes: