Commit 1d4101f
Refactor parsing (#118)
* refactor(parser): replace manual tree walking with declarative tree queries
Replace manual node iteration with tree-sitter query-based node finding throughout the parser module. Introduces reusable find_node_text() and find_all_nodes() utilities to eliminate ~70 lines of boilerplate tree traversal code and improve code readability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): introduce SourceTree and eliminate duplicate parsing
This commit introduces a parse-once pattern to eliminate redundant tree
parsing throughout the codebase.
Key changes:
1. Introduce SourceTree struct
- Bundles Tree + source text + Language together
- Provides utility methods: get_text(), find_nodes(), find_text(), find_texts()
- Separates parsing from validation (new() no longer validates)
- Add validate() method for explicit validation
2. Refactor splitting to use existing tree
- Add split_from_tree(&SourceTree) to avoid re-parsing
- Remove split_query() convenience wrapper
- Remove extract_sql() convenience wrapper
3. Update callers to parse once
- execute.rs: SourceTree::new() → split_from_tree() + build_ast()
- validate.rs: SourceTree::new() → split_from_tree() + build_ast()
- cli.rs: Use explicit SourceTree pattern
4. Update ~50+ function signatures in builder.rs
- Change from (node, source) to (node, source_tree)
- Replace all get_node_text() calls with source_tree.get_text()
- Replace all find_all_nodes() calls with source_tree.find_nodes()
5. Remove dead code
- Remove find_all_nodes(), find_all_node_texts(), find_node_text()
- Remove get_node_text() - functionality moved to SourceTree
6. Export new public API
- Export SourceTree, build_ast, split_from_tree
Benefits:
- Eliminates duplicate parsing (execute.rs and validate.rs were parsing 2-3x)
- Makes parsing cost explicit (no hidden convenience wrappers)
- Cleaner API with bundled context
- All 193 tests passing (134 parser + 52 execute + 7 validate)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* move SourceTree struct to separate file
* refactor(parser): remove unused ParseError module
The parser::error::ParseError struct was never used in the codebase. All parse errors use GgsqlError::ParseError(String) directly with simple string messages. The detailed error struct with line/column information was leftover code that was never integrated.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): introduce basic type parsers and unify value parsing
Consolidates duplicate parsing logic throughout the builder into reusable helper functions, reducing code duplication and improving maintainability.
Changes:
- Add basic type parser helpers: parse_string_node(), parse_number_node(), parse_boolean_node(), parse_array_node()
- Add parse_value_node() to unify value parsing with context for better error messages
- Add parse_data_source() to unify DataSource parsing across FROM clauses
- Refactor parse_literal_value() to use basic type parsers and leverage grammar validation
- Simplify parameter/scale property parsing by using child(0) instead of iteration (grammar guarantees single child)
- Inline and remove wrapper functions: parse_parameter_value(), parse_scale_property_value(), parse_coord_property_value(), parse_guide_property_value(), parse_theme_property_value()
- Apply basic type parsers consistently throughout codebase
Impact: -144 net lines (130 insertions, 274 deletions)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: unify LiteralValue and ParameterValue types
Removed the redundant LiteralValue enum and consolidated all literal
value handling into ParameterValue. The two types were structurally
identical except that ParameterValue includes an Array variant.
Since the grammar already enforces that literal aesthetic mappings
cannot be arrays, having a separate type provided no additional safety
while adding unnecessary complexity.
Changes:
- Removed LiteralValue enum from src/plot/types.rs
- Changed AestheticValue::Literal to use ParameterValue
- Added Display impl for ParameterValue (including Array formatting)
- Updated all references throughout codebase (23+ locations)
- Added unreachable!() guards for Array case in contexts where
grammar prevents arrays (literal_to_sql, Vega-Lite conversion)
- Simplified parse_literal_value() to use unified parse_value_node()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* perf(parser): optimize find_node to return first match immediately
Changed find_node() to return as soon as it finds the first matching
node, instead of collecting all matches into a Vec and then taking
the first one.
This is more efficient especially for large parse trees with many
matches, as we avoid unnecessary allocations and iterations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* perf(execute): eliminate duplicate parsing with declarative queries
Refactored CTE and SELECT extraction functions to eliminate duplicate
parsing and use declarative tree-sitter queries where possible.
Before:
- Query parsed 3 times: once in prepare_data_with_executor,
once in extract_ctes, once in extract_trailing_select
- extract_ctes used ~35 lines with manual recursive tree walking
- extract_trailing_select used ~35 lines of nested loops
After:
- Query parsed once, tree reused by all functions
- extract_ctes uses ~9 lines with declarative query
- extract_trailing_select uses ~20 lines with hybrid approach
- has_executable_sql uses fast string checks + selective parsing
Changes:
- extract_ctes(&str) → extract_ctes(&SourceTree)
- Replaced manual recursion with find_nodes() query
- Eliminated extract_ctes_from_node() helper (~26 lines removed)
- extract_trailing_select(&str) → extract_trailing_select(&SourceTree)
- Uses declarative query to find with_statement
- Manual check for ordering constraint (trailing SELECT after CTEs)
- transform_global_sql - now accepts SourceTree parameter
- has_executable_sql - simplified to use string checks + selective parsing
- Updated all call sites and test cases
Performance improvements:
- Eliminates 2 redundant parse operations per query execution
- Especially beneficial for queries with CTEs (previously 3x parsing)
- Aligns with declarative query pattern established in SourceTree
All tests pass: 134 parser tests, 52 execute tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: unify LiteralValue and ParameterValue types
Removed the redundant LiteralValue enum and consolidated all literal
value handling into ParameterValue. The two types were structurally
identical except that ParameterValue includes Array and Null variants.
Since the grammar already enforces that literal aesthetic mappings
cannot be arrays or nulls, having a separate type provided no additional
safety while adding unnecessary complexity.
Changes:
- Removed LiteralValue enum from src/plot/types.rs
- Changed AestheticValue::Literal to use ParameterValue
- Added Display impl for ParameterValue (including Array formatting)
- Updated all references throughout codebase (20+ files)
- Added unreachable!() guards for Array and Null cases in contexts where
grammar prevents them (literal_to_sql, Vega-Lite conversion, etc.)
This change was applied on top of the merged main branch which had
split execute.rs into multiple modules.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): optimize parsing by eliminating redundant operations
Parse query once and reuse SourceTree across all parsing operations:
- extract_ctes(), extract_trailing_select(), transform_global_sql(), has_executable_sql()
- All functions now accept &SourceTree instead of parsing SQL again
Replace manual tree walking with declarative tree-sitter queries:
- extract_ctes(): 34-line recursion → 6-line declarative query
- has_executable_sql(): 46-line manual walking → 18-line declarative queries
- splitter.rs: Manual walking → find_node() calls
Performance improvements:
- Eliminated 3-4 redundant parse operations per query
- Parse once in execute/mod.rs, reuse for split, build_ast, extract_ctes, transform_global_sql
Net reduction: 65 lines of code (144 deleted, 79 added)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): remove unused split_query() wrapper
Remove the split_query() convenience wrapper as it's not used internally.
We use split_from_tree() everywhere to avoid redundant parsing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): introduce SourceTree and eliminate duplicate parsing
- Remove split_from_tree() and delete splitter.rs (200+ lines)
- Add lazy extraction methods to SourceTree:
- extract_sql(): Extract SQL portion with VISUALISE FROM injection
- extract_visualise(): Extract VISUALISE portion on demand
- Simplify extract_sql() return type from Result<Option<String>> to Option<String>
- Use tree-sitter sql_portion node directly instead of byte offset arithmetic
- Inline has_visualise() checks for clarity
- Restore and convert 11 tests from deleted splitter.rs
- Optimize execute/mod.rs: extract SQL once instead of twice
- Update all call sites: execute/mod.rs, cli.rs, validate.rs, parser/mod.rs
Benefits:
- Parse once, reuse SourceTree everywhere (no duplicate parsing)
- Lazy extraction - only allocate strings when needed
- Clearer API - methods on SourceTree instead of separate functions
- Simpler error handling - extract_sql() never fails
- More efficient execution in prepare_data_with_reader()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(execute): detect VISUALISE FROM as executable SQL
The has_executable_sql() function now correctly recognizes VISUALISE FROM
syntax, which injects "SELECT * FROM <source>" and thus represents executable
SQL that creates global data.
This fixes test_visualise_from_cte which was failing because:
- Query: "WITH monthly AS (...) VISUALISE FROM monthly DRAW line DRAW point"
- has_executable_sql() returned false (didn't recognize VISUALISE FROM)
- has_global_table stayed false (no SQL was executed)
- Layer validation failed (layers need either layer.source or global data)
The fix adds a check for (visualise_statement (from_clause)) nodes, ensuring
that VISUALISE FROM queries are treated as executable SQL.
Fixes regression introduced in commit 179c73b where has_executable_sql was
refactored to use tree-sitter queries but didn't account for VISUALISE FROM.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(execute): simplify transform_global_sql API
Remove redundant sql parameter from transform_global_sql() - the function
now extracts SQL from SourceTree internally when needed.
Changes:
1. Remove sql parameter from transform_global_sql signature
2. Extract SQL internally for non-SELECT queries (CREATE, INSERT, UPDATE, VISUALISE FROM)
3. Use find_text() helper in extract_trailing_select for cleaner code
4. Clarify comment about VISUALISE FROM injection behavior
Before:
```rust
let sql_part = source_tree.extract_sql();
if let Some(ref sql_text) = sql_part {
transform_global_sql(&tree, sql_text, &ctes)
}
```
After:
```rust
let sql_part = source_tree.extract_sql();
if sql_part.is_some() {
transform_global_sql(&tree, &ctes)
}
```
Trade-off: For VISUALISE FROM queries, SQL is extracted twice (once at L528,
once inside transform_global_sql at L198). This is acceptable since:
- SELECT queries (common case) only extract once via extract_trailing_select()
- VISUALISE FROM queries (rare case) pay the cost for cleaner API
- Consistent with other functions like has_executable_sql()
Note: VISUALISE FROM is visualization syntax but triggers SQL injection
("SELECT * FROM <source>") that needs CTE transformation, which is why
it goes through this code path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update CLAUDE.md and fix SCALE syntax in test
1. Update CLAUDE.md to reflect refactor_parsing branch changes:
- Replace "Query Splitter" with "SourceTree" in architecture diagram
- Document parse-once architecture and declarative query API
- Update parse_query() implementation to show SourceTree usage
- Document type system unification (LiteralValue merged into ParameterValue)
- Add SourceTree methods: find_node(), find_text(), extract_sql(), extract_visualise()
2. Fix incorrect SCALE syntax in test_complex_multi_viz_query:
- Changed: SCALE x SETTING type => 'date' (incorrect)
- To: SCALE x VIA date (correct)
VIA clause sets the scale transform, not SETTING. The incorrect syntax
would create a property called "type" but not actually set the transform.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): replace manual tree walking with declarative tree queries
Replace manual node iteration with tree-sitter query-based node finding throughout the parser module. Introduces reusable find_node_text() and find_all_nodes() utilities to eliminate ~70 lines of boilerplate tree traversal code and improve code readability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove unused imports in parser/mod.rs
* refactor(parser): use SourceTree parameter throughout builder.rs
Replace (node, source: &str) with (node, source: &SourceTree) in all builder
functions. This completes the parse-once architecture from commit 3391e9c.
Changes:
- Update build_ast signature to accept &SourceTree instead of &Tree + &str
- Update all ~30 function signatures in builder.rs to use source: &SourceTree
- Replace helper function calls with SourceTree methods:
- get_node_text() → source.get_text()
- find_all_nodes() → source.find_nodes()
- find_node_text() → source.find_text()
- find_all_node_texts() → source.find_texts()
- Remove now-unused helper functions from parser/mod.rs
- Update callers in execute/mod.rs and validate.rs
Benefits:
- Parse once, reuse CST throughout
- Cleaner API with bundled tree + source + language context
- All 156 parser tests passing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): introduce basic type parsers and unify value parsing
Consolidates duplicate parsing logic throughout the builder into reusable
helper functions, reducing code duplication and improving maintainability.
Changes:
- Add basic type parser helpers: parse_string_node(), parse_number_node(),
parse_boolean_node(), parse_array_node()
- Add parse_value_node() to unify value parsing with context for better
error messages
- Add parse_data_source() to unify DataSource parsing across FROM clauses
- Add parse_literal_value() to cleanly parse literal aesthetic values
- Refactor parse_explicit_mapping() to use parse_literal_value()
- Simplify parameter/scale property parsing by using child(0) instead of
iteration (grammar guarantees single child)
- Inline and remove wrapper functions: parse_parameter_value(),
parse_array(), parse_coord_property_value(), parse_theme_property_value()
- Apply basic type parsers consistently throughout codebase
Impact: -100 net lines (133 insertions, 233 deletions)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): improve scale parsing with declarative queries and null support
Multiple improvements to parser consistency and correctness:
1. Add null literal support:
- Add null_literal case to parse_value_node() returning ParameterValue::Null
- Update parse_literal_value() to reject both arrays and nulls in aesthetics
- Completes unified value parsing for all ParameterValue variants
2. Convert scale parsers to declarative query style:
- parse_scale_from_clause: Use find_node() with '(array) @arr' query
- parse_scale_to_clause: Use find_node() for array and identifier queries
- parse_scale_via_clause: Use find_node() for identifier query
- parse_scale_renaming_clause: Use find_nodes() for renaming_assignment query
3. Fix Result<Option<T>> anti-pattern:
- Change parse_scale_from_clause: Result<Option<Vec<...>>> -> Result<Vec<...>>
- Change parse_scale_to_clause: Result<Option<OutputRange>> -> Result<OutputRange>
- Change parse_scale_via_clause: Result<Option<Transform>> -> Result<Transform>
- Return errors when expected children are missing instead of Ok(None)
- Simplify callers to wrap results in Some(...) directly
Benefits:
- Consistent declarative parsing throughout scale infrastructure
- Clearer error semantics (missing children are errors, not valid empty states)
- Complete null handling across all parameter contexts
- Better alignment with tree-sitter query-based architecture
All 128 builder tests passing.
* test(parser): add tests for SourceTree query methods and basic type parsers
Add comprehensive test coverage for SourceTree declarative query API:
- find_node() and find_nodes() with alternation patterns
- find_text() and find_texts() with multiple matches
- get_text() for identifier extraction
Add tests for basic type parser helpers:
- parse_string_node() for string literals
- parse_number_node() for integers and floats
- parse_array_node() for string and number arrays
- parse_data_source() for identifiers and file paths
- parse_literal_value() for aesthetic literal values
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): simplify tree traversal with unified field-based extraction
Eliminate defensive loops around choice nodes and unify name/value extraction
across all assignment types. Grammar now consistently uses 'name' and 'value'
fields, and parser uses direct child access where grammar guarantees exactly
one child node.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): unify and simplify mapping parsing logic
Consolidate parse_global_mapping, parse_mapping_clause, and parse_mapping_list
into a single parse_mapping function. Eliminate unnecessary wrapper functions
and intermediate queries by leveraging tree-sitter's recursive search.
Key simplifications:
- Single parse_mapping() function handles all mapping contexts
- Direct query for mapping_element nodes (no intermediate mapping_list query)
- Removed nested loop (grammar guarantees at most one mapping_list)
- Inlined wrapper functions at call sites for clarity
Net reduction: -39 lines
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(parser): replace manual tree walking with declarative queries
Replace manual tree traversal with tree-sitter queries for cleaner,
more declarative code:
1. SQL portion extraction: Use query instead of children().find()
2. from_clause parsing: Use field-based query to directly extract
'table' field from table_ref, eliminating nested if statements
Query syntax used:
- "(sql_portion) @SQL" - simple node query
- "(table_ref table: (_) @table)" - field-based nested query
Net reduction: -5 lines
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs(CLAUDE.md): update parser documentation and run cargo fmt
Update CLAUDE.md to reflect current implementation:
- Fix build_ast signature (now takes &SourceTree instead of &Tree, &str)
- Update test count (916 total tests, 174 parser tests)
Apply cargo fmt formatting across codebase.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test(grammar): update test snapshots for field name changes
Updated 46 test snapshots to reflect grammar field name changes:
- 'aesthetic' -> 'name' in explicit_mapping
- 'name'/'value' -> 'from'/'to' in renaming_assignment
- Added 'name' and 'value' fields to all assignment types
All 57 tree-sitter tests passing.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>1 parent 7a5ed62 commit 1d4101f
20 files changed
Lines changed: 1615 additions & 1476 deletions
File tree
- src
- execute
- parser
- plot
- layer
- writer
- tree-sitter-ggsql
- test/corpus
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | | - | |
| 25 | + | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
| |||
100 | 100 | | |
101 | 101 | | |
102 | 102 | | |
103 | | - | |
104 | | - | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
105 | 107 | | |
106 | 108 | | |
107 | 109 | | |
| |||
226 | 228 | | |
227 | 229 | | |
228 | 230 | | |
229 | | - | |
| 231 | + | |
230 | 232 | | |
231 | | - | |
232 | | - | |
233 | | - | |
234 | | - | |
235 | | - | |
| 233 | + | |
236 | 234 | | |
237 | | - | |
| 235 | + | |
238 | 236 | | |
239 | | - | |
240 | | - | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
241 | 255 | | |
242 | 256 | | |
243 | 257 | | |
| |||
266 | 280 | | |
267 | 281 | | |
268 | 282 | | |
269 | | - | |
270 | | - | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
271 | 288 | | |
272 | 289 | | |
273 | | - | |
| 290 | + | |
274 | 291 | | |
275 | 292 | | |
276 | 293 | | |
| |||
322 | 339 | | |
323 | 340 | | |
324 | 341 | | |
325 | | - | |
326 | | - | |
327 | | - | |
328 | | - | |
329 | | - | |
330 | | - | |
331 | | - | |
| 342 | + | |
332 | 343 | | |
333 | 344 | | |
334 | 345 | | |
335 | 346 | | |
336 | 347 | | |
337 | 348 | | |
| 349 | + | |
338 | 350 | | |
339 | 351 | | |
340 | 352 | | |
| |||
407 | 419 | | |
408 | 420 | | |
409 | 421 | | |
| 422 | + | |
410 | 423 | | |
411 | 424 | | |
412 | 425 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
8 | | - | |
9 | 8 | | |
10 | 9 | | |
11 | 10 | | |
| |||
328 | 327 | | |
329 | 328 | | |
330 | 329 | | |
331 | | - | |
332 | | - | |
333 | | - | |
334 | | - | |
335 | | - | |
336 | | - | |
| 330 | + | |
| 331 | + | |
| 332 | + | |
| 333 | + | |
| 334 | + | |
| 335 | + | |
| 336 | + | |
| 337 | + | |
| 338 | + | |
337 | 339 | | |
338 | | - | |
| 340 | + | |
339 | 341 | | |
340 | 342 | | |
341 | 343 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
7 | | - | |
| 7 | + | |
8 | 8 | | |
9 | 9 | | |
10 | 10 | | |
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
26 | | - | |
| 26 | + | |
27 | 27 | | |
28 | | - | |
29 | | - | |
30 | | - | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
31 | 31 | | |
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
37 | 40 | | |
38 | 41 | | |
39 | 42 | | |
| |||
0 commit comments