Commit 3c3faf4

Merge branch 'main' into dsql-async-index
2 parents 91bb744 + 9833c03 commit 3c3faf4

26 files changed

Lines changed: 1450 additions & 166 deletions

AGENTS.md

Lines changed: 41 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,41 @@
1+
# Extensible SQL Lexer and Parser for Rust Agents Guidelines
2+
3+
## General Agent Workflow
4+
1. You will write unit tests to ensure your code change is working as expected.
5+
2. You will run the commands in the Pre Commit Checks section below to ensure your change is ready for a pull request.
6+
3. When instructed to open a PR, you will follow the instructions in the Pull Request Guidelines section below.
7+
8+
## General Coding Guidelines
9+
1. Refrain from adding conditions on specific dialects, such as `dialect_is!(...)` or `dialect_of!(... | ...)`. Instead, define a new function in the `Dialect` trait that describes the condition, so that dialects can turn this condition on more easily.
10+
2. Make targeted code changes and refrain from refactoring, unless it's absolutely required.
11+
12+
## Unit Tests Guidelines
13+
- New unit tests should be added to the `tests` module in the corresponding dialect file (e.g., `tests/sqlparser_redshift.rs` for Redshift), and should be placed at the end of the file.
14+
- If the new functionality is gated using a dialect function, and the SQL is likely relevant in most dialects, tests should be placed under `tests/sqlparser_common.rs`.
15+
- When testing a multi-line SQL statement, use a raw string literal, i.e. `r#"..."#` to preserve formatting.
16+
- The parser builds an abstract syntax tree (AST) from the SQL statement and has functionality to display the tree as SQL. Use the following template for simple unit tests where you expect the SQL created from the AST to be the same as the input SQL:
17+
```rust
18+
<dialect>().verified_stmt(r#"..."#);
19+
```
20+
For example: `snowflake().verified_stmt(r#"SELECT * FROM my_table"#)`. Use `one_statement_parses_to` instead of `verified_stmt` when you expect the SQL created from the AST to differ from the input SQL. For example:
21+
```rust
22+
snowflake().one_statement_parses_to(
23+
"SELECT * FROM my_table t",
24+
"SELECT * FROM my_table AS t",
25+
)
26+
```
27+
28+
## Analyzing Parsing Issues
29+
You can try to simplify the SQL statement to identify the root cause of the parsing issue. This may involve removing certain clauses or components of the SQL statement to see if it can be parsed successfully. Additionally, you can compare the problematic SQL statement with similar statements that are parsed correctly to identify any differences that may be causing the issue.
30+
31+
## Pre Commit Checks
32+
Run the following commands before you commit to ensure the change will pass the CI process:
33+
```bash
34+
cargo test --all-features
35+
cargo fmt --all
36+
cargo clippy --all-targets --all-features -- -D warnings
37+
```
38+
39+
## Pull Request Guidelines
40+
1. PR title should follow this format: `<DIALECT>: <SHORT DESCRIPTION>`. For example, `Snowflake: Add support for casting to VARIANT`.
41+
2. Keep the PR comment short: provide an example of what was not working and a short description of the fix. Be succinct.
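
The first General Coding Guideline above (add a `Dialect` trait function rather than branching on specific dialects) can be illustrated with a minimal, self-contained sketch. The trait and structs below are simplified stand-ins, not the crate's real definitions; the method name `supports_array_join_syntax` is taken from this commit, while the default of `false` and the helper `can_parse_array_join` are assumptions made for illustration.

```rust
/// Simplified stand-in for the crate's `Dialect` trait (assumption: the real
/// trait has many more methods).
pub trait Dialect {
    /// Capability flag. Assumed here to default to `false`, so dialects that
    /// do not override it are unaffected.
    fn supports_array_join_syntax(&self) -> bool {
        false
    }
}

/// Stand-in dialect that opts in to the feature.
pub struct ClickHouseDialect;

/// Stand-in dialect that keeps the default.
pub struct PostgreSqlDialect;

impl Dialect for ClickHouseDialect {
    fn supports_array_join_syntax(&self) -> bool {
        true
    }
}

impl Dialect for PostgreSqlDialect {}

/// Hypothetical parser-side check: ask the dialect for the capability
/// instead of matching on the concrete dialect type.
pub fn can_parse_array_join(dialect: &dyn Dialect) -> bool {
    dialect.supports_array_join_syntax()
}
```

With this shape, enabling the syntax for another dialect is a one-method override in that dialect's `impl` block, and the parser code does not change.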

dev/release/rat_exclude_files.txt

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ dev/release/rat_exclude_files.txt
66
sqlparser_bench/img/flamegraph.svg
77
**Cargo.lock
88
filtered_rat.txt
9+
AGENTS.md

src/ast/ddl.rs

Lines changed: 1 addition & 0 deletions
@@ -3207,6 +3207,7 @@ impl fmt::Display for CreateTable {
32073207
Some(HiveIOFormat::FileFormat { format }) if !self.external => {
32083208
write!(f, " STORED AS {format}")?
32093209
}
3210+
Some(HiveIOFormat::Using { format }) => write!(f, " USING {format}")?,
32103211
_ => (),
32113212
}
32123213
if let Some(serde_properties) = serde_properties.as_ref() {

src/ast/mod.rs

Lines changed: 73 additions & 1 deletion
@@ -30,7 +30,7 @@ use helpers::{
3030
};
3131

3232
use core::cmp::Ordering;
33-
use core::ops::Deref;
33+
use core::ops::{Deref, DerefMut};
3434
use core::{
3535
fmt::{self, Display},
3636
hash,
@@ -200,6 +200,45 @@ fn format_statement_list(f: &mut fmt::Formatter, statements: &[Statement]) -> fm
200200
write!(f, ";")
201201
}
202202

203+
/// An item `T` enclosed in a pair of parentheses
204+
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
205+
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
206+
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
207+
pub struct Parens<T> {
208+
/// the opening parenthesis token, i.e. `(`
209+
pub opening_token: AttachedToken,
210+
/// content enclosed in parentheses
211+
pub content: T,
212+
/// the closing parenthesis token, i.e. `)`
213+
pub closing_token: AttachedToken,
214+
}
215+
216+
impl<T> Parens<T> {
217+
/// Constructor wrapping `content` into `Parens` with an empty span;
218+
/// useful for testing purposes.
219+
pub fn with_empty_span(content: T) -> Self {
220+
Self {
221+
opening_token: AttachedToken::empty(),
222+
content,
223+
closing_token: AttachedToken::empty(),
224+
}
225+
}
226+
}
227+
228+
impl<T> Deref for Parens<T> {
229+
type Target = T;
230+
231+
fn deref(&self) -> &Self::Target {
232+
&self.content
233+
}
234+
}
235+
236+
impl<T> DerefMut for Parens<T> {
237+
fn deref_mut(&mut self) -> &mut Self::Target {
238+
&mut self.content
239+
}
240+
}
241+
203242
/// An identifier, decomposed into its value or character data and the quote style.
204243
#[derive(Debug, Clone)]
205244
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
@@ -4109,6 +4148,17 @@ pub enum Statement {
41094148
show_options: ShowStatementOptions,
41104149
},
41114150
/// ```sql
4151+
/// SHOW CATALOGS
4152+
/// ```
4153+
ShowCatalogs {
4154+
/// `true` when terse output format was requested.
4155+
terse: bool,
4156+
/// `true` when history information was requested.
4157+
history: bool,
4158+
/// Additional options for `SHOW CATALOGS`.
4159+
show_options: ShowStatementOptions,
4160+
},
4161+
/// ```sql
41124162
/// SHOW DATABASES
41134163
/// ```
41144164
ShowDatabases {
@@ -5744,6 +5794,19 @@ impl fmt::Display for Statement {
57445794
)?;
57455795
Ok(())
57465796
}
5797+
Statement::ShowCatalogs {
5798+
terse,
5799+
history,
5800+
show_options,
5801+
} => {
5802+
write!(
5803+
f,
5804+
"SHOW {terse}CATALOGS{history}{show_options}",
5805+
terse = if *terse { "TERSE " } else { "" },
5806+
history = if *history { " HISTORY" } else { "" },
5807+
)?;
5808+
Ok(())
5809+
}
57475810
Statement::ShowProcessList { full } => {
57485811
write!(
57495812
f,
@@ -8634,6 +8697,15 @@ pub enum HiveIOFormat {
86348697
/// The file format used for storage.
86358698
format: FileFormat,
86368699
},
8700+
/// `USING <format>` syntax used by Spark SQL.
8701+
///
8702+
/// Example: `CREATE TABLE t (i INT) USING PARQUET`
8703+
///
8704+
/// See <https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-datasource.html>
8705+
Using {
8706+
/// The data source or format name, e.g. `parquet`, `delta`, `csv`.
8707+
format: Ident,
8708+
},
86378709
}
86388710

86398711
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash, Default)]
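
As a rough illustration of why `Parens<T>` implements `Deref` and `DerefMut`, the sketch below shows a wrapped row being used as if it were the inner `Vec`. It is a simplified stand-in: the real struct also carries the opening and closing `AttachedToken`s, which are omitted here, and `i64` stands in for `Expr`. The helpers `total_exprs` and `push_to_each` are hypothetical.

```rust
use std::ops::{Deref, DerefMut};

/// Simplified `Parens<T>`: only the wrapped content is kept; the token
/// fields of the real struct are left out of this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Parens<T> {
    pub content: T,
}

impl<T> Deref for Parens<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.content
    }
}

impl<T> DerefMut for Parens<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.content
    }
}

/// Counts the values across all rows; `row.len()` resolves through `Deref`.
pub fn total_exprs(rows: &[Parens<Vec<i64>>]) -> usize {
    rows.iter().map(|row| row.len()).sum()
}

/// Appends `value` to every row; `row.push` resolves through `DerefMut`.
pub fn push_to_each(rows: &mut [Parens<Vec<i64>>], value: i64) {
    for row in rows.iter_mut() {
        row.push(value);
    }
}
```

Code that only reads or mutates the row contents can therefore keep calling `Vec` methods on each row after the change from `Vec<Vec<Expr>>` to `Vec<Parens<Vec<Expr>>>`; code that constructs rows still has to build the wrapper explicitly.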

src/ast/query.rs

Lines changed: 31 additions & 1 deletion
@@ -1522,6 +1522,8 @@ pub enum TableFactor {
15221522
name: ObjectName,
15231523
/// Arguments passed to the function.
15241524
args: Vec<FunctionArg>,
1525+
/// Whether `WITH ORDINALITY` was specified to include ordinality.
1526+
with_ordinality: bool,
15251527
/// Optional alias for the result of the function.
15261528
alias: Option<TableAlias>,
15271529
},
@@ -2277,13 +2279,17 @@ impl fmt::Display for TableFactor {
22772279
lateral,
22782280
name,
22792281
args,
2282+
with_ordinality,
22802283
alias,
22812284
} => {
22822285
if *lateral {
22832286
write!(f, "LATERAL ")?;
22842287
}
22852288
write!(f, "{name}")?;
22862289
write!(f, "({})", display_comma_separated(args))?;
2290+
if *with_ordinality {
2291+
write!(f, " WITH ORDINALITY")?;
2292+
}
22872293
if let Some(alias) = alias {
22882294
write!(f, " {alias}")?;
22892295
}
@@ -2527,6 +2533,12 @@ pub struct TableAlias {
25272533
pub name: Ident,
25282534
/// Optional column aliases declared in parentheses after the table alias.
25292535
pub columns: Vec<TableAliasColumnDef>,
2536+
/// Optional PartiQL index alias declared with `AT`. For example:
2537+
/// ```sql
2538+
/// SELECT element, index FROM bar AS b, b.data.scalar_array AS element AT index
2539+
/// ```
2540+
/// See: <https://docs.aws.amazon.com/redshift/latest/dg/query-super.html>
2541+
pub at: Option<Ident>,
25302542
}
25312543

25322544
impl fmt::Display for TableAlias {
@@ -2535,6 +2547,9 @@ impl fmt::Display for TableAlias {
25352547
if !self.columns.is_empty() {
25362548
write!(f, " ({})", display_comma_separated(&self.columns))?;
25372549
}
2550+
if let Some(at) = &self.at {
2551+
write!(f, " AT {at}")?;
2552+
}
25382553
Ok(())
25392554
}
25402555
}
@@ -2770,6 +2785,13 @@ impl fmt::Display for Join {
27702785
self.relation,
27712786
suffix(constraint)
27722787
)),
2788+
JoinOperator::ArrayJoin => f.write_fmt(format_args!("ARRAY JOIN {}", self.relation)),
2789+
JoinOperator::LeftArrayJoin => {
2790+
f.write_fmt(format_args!("LEFT ARRAY JOIN {}", self.relation))
2791+
}
2792+
JoinOperator::InnerArrayJoin => {
2793+
f.write_fmt(format_args!("INNER ARRAY JOIN {}", self.relation))
2794+
}
27732795
}
27742796
}
27752797
}
@@ -2824,6 +2846,14 @@ pub enum JoinOperator {
28242846
///
28252847
/// See <https://dev.mysql.com/doc/refman/8.4/en/join.html>.
28262848
StraightJoin(JoinConstraint),
2849+
/// ClickHouse: `ARRAY JOIN` for unnesting arrays inline.
2850+
///
2851+
/// See <https://clickhouse.com/docs/en/sql-reference/statements/select/array-join>.
2852+
ArrayJoin,
2853+
/// ClickHouse: `LEFT ARRAY JOIN` for unnesting arrays inline (preserves rows with empty arrays).
2854+
LeftArrayJoin,
2855+
/// ClickHouse: `INNER ARRAY JOIN` for unnesting arrays inline (filters rows with empty arrays).
2856+
InnerArrayJoin,
28272857
}
28282858

28292859
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
@@ -3628,7 +3658,7 @@ pub struct Values {
36283658
/// <https://dev.mysql.com/doc/refman/9.2/en/insert.html>
36293659
pub value_keyword: bool,
36303660
/// The list of rows, each row is a list of expressions.
3631-
pub rows: Vec<Vec<Expr>>,
3661+
pub rows: Vec<Parens<Vec<Expr>>>,
36323662
}
36333663

36343664
impl fmt::Display for Values {
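
The `Display` changes in this file append optional clauses only when the corresponding field is set. A minimal sketch of that pattern, using simplified stand-in types (plain `String`s instead of `Ident`, `ObjectName`, and `FunctionArg`, and an alias that always prints `AS`, which is an assumption of this sketch rather than the crate's behavior):

```rust
use std::fmt;

/// Simplified table alias with the optional `AT <index>` part.
pub struct TableAlias {
    pub name: String,
    pub at: Option<String>,
}

impl fmt::Display for TableAlias {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "AS {}", self.name)?;
        // Only emit the `AT` clause when an index alias was given.
        if let Some(at) = &self.at {
            write!(f, " AT {at}")?;
        }
        Ok(())
    }
}

/// Simplified table function call with an optional `WITH ORDINALITY`.
pub struct TableFunction {
    pub name: String,
    pub args: Vec<String>,
    pub with_ordinality: bool,
    pub alias: Option<TableAlias>,
}

impl fmt::Display for TableFunction {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}({})", self.name, self.args.join(", "))?;
        // The clause order mirrors the diff: ordinality first, then alias.
        if self.with_ordinality {
            write!(f, " WITH ORDINALITY")?;
        }
        if let Some(alias) = &self.alias {
            write!(f, " {alias}")?;
        }
        Ok(())
    }
}
```

Because every optional piece writes its own leading space, omitting a clause never leaves a double space or a trailing space in the output.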

src/ast/spans.rs

Lines changed: 29 additions & 12 deletions
@@ -41,13 +41,13 @@ use super::{
4141
MatchRecognizePattern, Measure, Merge, MergeAction, MergeClause, MergeInsertExpr,
4242
MergeInsertKind, MergeUpdateExpr, NamedParenthesizedList, NamedWindowDefinition, ObjectName,
4343
ObjectNamePart, Offset, OnConflict, OnConflictAction, OnInsert, OpenStatement, OrderBy,
44-
OrderByExpr, OrderByKind, OutputClause, Partition, PartitionBoundValue, PivotValueSource,
45-
ProjectionSelect, Query, RaiseStatement, RaiseStatementValue, ReferentialAction,
46-
RenameSelectItem, ReplaceSelectElement, ReplaceSelectItem, Select, SelectInto, SelectItem,
47-
SetExpr, SqlOption, Statement, Subscript, SymbolDefinition, TableAlias, TableAliasColumnDef,
48-
TableConstraint, TableFactor, TableObject, TableOptionsClustered, TableWithJoins, Update,
49-
UpdateTableFromKind, Use, Values, ViewColumnDef, WhileStatement, WildcardAdditionalOptions,
50-
With, WithFill,
44+
OrderByExpr, OrderByKind, OutputClause, Parens, Partition, PartitionBoundValue,
45+
PivotValueSource, ProjectionSelect, Query, RaiseStatement, RaiseStatementValue,
46+
ReferentialAction, RenameSelectItem, ReplaceSelectElement, ReplaceSelectItem, Select,
47+
SelectInto, SelectItem, SetExpr, SqlOption, Statement, Subscript, SymbolDefinition, TableAlias,
48+
TableAliasColumnDef, TableConstraint, TableFactor, TableObject, TableOptionsClustered,
49+
TableWithJoins, Update, UpdateTableFromKind, Use, Values, ViewColumnDef, WhileStatement,
50+
WildcardAdditionalOptions, With, WithFill,
5151
};
5252

5353
/// Given an iterator of spans, return the [Span::union] of all spans.
@@ -106,6 +106,12 @@ impl Spanned for TokenWithSpan {
106106
}
107107
}
108108

109+
impl<T> Spanned for Parens<T> {
110+
fn span(&self) -> Span {
111+
self.opening_token.0.span.union(&self.closing_token.0.span)
112+
}
113+
}
114+
109115
impl Spanned for Query {
110116
fn span(&self) -> Span {
111117
let Query {
@@ -239,10 +245,11 @@ impl Spanned for Values {
239245
rows,
240246
} = self;
241247

242-
union_spans(
243-
rows.iter()
244-
.map(|row| union_spans(row.iter().map(|expr| expr.span()))),
245-
)
248+
match &rows[..] {
249+
[] => Span::empty(),
250+
[f] => f.span(),
251+
[f, .., l] => f.span().union(&l.span()),
252+
}
246253
}
247254
}
248255

@@ -478,6 +485,7 @@ impl Spanned for Statement {
478485
Statement::AlterConnector { .. } => Span::empty(),
479486
Statement::DropPolicy { .. } => Span::empty(),
480487
Statement::DropConnector { .. } => Span::empty(),
488+
Statement::ShowCatalogs { .. } => Span::empty(),
481489
Statement::ShowDatabases { .. } => Span::empty(),
482490
Statement::ShowProcessList { .. } => Span::empty(),
483491
Statement::ShowSchemas { .. } => Span::empty(),
@@ -1990,6 +1998,7 @@ impl Spanned for TableFactor {
19901998
lateral: _,
19911999
name,
19922000
args,
2001+
with_ordinality: _,
19932002
alias,
19942003
} => union_spans(
19952004
name.0
@@ -2178,8 +2187,13 @@ impl Spanned for TableAlias {
21782187
explicit: _,
21792188
name,
21802189
columns,
2190+
at,
21812191
} = self;
2182-
union_spans(core::iter::once(name.span).chain(columns.iter().map(Spanned::span)))
2192+
union_spans(
2193+
core::iter::once(name.span)
2194+
.chain(columns.iter().map(Spanned::span))
2195+
.chain(at.iter().map(|at| at.span)),
2196+
)
21832197
}
21842198
}
21852199

@@ -2239,6 +2253,9 @@ impl Spanned for JoinOperator {
22392253
JoinOperator::Anti(join_constraint) => join_constraint.span(),
22402254
JoinOperator::Semi(join_constraint) => join_constraint.span(),
22412255
JoinOperator::StraightJoin(join_constraint) => join_constraint.span(),
2256+
JoinOperator::ArrayJoin => Span::empty(),
2257+
JoinOperator::LeftArrayJoin => Span::empty(),
2258+
JoinOperator::InnerArrayJoin => Span::empty(),
22422259
}
22432260
}
22442261
}
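
The new `Spanned for Values` implementation relies on each row now carrying its own parenthesis tokens, so the span of the whole list can be taken from the first and last rows alone. A self-contained sketch of that slice-pattern approach follows; the `Span` here is a hypothetical stand-in using plain offsets (the crate's real `Span` is defined in terms of source locations), and treating an empty span as the identity in `union` is an assumption of this sketch.

```rust
/// Hypothetical stand-in for a source span, using plain offsets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

impl Span {
    /// The empty span, used when there is nothing to cover.
    pub fn empty() -> Self {
        Span { start: 0, end: 0 }
    }

    /// Smallest span covering both inputs; an empty span is treated as the
    /// identity (assumption of this sketch).
    pub fn union(&self, other: &Span) -> Span {
        if *self == Span::empty() {
            return *other;
        }
        if *other == Span::empty() {
            return *self;
        }
        Span {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }
}

/// Span of a list of rows, computed from the first and last row only.
/// This assumes the rows are in source order, as they are after parsing.
pub fn rows_span(rows: &[Span]) -> Span {
    match rows {
        [] => Span::empty(),
        [only] => *only,
        [first, .., last] => first.union(last),
    }
}
```

Compared with unioning every expression in every row, this touches at most two rows regardless of how many rows the `VALUES` list contains.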

src/dialect/clickhouse.rs

Lines changed: 8 additions & 0 deletions
@@ -64,6 +64,14 @@ impl Dialect for ClickHouseDialect {
6464
true
6565
}
6666

67+
fn supports_partition_by_after_order_by(&self) -> bool {
68+
true
69+
}
70+
71+
fn supports_array_join_syntax(&self) -> bool {
72+
true
73+
}
74+
6775
// ClickHouse uses this for some FORMAT expressions in `INSERT` context, e.g. when inserting
6876
// with FORMAT JSONEachRow a raw JSON key-value expression is valid and expected.
6977
//

src/dialect/generic.rs

Lines changed: 8 additions & 0 deletions
@@ -45,6 +45,14 @@ impl Dialect for GenericDialect {
4545
true
4646
}
4747

48+
fn supports_partition_by_after_order_by(&self) -> bool {
49+
true
50+
}
51+
52+
fn supports_array_join_syntax(&self) -> bool {
53+
true
54+
}
55+
4856
fn supports_group_by_expr(&self) -> bool {
4957
true
5058
}

src/dialect/hive.rs

Lines changed: 7 additions & 0 deletions
@@ -72,4 +72,11 @@ impl Dialect for HiveDialect {
7272
fn supports_group_by_with_modifier(&self) -> bool {
7373
true
7474
}
75+
76+
// TODO: The parsing of the FROM keyword seems wrong, as it happens within the CTE.
77+
// See https://github.com/apache/datafusion-sqlparser-rs/issues/2236 for more details.
78+
/// See <https://hive.apache.org/docs/latest/language/common-table-expression/>
79+
fn supports_from_first_insert(&self) -> bool {
80+
true
81+
}
7582
}
