Feature request: support PostgreSQL $N and SQLite :name placeholder syntaxes #263

@aleks-f

Background

The parser currently recognises a single placeholder form, ? (positional, sequential). Most SQL dialects that use this parser as a frontend also accept one or both of the following forms in production queries:

  • PostgreSQL uses $N with N a positive integer — WHERE a = $1 AND b = $2. Order of appearance does not determine binding; the explicit number does.
  • SQLite supports :name, e.g. WHERE a = :user AND b = :id. Parameters are bound by name.

Anyone embedding hyrise/sql-parser into a real database driver currently has to either pre-translate queries before parsing or work around the limitation downstream. Adding native support inside the parser is small, isolated, and unblocks those use cases without changing existing ? behaviour.

Proposed grammar

Two additional rules in the lexer, two additional alternatives in param_expr:

# flex_lexer.l (before the punctuation catch-all)
\$[0-9]+                 -> DOLLAR_PARAM   (ival = atoll(yytext + 1))
:[A-Za-z][A-Za-z0-9_]*   -> NAMED_PARAM    (sval = strdup(yytext + 1))

# bison_parser.y
param_expr
  : '?'           { /* existing */ }
  | DOLLAR_PARAM  { if ($1 < 1) YYERROR;            // reject $0
                    $$ = Expr::makeDollarParameter($1); ... }
  | NAMED_PARAM   { $$ = Expr::makeNamedParameter($1); ... }
  ;

The identifier pattern matches the existing SQL_IDENTIFIER rule so :foo, :abc_123 parse and :_x does not — same surface as everywhere else in the grammar.
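To make the accepted surface concrete, here is a minimal standalone sketch of the two proposed patterns applied to a whole token. It is not the lexer itself: PlaceholderKind, Placeholder and classifyPlaceholder are hypothetical names used only for illustration, and the $0 rejection that the proposal performs in the parser action is folded into the same function for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical stand-ins for illustration; not the parser's real token types.
enum class PlaceholderKind { None, Question, Dollar, Named };

struct Placeholder {
  PlaceholderKind kind = PlaceholderKind::None;
  int64_t index = 0;  // N for $N
  std::string name;   // identifier for :name
};

static bool isAsciiDigit(char c) { return c >= '0' && c <= '9'; }
static bool isAsciiAlpha(char c) {
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// Mirrors the proposed patterns on a complete token:
//   \$[0-9]+   and   :[A-Za-z][A-Za-z0-9_]*
Placeholder classifyPlaceholder(const std::string& tok) {
  Placeholder p;
  if (tok == "?") {
    p.kind = PlaceholderKind::Question;
    return p;
  }
  if (tok.size() >= 2 && tok[0] == '$') {
    for (std::size_t i = 1; i < tok.size(); ++i) {
      if (!isAsciiDigit(tok[i])) return p;  // not \$[0-9]+
    }
    int64_t n = std::atoll(tok.c_str() + 1);
    if (n < 1) return p;  // reject $0, as the proposed parser action does
    p.kind = PlaceholderKind::Dollar;
    p.index = n;
    return p;
  }
  if (tok.size() >= 2 && tok[0] == ':' && isAsciiAlpha(tok[1])) {
    for (std::size_t i = 2; i < tok.size(); ++i) {
      if (!isAsciiAlpha(tok[i]) && !isAsciiDigit(tok[i]) && tok[i] != '_') {
        return p;  // character outside [A-Za-z0-9_]
      }
    }
    p.kind = PlaceholderKind::Named;
    p.name = tok.substr(1);  // drop the leading ':'
    return p;
  }
  return p;  // lone $, lone :, :_x, and anything else
}
```

Under these assumptions, $2 and :abc_123 are accepted, while $0, a lone $, and :_x are not.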

Conflict analysis

  • $ is not currently a lexer token. No collision.
  • : is in the punctuation character class but never referenced by any grammar rule (zero matches for ':' in bison_parser.y); the standalone : token remains defined but unreachable. There is no :: cast syntax in the grammar, so :identifier does not interfere with anything.
  • bison -v reports no new shift/reduce or reduce/reduce conflicts on the resulting grammar (verified locally).

Proposed AST

Three ExprType values rather than one with a discriminator field:

enum ExprType {
  ...
  kExprParameter,         // ?
  kExprParameterDollar,   // $N — ival holds N (1-based, preserves user intent)
  kExprParameterNamed,    // :name — name holds the identifier
  ...
};

// declared inside struct Expr
static Expr* makeDollarParameter(int64_t n);
static Expr* makeNamedParameter(char* name);

The top-level input rule's renumber loop only touches kExprParameter so ? retains its current 0-based sequential ival semantics; $N keeps its explicit N (consistent with PostgreSQL's contract); :name is bound by name and its ival is not meaningful.

The reason for three distinct enum values rather than overloading one: future consumers (binders, AST printers, query rewriters) get an explicit switch dispatch instead of having to remember to check whether name is null. It also makes round-trip printing through sqlhelper straightforward.
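The two points above, renumbering that touches only ? and switch-based dispatch for consumers, can be sketched as follows. This is an illustration under stated assumptions: the Expr struct here is a reduced stand-in for the parser's real Expr, and renumberParameters and placeholderToString are hypothetical helper names, not existing sqlhelper functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real Expr in the parser has many more fields.
enum ExprType { kExprParameter, kExprParameterDollar, kExprParameterNamed };

struct Expr {
  ExprType type;
  int64_t ival;      // ?: 0-based position, $N: N, :name: not meaningful
  std::string name;  // only set for :name
};

// Numbers only the ? placeholders, sequentially and in order of appearance.
// $N and :name entries are left exactly as the parser actions produced them.
void renumberParameters(std::vector<Expr>& params) {
  int64_t next = 0;
  for (Expr& e : params) {
    if (e.type == kExprParameter) e.ival = next++;
  }
}

// Prints a placeholder back in its original syntax via an explicit switch,
// with no need to inspect whether name is empty.
std::string placeholderToString(const Expr& e) {
  switch (e.type) {
    case kExprParameter:
      return "?";
    case kExprParameterDollar:
      return "$" + std::to_string(e.ival);
    case kExprParameterNamed:
      return ":" + e.name;
  }
  return "";  // not reached for the three kinds above
}
```

For a statement such as WHERE a = ? AND b = $2 AND c = ? AND d = :id, this numbering gives the two ? placeholders ival 0 and 1, leaves $2 at 2, and prints each placeholder back in the form the user wrote.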

Backward compatibility

  • The ? path is unchanged at every layer — lexer rule, parser action, AST, SQLParserResult::parameters() order.
  • Existing tests under test/prepare_tests.cpp keep passing untouched.
  • The two new ExprType values are appended after kExprParameter, so no ordinal change.
  • Mixing styles in one statement (e.g. WHERE a = ? AND b = $1) parses successfully. We took the position that policing the mix belongs in the driver, not the parser.
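A driver that does want to reject mixed placeholder styles could do so with a check along these lines. This is only a sketch of one possible policy: usesSinglePlaceholderStyle is a hypothetical helper, and it assumes the driver has already collected the ExprType of each placeholder in the statement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three placeholder kinds in the proposal.
enum ExprType { kExprParameter, kExprParameterDollar, kExprParameterNamed };

// Returns true when every placeholder in the statement uses the same style.
// A statement with no placeholders trivially passes.
bool usesSinglePlaceholderStyle(const std::vector<ExprType>& kinds) {
  for (std::size_t i = 1; i < kinds.size(); ++i) {
    if (kinds[i] != kinds[0]) return false;
  }
  return true;
}
```

With this policy, WHERE a = ? AND b = $1 would parse but be rejected by the driver, while WHERE a = $1 AND b = $2 would pass.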

Tests

The change comes with:

  • Three new cases in test/prepare_tests.cpp covering $N, $N declared out of order, and :name.
  • Three new good queries and two new bad queries ($0, lone $) in test/queries/.
  • make test passes all three checks (SQL tests, valgrind, grammar conflict).
