feat(backend/kernel): match Thrift backend's user-visible surface #793

Merged: vikrantpuppala merged 2 commits into main from feat/kernel-backend-thrift-parity (May 19, 2026)

@vikrantpuppala

Summary

Three small parity fixes that close 34 of the 66 diffs the Thrift vs Kernel comparator surfaced against Dogfood today (see databricks-driver-test#303 for the full diff catalog). None of these are data-correctness changes — they're contract-level differences that affect code reading the cursor's description, catching server errors by class, or calling get_columns with no catalog.

1. PEP-249 description.null_ok is always None (closes 29 diffs)

description_from_arrow_schema previously took null_ok from the Arrow field.nullable bit. The Thrift backend has always reported None for this slot (PEP 249 permits either). Hardcode None so the kernel backend is a drop-in for code that reads cursor.description[i][6].
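A minimal sketch of the resulting contract, assuming a stand-in `Field` type in place of a real Arrow field. `Field` and `description_from_fields` are hypothetical names used for illustration only and are not the connector's actual code:

```python
from typing import List, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    """Hypothetical stand-in for an Arrow field (name, type name, nullable bit)."""

    name: str
    type_name: str
    nullable: bool


# PEP 249 description entry:
# (name, type_code, display_size, internal_size, precision, scale, null_ok)
DescriptionRow = Tuple[str, str, None, None, None, None, Optional[bool]]


def description_from_fields(fields: List[Field]) -> List[DescriptionRow]:
    # null_ok (index 6) is hardcoded to None; the nullable bit is deliberately
    # ignored so the slot matches what the Thrift backend reports.
    return [(f.name, f.type_name, None, None, None, None, None) for f in fields]


description = description_from_fields(
    [Field("id", "int", nullable=False), Field("note", "string", nullable=True)]
)
```

Code that reads `description[i][6]` sees `None` for both columns here, regardless of the nullable bit.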

2. SqlError maps to ServerOperationError (closes 3 diffs)

Server-side SQL failures (syntax error, missing object, etc.) used to wrap into the generic DatabaseError on the kernel backend. The Thrift backend raises ServerOperationError for the same shape. ServerOperationError is a DatabaseError subclass, so existing catches of the base class are unaffected — but code that catches the specific subclass now works equivalently.
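To illustrate why both styles of handler keep working, here is a small sketch that uses locally defined stand-ins for the two exception classes (the real ones live in the connector). `run_bad_query`, `classify`, and `classify_base_only` are hypothetical helpers:

```python
from typing import Callable


class DatabaseError(Exception):
    """Local stand-in for the connector's DatabaseError."""


class ServerOperationError(DatabaseError):
    """Local stand-in for the connector's ServerOperationError."""


def run_bad_query() -> None:
    # Hypothetical helper simulating a server-side SQL failure.
    raise ServerOperationError("syntax error in statement")


def classify(fn: Callable[[], None]) -> str:
    # Caller that catches the specific subclass first.
    try:
        fn()
    except ServerOperationError:
        return "server-operation-error"
    except DatabaseError:
        return "database-error"
    return "ok"


def classify_base_only(fn: Callable[[], None]) -> str:
    # Caller that only knows about the base class.
    try:
        fn()
    except DatabaseError:
        return "database-error"
    return "ok"
```

With the subclass raised, `classify` takes the specific branch, while `classify_base_only` still catches the failure through the base class.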

3. get_columns(catalog_name=None) accepted (closes 2 diffs)

Previously rejected at the connector layer with a ProgrammingError. The kernel's list_columns now issues SHOW COLUMNS IN ALL CATALOGS server-side when catalog is None (per databricks-sql-kernel#33); the response carries catalogName per row so TABLE_CAT is correctly attributed without client-side enumeration. Matches Thrift's getColumns(null, …) behaviour.
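The per-row attribution can be sketched as follows. Only `catalogName` and `TABLE_CAT` come from the description above; the other response keys (`tableName`, `columnName`) and the function name `attribute_catalog` are assumptions made for illustration:

```python
from typing import Dict, List


def attribute_catalog(response_rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Each response row already names its catalog, so TABLE_CAT is read
    # straight from the row instead of enumerating catalogs client-side.
    return [
        {
            "TABLE_CAT": row["catalogName"],
            "TABLE_NAME": row["tableName"],  # assumed response key
            "COLUMN_NAME": row["columnName"],  # assumed response key
        }
        for row in response_rows
    ]


rows = attribute_catalog(
    [
        {"catalogName": "main", "tableName": "orders", "columnName": "id"},
        {"catalogName": "samples", "tableName": "trips", "columnName": "fare"},
    ]
)
```

Rows from different catalogs can arrive in a single response, and each keeps its own `TABLE_CAT`.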

Out of scope (separate PRs)

  • INTERVAL columns crashing on fetch (11 diffs) — needs a kernel-side post-decode cast since pyarrow.compute.cast(month_interval → utf8) is not implemented. Separate kernel PR.
  • _use_arrow_native_complex_types=False not honoured (2 diffs) — pre-existing gap shared with the native SEA backend; needs separate work that fixes both.

Test plan

  • python -m pytest tests/unit/ — 715 passed, 4 skipped (matches prior baseline; updated kernel-backend tests to reflect the new contract).
  • black src/databricks/sql/backend/kernel/ tests/unit/test_kernel_*.py clean.
  • Re-running the comparator end-to-end after merge to confirm the 34-diff drop.

This pull request and its description were written by Claude Code.

Three small parity fixes that close 34 of the 66 diffs the Thrift
vs Kernel comparator surfaced against Dogfood today (see
databricks-driver-test docs/comparator-thrift-vs-kernel-diffs.md).
None of these are user-data correctness issues; they're
contract-level differences that affect code reading the cursor's
description, catching server errors by class, or calling
get_columns with no catalog.

1. PEP-249 description.null_ok always None
   description_from_arrow_schema previously took null_ok from the
   Arrow field.nullable bit. The Thrift backend has always
   reported None for this slot (PEP 249 permits either). Hardcode
   None so the kernel backend is a drop-in for code that reads
   cursor.description[i][6].

2. SqlError maps to ServerOperationError
   Server-side SQL failures (syntax error, missing object, etc.)
   used to wrap into the generic DatabaseError on the kernel
   backend. The Thrift backend raises ServerOperationError for the
   same shape. ServerOperationError is a DatabaseError subclass,
   so existing catches of the base class are unaffected — but code
   that catches the specific subclass now works equivalently.

3. get_columns accepts catalog_name=None
   Previously rejected at the connector layer with a
   ProgrammingError. The kernel's list_columns now issues
   SHOW COLUMNS IN ALL CATALOGS server-side when catalog is None;
   the response carries catalogName per row so TABLE_CAT is
   correctly attributed without client-side enumeration. Matches
   Thrift's getColumns(null, ...) behaviour.

Two other comparator-surfaced gaps are out of scope:
- INTERVAL columns crashing on fetch (kernel-side fix; intervals
  need to be stringified post-decode).
- _use_arrow_native_complex_types=False not honoured (pre-existing
  gap shared with the native SEA backend; needs separate work).

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

The kernel writes the server-reported type name into Arrow field
metadata under `databricks.type_name` (see
`databricks_sql_kernel::reader::metadata_keys::TYPE_NAME`).
`description_from_arrow_schema` now consults it and emits 'variant'
when the metadata says so, matching the Thrift backend.

Today only VARIANT needs this remap — every other precise type
either lands on a dedicated Arrow shape (INT, DECIMAL, …) or
collapses on both backends (INTERVAL_*, GEOMETRY, GEOGRAPHY).
Closes 3 PREPARED_STATEMENT_TYPES / COMPLEX_TYPES diffs in the
comparator.

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
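
A rough sketch of the metadata lookup described in this commit, assuming the field metadata is a bytes-to-bytes mapping (as pyarrow exposes it) and that the stored value is the string `VARIANT`. `resolve_type_name` is a hypothetical name, not the function in `type_mapping.py`:

```python
from typing import Mapping, Optional

# Metadata key named in the commit message; pyarrow exposes keys as bytes.
TYPE_NAME_KEY = b"databricks.type_name"


def resolve_type_name(
    arrow_type_name: str, metadata: Optional[Mapping[bytes, bytes]]
) -> str:
    # Prefer the server-reported type only for VARIANT; every other type
    # keeps the name derived from its Arrow shape.
    if metadata:
        server_type = metadata.get(TYPE_NAME_KEY, b"").decode("utf-8")
        if server_type.upper() == "VARIANT":  # assumed value format
            return "variant"
    return arrow_type_name
```

Fields without metadata, or with any other server type name, fall through to the Arrow-derived name unchanged.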
Comment thread src/databricks/sql/backend/kernel/client.py
Comment thread src/databricks/sql/backend/kernel/type_mapping.py

@msrathore-db left a comment

LGTM other than the 2 comments

@vikrantpuppala enabled auto-merge (squash) May 19, 2026 10:35
@vikrantpuppala merged commit cdd869a into main May 19, 2026
34 checks passed