Extract SPOG org-id from cluster httpPath for non-Thrift requests#367
Extract SPOG org-id from cluster httpPath for non-Thrift requests#367msrathore-db wants to merge 2 commits into
Conversation
For all-purpose-compute Thrift connections on SPOG (custom-URL) hosts httpPath is /sql/protocolv1/o/<workspace-id>/<cluster-id> and the workspace ID is encoded in the path itself. PoPP routes the Thrift request correctly off the /o/<wsid>/ segment, so the connection succeeds without an explicit ?o= query parameter. Other clients on the same driver (telemetry pushes to /telemetry-ext, feature-flag fetches, ...) hit different paths that don't carry the workspace ID. Previously extractSpogHeaders only looked at ?o= in httpPath, so the x-databricks-org-id header was never set for cluster URLs without ?o=. On SPOG hosts PoPP then had no workspace context for these requests and redirected them to /login, silently dropping telemetry. Extend extractSpogHeaders to also extract the workspace ID from the cluster path segment via clusterPathOrgIDPattern as a fallback when ?o= is absent. Priority order is preserved: ?o= query param wins over the path-segment match. Adds four new test cases — cluster path without ?o=, leading-slash variant, ?o= wins precedence, and a warehouse-path regression guard so the new regex does not match warehouse paths. Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
|
🟠 P1 (Important — should fix) ?o= query value is injected into the header without digit validation — connector.go:172 🟡 P2 (Minor)
|
Summary
sql/protocolv1/o/<workspace-id>/<cluster-id>.extractSpogHeadersnow extracts the workspace ID from the/o/<wsid>/path segment as a fallback when?o=<wsid>is not present, and emits anx-databricks-org-idheader so the wrappedtelemetryClient/ feature-flag client can route correctly on SPOG.?o=inhttpPath▶/sql/protocolv1/o/<wsid>/path segment. Caller-set request headers still win, including mixed-caseX-Databricks-Org-Id.Why
On a SPOG host the workspace identity has to be in either the URL or the
x-databricks-org-idheader for PoPP to route a request to the correct workspace. For all-purpose cluster Thrift this is free — the workspace ID is in the/o/<wsid>/segment of httpPath, so PoPP routes Thrift viarouting_reason=workspace-idand the session opens fine without an explicit?o=.The telemetry and feature-flag transports built by
connector.Connectwrapc.clientwithwithSpogHeadersonly whenextractSpogHeaders(c.cfg.HTTPPath)returns a non-nil map. Before this PR that only happened when?o=was present, so cluster URLs without?o=produced nox-databricks-org-idheader, PoPP fell back to default (account) routing on/telemetry-ext, and the responses were 303 redirects to/login— silently dropping telemetry on SPOG.What changes
connector.goclusterPathOrgIDPatternregex ((?:^|/)sql/protocolv1/o/(\d+)/[^/?]+) compiled once at package init.extractSpogHeaderschecks?o=first (existing behavior), then falls back to the cluster path segment, then returns nil. The malformed-query-string path now also falls through to path inspection instead of returning early. Log messages indicate which source produced the workspace ID.regexpadded to imports.connector_spog_test.goTestExtractSpogHeaders:?o=→ header extracted from/o/<wsid>//→ same?o=→ query-param value wins?o=→ still nil (regression guard: the new regex must not match warehouse paths)X-Databricks-Org-Idis not overwritten by the SPOG wrapper.Test plan
go test -run 'TestExtractSpogHeaders|TestHeaderInjectingTransport' -v .— focused SPOG extraction and transport tests pass.go test -short .— root package suite passes.peco.azuredatabricks.net) all-purpose cluster: telemetry POST/telemetry-extflipped fromHTTP 303 → /logintoHTTP 200 OKonce the path-segment extraction populatedx-databricks-org-id. Same shape of fix here.Out of scope
databricks/databricks-sql-python) has a sibling upstream-head PR for the same fix: Extract SPOG org-id from cluster http_path for non-Thrift requests databricks-sql-python#817.databricks/databricks-sql-nodejs) already extracts org ID from both query param and path segment (seeextractWorkspaceIdinlib/DBSQLClient.ts).This pull request and its description were written with assistance from Claude Code.