Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
69 changes: 69 additions & 0 deletions docs/database-query-observability.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
# Database query observability

Maple gives every database call first-class treatment using the standard
[OpenTelemetry database semantic conventions](https://opentelemetry.io/docs/specs/semconv/db/database-spans/).
If your services are instrumented with an OTel-aware database client — which most
SDKs enable automatically — you get, with **no extra configuration**:

- **Query timing inline in every trace.** A database call's client span shows up
in the waterfall like any other span, and its detail panel renders a database
summary block (system, namespace, table, operation, rows returned, server)
derived from the `db.*` attributes.
- **A cross-service Queries surface.** Every distinct query *shape* (the query
with literals normalized to `?`) is aggregated across services with call
volume, error rate, and p50/p95/p99 latency, so you can find your slowest and
busiest queries and drill straight to sample traces.

This works for **any** database — PostgreSQL, MySQL, ClickHouse, Redis, MongoDB,
and more — because it reads only the vendor-neutral semantic conventions.

## Attributes Maple reads

| Attribute | Used for |
| --- | --- |
| `db.system.name` (legacy `db.system`) | Identifies the database; drives the summary block and per-system grouping. |
| `db.query.text` (legacy `db.statement`) | The query; normalized into a low-cardinality **shape** for grouping. |
| `db.query.summary` | Preferred human label for a query shape (e.g. `SELECT users`). |
| `db.operation.name`, `db.collection.name`, `db.namespace` | Compose a label when `db.query.summary` is absent. |
| `db.query.fingerprint` (legacy `db.statement.fingerprint`) | Explicit grouping key when the instrumentation provides one. |
| `db.response.returned_rows` | Rows returned, shown in the span summary. |
| `db.operation.batch.size` | Batch size (only present for batches). |
| `server.address` / `server.port` | The database endpoint. |
| `error.type`, `db.response.status_code` | Failure outcome. |

Query text is grouped by *shape*: literals are stripped to `?` and `IN (...)`
lists are collapsed, so `WHERE id = 1` and `WHERE id = 2` are the same shape.
Prefer emitting parameterized `db.query.text` (the OTel spec says parameterized
text should **not** be sanitized) so shapes stay clean.

## Correlating server-side query logs with traces (SQLCommenter)

The client span above captures the query *as the caller sees it* — duration and
the query text — but it cannot see server-side detail such as memory used or
rows/bytes scanned. To bridge that gap, tag your queries with **SQLCommenter**,
the OpenTelemetry-standard way to propagate trace context into the database by
appending a comment to the query:

```sql
SELECT * FROM events WHERE ts > ? /*traceparent='00-<trace_id>-<span_id>-01'*/
```

Most OTel database instrumentations can inject this for you (it is opt-in — see
your SDK's SQLCommenter / "DB statement comment" option). Because the database
records the full query text — comment included — in its query log, Maple can
read that log back (see the ClickHouse integration) and stitch each server-side
query to the exact client span that issued it, nesting it as a child in the
trace.

> Note: SQLCommenter comments are low-cardinality-unfriendly for MySQL prepared
> statements, Oracle, and SQL Server; consult the OTel guidance before enabling
> it broadly on those engines.

## Resource allocation (ClickHouse)

Server-side **resource allocation** — peak memory, rows/bytes read, CPU time,
ProfileEvents — is not available from client spans. For ClickHouse, connect your
cluster via the ClickHouse integration: Maple polls `system.query_log`, forwards
each sampled query into Maple as a span (nested under your app's trace via the
SQLCommenter `traceparent` above) plus aggregate metrics, so query timing and
resource allocation land alongside your existing traces and dashboards.
1 change: 1 addition & 0 deletions packages/domain/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
"./primitives": "./src/primitives.ts",
"./query-engine": "./src/query-engine.ts",
"./recommendations": "./src/recommendations.ts",
"./sqlcommenter": "./src/sqlcommenter.ts",
"./tinybird-project-sync": "./src/tinybird/project-sync.ts",
"./warehouse-queries": "./src/warehouse-queries.ts",
"./tinybird": "./src/tinybird/index.ts",
Expand Down
53 changes: 53 additions & 0 deletions packages/domain/src/sqlcommenter.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
import { describe, expect, it } from "vitest"
import { parseSqlCommenterTraceparent } from "./sqlcommenter"

const TRACE_ID = "0af7651916cd43dd8448eb211c80319c"
const SPAN_ID = "b7ad6b7169203331"

describe("parseSqlCommenterTraceparent", () => {
it("extracts trace context from a trailing SQLCommenter comment", () => {
const sql = `SELECT * FROM songs WHERE id = ? /*traceparent='00-${TRACE_ID}-${SPAN_ID}-01'*/`
expect(parseSqlCommenterTraceparent(sql)).toEqual({
traceId: TRACE_ID,
spanId: SPAN_ID,
flags: "01",
sampled: true,
})
})

it("reads the unsampled flag (00)", () => {
const sql = `SELECT 1 /*traceparent='00-${TRACE_ID}-${SPAN_ID}-00'*/`
expect(parseSqlCommenterTraceparent(sql)?.sampled).toBe(false)
})

it("finds the comment alongside other sqlcommenter keys", () => {
const sql = `SELECT 1 /*db_driver='clickhouse',traceparent='00-${TRACE_ID}-${SPAN_ID}-01',route='%2Fusers'*/`
expect(parseSqlCommenterTraceparent(sql)?.traceId).toBe(TRACE_ID)
})

it("is tolerant of a URL-encoded value, extra whitespace, and uppercase hex", () => {
const sql = `SELECT 1 /* traceparent = '00-${TRACE_ID.toUpperCase()}-${SPAN_ID.toUpperCase()}-01' */`
expect(parseSqlCommenterTraceparent(sql)?.spanId).toBe(SPAN_ID)
})

it("returns null when there is no traceparent comment", () => {
expect(parseSqlCommenterTraceparent("SELECT * FROM songs")).toBeNull()
expect(parseSqlCommenterTraceparent("")).toBeNull()
expect(parseSqlCommenterTraceparent(null)).toBeNull()
expect(parseSqlCommenterTraceparent(undefined)).toBeNull()
})

it("rejects a malformed traceparent (wrong lengths / all-zero ids)", () => {
expect(parseSqlCommenterTraceparent("SELECT 1 /*traceparent='00-tooshort-abc-01'*/")).toBeNull()
expect(
parseSqlCommenterTraceparent(
`SELECT 1 /*traceparent='00-${"0".repeat(32)}-${SPAN_ID}-01'*/`,
),
).toBeNull()
expect(
parseSqlCommenterTraceparent(
`SELECT 1 /*traceparent='00-${TRACE_ID}-${"0".repeat(16)}-01'*/`,
),
).toBeNull()
})
})
73 changes: 73 additions & 0 deletions packages/domain/src/sqlcommenter.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
/**
* SQLCommenter (https://google.github.io/sqlcommenter/) trace-context parsing.
*
* SQLCommenter — now merged into the OpenTelemetry specification as the standard
* way to correlate database queries with APM traces — propagates trace context
* into a database by appending a machine-readable comment to the query text:
*
* SELECT * FROM songs WHERE id = ? /​*traceparent='00-<trace_id>-<span_id>-01'*​/
*
* The database records the full query (comment included) in its query log
* (e.g. ClickHouse `system.query_log`), so reading that log back lets us stitch
* a server-side query row to the client span that issued it — nesting the
* server-side query as a child of the app's DB span.
*
* This module extracts the W3C `traceparent` from such a comment. Pure string
* parsing, no imports, so it is safe to pull into the web / cli / scraper
* bundles alike.
*/

/** The W3C trace-context fields carried by a `traceparent`. */
export interface Traceparent {
/** 32-hex-char trace id (lowercase). */
readonly traceId: string
/** 16-hex-char parent span id (lowercase). */
readonly spanId: string
/** 2-hex-char trace-flags byte (e.g. "01"). */
readonly flags: string
/** Whether the `sampled` flag (bit 0 of trace-flags) is set. */
readonly sampled: boolean
}

// version "-" trace-id "-" span-id "-" trace-flags, all lowercase hex.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

// Pull `traceparent='<value>'` out of a SQLCommenter comment. Per the spec the
// value is URL-encoded and single-quoted; accept double quotes defensively.
const COMMENT_VALUE_RE = /traceparent\s*=\s*(['"])([^'"]+)\1/i

/** An all-zero id is invalid per the W3C spec — treat it as absent. */
const isAllZero = (hex: string): boolean => /^0+$/.test(hex)

/**
* Extract the W3C `traceparent` from a SQLCommenter comment embedded anywhere in
* `sql`. Returns `null` when absent or malformed (all-zero ids counted as
* malformed). The parse is case- and whitespace-tolerant and URL-decodes the
* value defensively.
*/
export function parseSqlCommenterTraceparent(sql: string | null | undefined): Traceparent | null {
if (!sql) return null

const commentMatch = COMMENT_VALUE_RE.exec(sql)
if (!commentMatch) return null

let raw = commentMatch[2]
try {
raw = decodeURIComponent(raw)
} catch {
// Leave `raw` as-is when it isn't valid percent-encoding.
}

const parts = TRACEPARENT_RE.exec(raw.trim().toLowerCase())
if (!parts) return null

const [, , traceId, spanId, flags] = parts
if (isAllZero(traceId) || isAllZero(spanId)) return null

return {
traceId,
spanId,
flags,
sampled: (Number.parseInt(flags, 16) & 0x01) === 1,
}
}
7 changes: 7 additions & 0 deletions packages/query-engine/src/ch/queries/errors.ts
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,13 @@ const TREE_SPAN_ATTR_KEYS = [
"cache.name",
"cache.operation",
"cache.lookup_performed",
// Generic OpenTelemetry database-client spans — the `db.system.name` signal
// (with the legacy `db.system` fallback) lets the trace views detect a DB
// span and render its summary badge without waiting for the per-span lazy
// detail fetch. The full `db.*` field set (namespace, operation, rows,
// server, …) is loaded lazily by `spanDetailQuery` for the detail panel.
"db.system.name",
"db.system",
// Cloudflare Workers Observability — read by `getCloudflareInfo` to mark
// Worker spans and render the edge-location + outcome badges in the tree
// views. The full set (ray id, cpu/wall time, script version, geo city) is
Expand Down
45 changes: 43 additions & 2 deletions packages/ui/src/lib/__tests__/cloud-platforms.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -75,12 +75,11 @@ describe("getCloudPlatform — cloudflare", () => {
expect(byLabel["Handler"]).toBe("queue")
})

it("returns null for a non-platform span", () => {
it("returns null for a non-platform span (no cloud, no db)", () => {
expect(
getCloudPlatform({
"http.method": "GET",
"http.route": "/v1/spans",
"db.system": "postgresql",
}),
).toBeNull()
})
Expand All @@ -101,6 +100,48 @@ describe("getCloudPlatform — cloudflare", () => {
})
})

describe("getCloudPlatform — database", () => {
it("normalizes a DB-client span and humanizes db.system.name", () => {
const info = getCloudPlatform({
"db.system.name": "postgresql",
"db.namespace": "app",
"db.collection.name": "users",
"db.operation.name": "SELECT",
"db.response.returned_rows": "42",
"server.address": "db.internal",
"server.port": "5432",
})
expect(info?.id).toBe("database")
expect(info?.label).toBe("PostgreSQL")
expect(info?.outcome).toBeNull()
const byLabel = Object.fromEntries(info!.fields.map((f) => [f.label, f.value]))
expect(byLabel["Operation"]).toBe("SELECT")
expect(byLabel["Namespace"]).toBe("app")
expect(byLabel["Table"]).toBe("users")
expect(byLabel["Rows returned"]).toBe("42")
expect(byLabel["Server"]).toBe("db.internal:5432")
})

it("falls back to the legacy db.system and title-cases unknown systems", () => {
expect(getCloudPlatform({ "db.system": "clickhouse" })?.label).toBe("ClickHouse")
expect(getCloudPlatform({ "db.system.name": "microsoft.sql_server" })?.label).toBe("SQL Server")
expect(getCloudPlatform({ "db.system.name": "cockroachdb" })?.label).toBe("CockroachDB")
})

it("flags error.type as a bad outcome", () => {
const info = getCloudPlatform({ "db.system.name": "mysql", "error.type": "timeout" })
expect(info?.outcome).toEqual({ value: "timeout", bad: true })
})

// Same projected-map regression as cloudflare: an empty db.system value must
// not flag every span as a database call.
it("returns null when db.system keys are present but empty (projected map)", () => {
expect(
getCloudPlatform({ "db.system.name": "", "db.system": "", "http.route": "/checkout" }),
).toBeNull()
})
})

describe("outcomeBadgeStyle", () => {
it("styles ok vs failure outcomes differently", () => {
expect(outcomeBadgeStyle(false)).toContain("severity-info")
Expand Down
Loading
Loading