Commit 5b6def4

Merge pull request #3 from bitofsky/add/external-links-chunks-and-tests

Update API behavior and integration tests

2 parents 4c23ce2 + 1c8ddfa

15 files changed: 841 additions & 387 deletions

README.md

Lines changed: 23 additions & 10 deletions
````diff
@@ -23,6 +23,7 @@ The goal is simple: stream big results with stable memory usage and without forc
 - Optimized polling with server-side wait (up to 50s) before falling back to client polling.
 - Query metrics support via Query History API (`enableMetrics` option).
 - Efficient external link handling: merge chunks into a single stream.
+- Handles partial external link responses by fetching missing chunk metadata.
 - `mergeExternalLinks` supports streaming uploads and returns a new StatementResult with a presigned URL.
 - `fetchRow`/`fetchAll` support `JSON_OBJECT` (schema-based row mapping).
 - External links + JSON_ARRAY are supported for row iteration (streaming JSON parsing).
````
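The new partial-link handling can be pictured with a small helper: given the total chunk count and the chunk indexes already present, list the indexes whose metadata still has to be fetched. This is a hedged sketch for illustration; the names and shapes here are assumptions, not the library's internals.

```typescript
// Sketch: find which result chunks still need their metadata fetched.
// `ExternalLinkMeta` and `missingChunkIndexes` are illustrative names,
// not part of the @bitofsky/databricks-sql API.
interface ExternalLinkMeta {
  chunk_index: number
  external_link: string
}

function missingChunkIndexes(totalChunks: number, links: ExternalLinkMeta[]): number[] {
  const present = new Set(links.map((l) => l.chunk_index))
  const missing: number[] = []
  for (let i = 0; i < totalChunks; i++) {
    if (!present.has(i)) missing.push(i)
  }
  return missing
}

// Example: 4 chunks total, but only chunks 0 and 2 came back in the response.
const missing = missingChunkIndexes(4, [
  { chunk_index: 0, external_link: 'https://example.com/chunk-0' },
  { chunk_index: 2, external_link: 'https://example.com/chunk-2' },
])
console.log(missing) // → [1, 3]
```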
````diff
@@ -48,11 +49,12 @@ console.log(rows) // [{ value: 1 }]
 ```
 
 ## Sample (Streaming + Presigned URL)
-Stream external links into S3 with gzip compression, then return a single presigned URL:
+Stream external links into S3 with gzip compression, then return a single presigned URL.
 
 ```ts
 import { executeStatement, mergeExternalLinks } from '@bitofsky/databricks-sql'
-import { GetObjectCommand, HeadObjectCommand, PutObjectCommand, S3Client } from '@aws-sdk/client-s3'
+import { GetObjectCommand, HeadObjectCommand, S3Client } from '@aws-sdk/client-s3'
+import { Upload } from '@aws-sdk/lib-storage'
 import { getSignedUrl } from '@aws-sdk/s3-request-presigner'
 import { createGzip } from 'zlib'
 import { pipeline } from 'stream/promises'
````
````diff
@@ -79,15 +81,17 @@ const merged = await mergeExternalLinks(result, auth, {
 const gzip = createGzip() // Compress with gzip and upload to S3
 const passThrough = new PassThrough()
 
-const uploadPromise = s3.send(
-  new PutObjectCommand({
+const upload = new Upload({
+  client: s3,
+  params: {
     Bucket: bucket,
     Key: key,
     Body: passThrough,
-    ContentType: 'text/csv',
+    ContentType: 'text/csv; charset=utf-8',
     ContentEncoding: 'gzip',
-  })
-)
+  },
+})
+const uploadPromise = upload.done()
 
 await Promise.all([
   pipeline(stream, gzip, passThrough),
````
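The `pipeline(stream, gzip, passThrough)` pattern the sample relies on can be exercised with Node built-ins alone. This sketch pipes an in-memory readable through gzip into a `PassThrough` and decompresses the collected bytes to show the round trip; no S3 is involved, and in the real sample `Upload` consumes `passThrough` the same way the collector does here.

```typescript
import { createGzip, gunzipSync } from 'zlib'
import { PassThrough, Readable } from 'stream'
import { pipeline } from 'stream/promises'

// Pipe a readable through gzip into a PassThrough (as the sample does before
// handing passThrough to Upload), then decompress to prove the bytes survive.
async function roundTripGzip(lines: string[]): Promise<string> {
  const gzip = createGzip()
  const passThrough = new PassThrough()
  const chunks: Buffer[] = []
  const collected = new Promise<Buffer>((resolve) => {
    passThrough.on('data', (c: Buffer) => chunks.push(c))
    passThrough.on('end', () => resolve(Buffer.concat(chunks)))
  })
  await pipeline(Readable.from(lines), gzip, passThrough)
  return gunzipSync(await collected).toString()
}

roundTripGzip(['value,1\n', 'value,2\n']).then((csv) => console.log(JSON.stringify(csv)))
// → "value,1\nvalue,2\n"
```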
````diff
@@ -128,8 +132,8 @@ const result = await executeStatement(
   auth,
   {
     enableMetrics: true,
-    onProgress: (status, metrics) => {
-      console.log(`State: ${status.state}`)
+    onProgress: (result, metrics) => {
+      console.log(`State: ${result.status.state}`)
       if (metrics) { // metrics is optional, only present when enableMetrics: true
         console.log(`  Execution time: ${metrics.execution_time_ms}ms`)
         console.log(`  Rows produced: ${metrics.rows_produced_count}`)
````
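The signature change above means the callback now receives the whole `StatementResult` rather than just its status, so the state lives at `result.status.state`. A minimal stub illustrates the new shape; the types here are trimmed-down stand-ins for illustration, not the library's full definitions.

```typescript
// Trimmed stand-ins for the library's types, for illustration only.
type StatementStatus = { state: 'PENDING' | 'RUNNING' | 'SUCCEEDED' | 'FAILED' }
type StatementResult = { statement_id: string; status: StatementStatus }
type QueryMetrics = { execution_time_ms: number; rows_produced_count: number }

type OnProgress = (result: StatementResult, metrics?: QueryMetrics) => void

// Simulate what the poller does on each iteration.
function reportProgress(cb: OnProgress, result: StatementResult, metrics?: QueryMetrics) {
  cb(result, metrics)
}

const seen: string[] = []
const onProgress: OnProgress = (result, metrics) => {
  seen.push(result.status.state) // note: result.status.state, not status.state
  if (metrics) seen.push(`${metrics.execution_time_ms}ms`)
}

reportProgress(onProgress, { statement_id: 'stub', status: { state: 'RUNNING' } })
reportProgress(
  onProgress,
  { statement_id: 'stub', status: { state: 'SUCCEEDED' } },
  { execution_time_ms: 12, rows_produced_count: 1 }
)
console.log(seen) // → ['RUNNING', 'SUCCEEDED', '12ms']
```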
````diff
@@ -190,6 +194,7 @@ function executeStatement(
 ```
 - Calls the Databricks Statement Execution API and polls until completion.
 - Server waits up to 50s (`wait_timeout`) before client-side polling begins.
+- Default `wait_timeout` is `50s`, or `0s` when `onProgress` is provided.
 - Use `options.onProgress` to receive status updates with optional metrics.
 - Set `enableMetrics: true` to fetch query metrics from Query History API on each poll.
 - Throws `DatabricksSqlError` on failure, `StatementCancelledError` on cancel, and `AbortError` on abort.
````
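The new default added above (50s server-side wait, dropped to 0s as soon as an `onProgress` callback is supplied, so progress events are not starved for up to 50 seconds) reduces to a one-line decision. This is a sketch of that rule under those documented semantics, not the library's actual code:

```typescript
// Pick the wait_timeout sent to the Statement Execution API.
// With onProgress set, use '0s' so polling (and progress reporting) starts
// immediately; otherwise let the server hold the request for up to 50s.
function defaultWaitTimeout(onProgress?: (...args: unknown[]) => void): '50s' | '0s' {
  return onProgress ? '0s' : '50s'
}

console.log(defaultWaitTimeout())         // → '50s'
console.log(defaultWaitTimeout(() => {})) // → '0s'
```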
````diff
@@ -205,6 +210,7 @@ function fetchRow(
 - Streams each row to `options.onEachRow`.
 - Use `format: 'JSON_OBJECT'` to map rows into schema-based objects.
 - Supports `INLINE` results or `JSON_ARRAY` formatted `EXTERNAL_LINKS` only.
+- If only a subset of external links is returned, missing chunk metadata is fetched by index.
 
 ### fetchAll(statementResult, auth, options?)
 ```ts
````
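The `JSON_OBJECT` mapping mentioned for `fetchRow`/`fetchAll` amounts to zipping each positional `JSON_ARRAY` row with the column names from the result schema. A hedged sketch of that idea follows; the real library's mapping handles typing and schema details this toy version omits.

```typescript
// Illustrative shapes; the library's actual schema types are richer.
type Column = { name: string }
type RowArray = (string | null)[]
type RowObject = Record<string, string | null>

// Map a positional row onto the schema's column names.
function toRowObject(columns: Column[], row: RowArray): RowObject {
  const obj: RowObject = {}
  columns.forEach((col, i) => {
    obj[col.name] = row[i] ?? null
  })
  return obj
}

const columns = [{ name: 'id' }, { name: 'name' }]
console.log(toRowObject(columns, ['1', 'alice'])) // → { id: '1', name: 'alice' }
```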
````diff
@@ -216,6 +222,7 @@ function fetchAll(
 ```
 - Collects all rows into an array. For large results, prefer `fetchRow`/`fetchStream`.
 - Supports `INLINE` results or `JSON_ARRAY` formatted `EXTERNAL_LINKS` only.
+- If only a subset of external links is returned, missing chunk metadata is fetched by index.
 
 ### fetchStream(statementResult, auth, options?)
 ```ts
````
````diff
@@ -230,6 +237,7 @@ function fetchStream(
 - Throws if the result is `INLINE`.
 - Ends as an empty stream when no external links exist.
 - `forceMerge: true` forces merge even when there is only a single external link.
+- If only a subset of external links is returned, missing chunk metadata is fetched by index.
 
 ### mergeExternalLinks(statementResult, auth, options)
 ```ts
````
````diff
@@ -248,8 +256,9 @@ function mergeExternalLinks(
 ### Options (Summary)
 ```ts
 type ExecuteStatementOptions = {
-  onProgress?: (status: StatementStatus, metrics?: QueryMetrics) => void
+  onProgress?: (result: StatementResult, metrics?: QueryMetrics) => void
   enableMetrics?: boolean // Fetch metrics from Query History API (default: false)
+  logger?: Logger
   signal?: AbortSignal
   disposition?: 'INLINE' | 'EXTERNAL_LINKS'
   format?: 'JSON_ARRAY' | 'ARROW_STREAM' | 'CSV'
@@ -267,21 +276,25 @@ type FetchRowsOptions = {
   signal?: AbortSignal
   onEachRow?: (row: RowArray | RowObject) => void
   format?: 'JSON_ARRAY' | 'JSON_OBJECT'
+  logger?: Logger
 }
 
 type FetchAllOptions = {
   signal?: AbortSignal
   format?: 'JSON_ARRAY' | 'JSON_OBJECT'
+  logger?: Logger
 }
 
 type FetchStreamOptions = {
   signal?: AbortSignal
   forceMerge?: boolean
+  logger?: Logger
 }
 
 type MergeExternalLinksOptions = {
   signal?: AbortSignal
   forceMerge?: boolean
+  logger?: Logger
   mergeStreamToExternalLink: (stream: Readable) => Promise<{
     externalLink: string
     byte_count: number
````
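The new `logger` option appears on every options type above. The `Logger` interface itself is not shown in this diff, so the shape below is an assumption: a plain object with leveled methods, here built as a capturing logger that is handy in tests instead of `console`.

```typescript
// Assumed Logger shape; the package's actual interface may differ.
interface Logger {
  debug: (msg: string) => void
  info: (msg: string) => void
  warn: (msg: string) => void
  error: (msg: string) => void
}

// A capturing logger: records every call instead of printing it.
function makeCapturingLogger(): { logger: Logger; lines: string[] } {
  const lines: string[] = []
  const log = (level: string) => (msg: string) => lines.push(`[${level}] ${msg}`)
  return {
    lines,
    logger: { debug: log('debug'), info: log('info'), warn: log('warn'), error: log('error') },
  }
}

const { logger, lines } = makeCapturingLogger()
logger.info('fetching chunk 3')
console.log(lines) // → ['[info] fetching chunk 3']
```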

package-lock.json

Lines changed: 159 additions & 2 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 2 additions & 1 deletion
````diff
@@ -1,6 +1,6 @@
 {
   "name": "@bitofsky/databricks-sql",
-  "version": "1.0.1",
+  "version": "1.0.2",
   "description": "Databricks SQL client for Node.js - Direct REST API without SDK",
   "main": "dist/index.cjs",
   "module": "dist/index.js",
@@ -53,6 +53,7 @@
   },
   "devDependencies": {
     "@aws-sdk/client-s3": "3.958.0",
+    "@aws-sdk/lib-storage": "3.958.0",
     "@aws-sdk/s3-request-presigner": "3.958.0",
     "@types/node": "24.0.2",
     "@types/stream-json": "1.7.8",
````
