
Commit 4f63f69

cpsievert and claude committed
docs: clarify origin of MAX_ARROW_BATCH_ROWS constant
Document where STANDARD_VECTOR_SIZE comes from in DuckDB's C++ source and explain the failure mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4cb2c4c commit 4f63f69

1 file changed: src/reader/duckdb.rs (12 additions, 4 deletions)
```diff
@@ -516,10 +516,18 @@ impl Reader for DuckDBReader {
             )));
         }
 
-        // DuckDB's Arrow virtual table function (in duckdb-rs) writes an entire
-        // RecordBatch into a single DataChunk whose vectors have a fixed capacity
-        // of STANDARD_VECTOR_SIZE (2048). Passing a RecordBatch with more rows
-        // causes a panic. Work around this by chunking large DataFrames.
+        // Workaround for a duckdb-rs limitation (not a DuckDB limitation).
+        //
+        // duckdb-rs's `ArrowVTab` writes each RecordBatch into a single DuckDB
+        // `DataChunk`, which has a fixed capacity of `STANDARD_VECTOR_SIZE`.
+        // That constant is defined in DuckDB's C++ source at
+        // `src/include/duckdb/common/constants.hpp` and is currently 2048.
+        // When a RecordBatch exceeds this, `FlatVector::copy` panics with
+        // `assertion failed: data.len() <= self.capacity()`.
+        //
+        // We chunk large DataFrames to stay within this limit. The first chunk
+        // creates the table (letting DuckDB infer the schema from Arrow), and
+        // subsequent chunks INSERT into it.
         const MAX_ARROW_BATCH_ROWS: usize = 2048;
         let total_rows = df.height();
 
```
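For context, here is a minimal, self-contained sketch of the chunking strategy the new comment describes, not the repository's actual implementation. `Mode`, `write_in_chunks`, and `write_chunk` are hypothetical names introduced for illustration; only `MAX_ARROW_BATCH_ROWS = 2048` and the create-then-insert pattern come from the diff above.

```rust
// Sketch only: the chunking loop described in the commit's comment block.
// 2048 is DuckDB's STANDARD_VECTOR_SIZE (src/include/duckdb/common/constants.hpp).
const MAX_ARROW_BATCH_ROWS: usize = 2048;

// Hypothetical: how a chunk is written. The first chunk creates the table
// (so DuckDB infers the schema from Arrow); later chunks INSERT into it.
enum Mode {
    CreateTable,
    Insert,
}

// Walk the data in windows of at most MAX_ARROW_BATCH_ROWS rows, so each
// resulting RecordBatch fits in a single DuckDB DataChunk.
fn write_in_chunks(total_rows: usize, mut write_chunk: impl FnMut(usize, usize, Mode)) {
    let mut offset = 0;
    while offset < total_rows {
        let len = MAX_ARROW_BATCH_ROWS.min(total_rows - offset);
        let mode = if offset == 0 { Mode::CreateTable } else { Mode::Insert };
        // e.g. slice the DataFrame to rows [offset, offset + len)
        write_chunk(offset, len, mode);
        offset += len;
    }
}

fn main() {
    // 5000 rows split as 2048 + 2048 + 904; only the first chunk creates.
    write_in_chunks(5000, |offset, len, mode| {
        let verb = match mode {
            Mode::CreateTable => "create",
            Mode::Insert => "insert",
        };
        println!("{verb}: rows {offset}..{}", offset + len);
    });
}
```

The loop shape is the important part: every window is at most 2048 rows, so each RecordBatch handed to duckdb-rs's `ArrowVTab` fits in a single `DataChunk` and the `FlatVector::copy` capacity assertion cannot fire.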