feat(core): Add parse-file tool for structured attachments (no-changelog)#28251
Bundle Report: Changes will increase total bundle size by 13.25kB (0.03%) ⬆️. This is within the configured threshold ✅.
Affected bundle: editor-ui-esm
Codecov Report❌ Patch coverage is
3da482d to 8ce82b5
3 issues found across 15 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts">
<violation number="1" location="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts:136">
P1: Column name deduplication can produce collisions. When a natural column name (e.g. `name_1`) matches a suffix-generated name, two columns get the same normalized name, and one column's data silently overwrites the other in output rows.
Fix by re-checking `seen` until the generated name is unique.</violation>
</file>
<file name="packages/@n8n/instance-ai/src/tools/attachments/parse-file.tool.ts">
<violation number="1" location="packages/@n8n/instance-ai/src/tools/attachments/parse-file.tool.ts:34">
P3: Disallow empty delimiter strings. The schema currently allows "" but the parser throws when length !== 1, so this input passes validation then fails at runtime.</violation>
</file>
<file name="packages/cli/src/modules/instance-ai/instance-ai.service.ts">
<violation number="1" location="packages/cli/src/modules/instance-ai/instance-ai.service.ts:1652">
P2: Attachment-only parseable messages synthesize prompt text but still persist an empty `message`, causing empty thread titles and inconsistent recalled user input.</violation>
</file>
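The fix suggested for the P1 violation (re-checking `seen` until the generated name is unique) can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper name `dedupeColumnNames` and the `_N` suffix scheme are assumptions made for the example.

```typescript
// Hypothetical helper, not taken from structured-file-parser.ts.
// Assumes duplicates are resolved by appending `_1`, `_2`, ... to the name.
function dedupeColumnNames(names: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];

  for (const name of names) {
    let candidate = name;
    let suffix = 1;
    // Keep generating candidates until one is not in `seen`, so a generated
    // name can never collide with a natural column such as `name_1`.
    while (seen.has(candidate)) {
      candidate = `${name}_${suffix}`;
      suffix += 1;
    }
    seen.add(candidate);
    result.push(candidate);
  }

  return result;
}

// The second `name` skips the already-taken `name_1` and becomes `name_2`.
console.log(dedupeColumnNames(['name', 'name_1', 'name']));
```

The loop is the important part: a single check-then-suffix step is what allows the collision described in the violation.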
Architecture diagram
sequenceDiagram
participant Client
participant Controller as InstanceAiController
participant AIService as InstanceAiService
participant Parser as StructuredFileParser
participant Agent as Data-Table Agent
participant Tool as ParseFileTool
Note over Client,Tool: Request Initialization
Client->>Controller: POST /message { message, attachments }
Controller->>Controller: CHANGED: Validate (Allow empty message if attachments exist)
Controller->>AIService: sendMessage()
Note over AIService,Parser: Attachment Processing & Routing
AIService->>Parser: NEW: classifyAttachments()
Parser-->>AIService: List of structured (CSV/TSV/JSON) vs non-structured files
alt Has Structured Attachments
AIService->>AIService: NEW: buildAttachmentManifest() (Text-only description)
AIService->>AIService: NEW: Inject manifest into prompt + Register parse-file tool
else Has Non-Structured Attachments
AIService->>AIService: Keep raw file bytes in multimodal prompt
end
AIService->>Agent: Run Agent (DataTableAgent)
Note over Agent,Tool: Tool Execution Loop (Happy Path)
Agent->>Tool: NEW: execute(attachmentIndex, startRow, maxRows)
Tool->>Parser: NEW: parseStructuredFile(base64Data)
Parser->>Parser: Validate size (<512KB) & keys (__proto__)
Parser->>Parser: Normalize columns & Infer types
Parser->>Parser: Paginate rows (max 100/call)
Parser-->>Tool: Return paginated JSON data + nextStartRow
Tool-->>Agent: Return structured result
alt Unhappy Path: Invalid File
Tool->>Parser: parseStructuredFile()
Parser-->>Tool: Throw Error (Size/Format/Security)
Tool-->>Agent: Return { error: "reason" }
end
Agent-->>AIService: Final Summary
AIService-->>Client: Stream Response
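The pagination step in the diagram (at most 100 rows per call, with `nextStartRow` returned to the agent) might look roughly like the sketch below. The `Page` shape, the `paginateRows` name, and the zero-based `startRow` are assumptions for illustration; only the 100-row cap and the `startRow`, `maxRows`, and `nextStartRow` names come from the diagram.

```typescript
// Hypothetical result shape; the real tool output may differ.
interface Page<T> {
  rows: T[];
  totalRows: number;
  // Set only when more rows remain after this page.
  nextStartRow?: number;
}

const MAX_ROWS_PER_CALL = 100; // "max 100/call" in the diagram

// Assumes startRow is zero-based.
function paginateRows<T>(
  allRows: T[],
  startRow = 0,
  maxRows = MAX_ROWS_PER_CALL,
): Page<T> {
  // Clamp both inputs so a bad request cannot read out of range
  // or exceed the per-call cap.
  const start = Math.min(Math.max(0, Math.floor(startRow)), allRows.length);
  const limit = Math.min(Math.max(1, Math.floor(maxRows)), MAX_ROWS_PER_CALL);
  const end = Math.min(start + limit, allRows.length);

  const page: Page<T> = { rows: allRows.slice(start, end), totalRows: allRows.length };
  if (end < allRows.length) {
    page.nextStartRow = end;
  }
  return page;
}
```

An agent following this contract would keep calling with the returned `nextStartRow` until the field is absent.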
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
…no-changelog)
Add a native parse-file tool that exposes structured file attachments (CSV, TSV, JSON) through a secure, paginated API instead of injecting raw file bytes into the model prompt.
- Allow attachment-only messages (empty message + attachments)
- Classify attachments: structured ones get a compact manifest, non-structured keep existing multimodal file path
- Parser utility with format detection, column normalization, type inference, pagination, and output budgeting
- Security guardrails: 512KB size cap, 50 column max, 2000 cell budget, 40000 char budget, 5000 char cell limit, dangerous key rejection
- Conditional tool registration (only when parseable attachments exist)
- Data-table agent: add parse-file to tool subset, bump max steps 15 → 35, add import flow instructions to prompt
- Redact structured attachment data from trace outputs
- 60 new tests covering parser, tool, and registration logic
06545f2 to 9451676
… validation (no-changelog)
…rser # Conflicts: # packages/@n8n/instance-ai/src/tools/index.ts
…use, and dangerous headers (no-changelog)
Cadiac left a comment
left some optional comments but seems good
…log) Fix misleading test name and avoid double base64 decode in attachment classification by estimating decoded size from base64 string length.
addressed!
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts">
<violation number="1" location="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts:482">
P2: The base64 size estimate can reject valid attachments near the 512KB limit due to rounding/padding error.</violation>
</file>
// Estimate decoded size from base64 length to avoid decoding the full payload here.
// The exact decode + size check happens later in parseStructuredFile.
const estimatedDecodedSize = Math.ceil((att.data.length * 3) / 4);
P2: The base64 size estimate can reject valid attachments near the 512KB limit due to rounding/padding error.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts, line 482:
<comment>The base64 size estimate can reject valid attachments near the 512KB limit due to rounding/padding error.</comment>
<file context>
@@ -477,27 +477,16 @@ export function classifyAttachments(attachments: AttachmentInfo[]): ClassifiedAt
- } catch {
+ // Estimate decoded size from base64 length to avoid decoding the full payload here.
+ // The exact decode + size check happens later in parseStructuredFile.
+ const estimatedDecodedSize = Math.ceil((att.data.length * 3) / 4);
+ if (estimatedDecodedSize > MAX_DECODED_SIZE_BYTES) {
return {
</file context>
Suggested change:
- const estimatedDecodedSize = Math.ceil((att.data.length * 3) / 4);
+ const estimatedDecodedSize = Buffer.byteLength(att.data, 'base64');
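To illustrate why `Math.ceil((length * 3) / 4)` can overshoot near the limit, the sketch below computes the decoded length with padding taken into account. It is plain arithmetic rather than the reviewer's proposed `Buffer.byteLength` call, it assumes well-formed base64 without whitespace, and it assumes the cap is 512 * 1024 bytes; `base64DecodedLength` is a hypothetical name.

```typescript
// Assumed value of the cap; the PR only states "512 KB".
const MAX_DECODED_SIZE_BYTES = 512 * 1024;

// Decoded byte count for well-formed base64 (padded or unpadded).
// Each '=' stands in for a missing byte, so it must not be counted.
function base64DecodedLength(data: string): number {
  let padding = 0;
  if (data.endsWith('==')) padding = 2;
  else if (data.endsWith('=')) padding = 1;
  return Math.floor(((data.length - padding) * 3) / 4);
}

// A payload of exactly 524288 bytes encodes to 699052 characters,
// the last of which is one '=' padding character.
const encodedAtLimit = 'A'.repeat(699051) + '=';

const naiveEstimate = Math.ceil((encodedAtLimit.length * 3) / 4); // 524289
const exactLength = base64DecodedLength(encodedAtLimit); // 524288

console.log(naiveEstimate > MAX_DECODED_SIZE_BYTES); // true: rejected by the estimate
console.log(exactLength > MAX_DECODED_SIZE_BYTES); // false: the file is within the cap
```

The estimate is off by at most two bytes, which only matters for files sitting right at the cap, but those are exactly the ones the violation describes.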
Got released with |
Summary
Add a native `parse-file` tool to Instance AI that exposes structured file attachments (CSV, TSV, JSON) through a secure, paginated API instead of injecting raw file bytes into the model prompt.

Key changes:
- Attachment routing: Structured attachments (csv/tsv/json) are replaced with a compact manifest in the prompt text; non-structured attachments keep the existing multimodal `file` path
- `parse-file` tool: Thin wrapper over a parser utility with format detection, column normalization, type inference, pagination, and output budgeting
- Attachment-only messages: `message` may now be empty when `attachments` is non-empty; synthesizes a stub directing the agent to inspect the first parseable file
- Data-table agent: `parse-file` added to tool subset, max steps increased 15 → 35, prompt updated with import flow (preview → create table → paginate + insert)
- Security guardrails: 512 KB decoded-size cap, 50 column max, 2000 cell budget, 40000 char budget, 5000 char cell limit, dangerous key rejection (`__proto__`, `constructor`, `prototype`)
- Trace redaction: Raw structured attachment data is excluded from prompt-build trace outputs
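As an illustration of the dangerous key rejection listed above, here is a minimal sketch that walks a parsed JSON value and throws on any of the three named keys. The helper name `assertSafeKeys`, the recursive walk, and the error message are assumptions; the PR may check only header names or use a different mechanism.

```typescript
// The three keys named in the summary.
const DANGEROUS_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

// Hypothetical guard: throws if any object in the value has a dangerous own key.
function assertSafeKeys(value: unknown): void {
  if (Array.isArray(value)) {
    for (const item of value) assertSafeKeys(item);
    return;
  }
  if (value !== null && typeof value === 'object') {
    // JSON.parse stores "__proto__" as an ordinary own property,
    // so Object.keys reports it here.
    for (const key of Object.keys(value)) {
      if (DANGEROUS_KEYS.has(key)) {
        throw new Error(`Rejected dangerous key: ${key}`);
      }
      assertSafeKeys((value as Record<string, unknown>)[key]);
    }
  }
}

// Passes silently for ordinary data.
assertSafeKeys(JSON.parse('{"name": "a", "rows": [{"id": 1}]}'));
```

Rejecting the whole file, rather than silently dropping the offending key, keeps the failure visible to the agent through the tool's error result.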
I have seen this code, I have run this code, and I take responsibility for this code.