feat(core): Add parse-file tool for structured attachments (no-changelog) by aalises · Pull Request #28251 · n8n-io/n8n

aalises · 2026-04-09T13:16:04Z

Summary

Add a native parse-file tool to Instance AI that exposes structured file attachments (CSV, TSV, JSON) through a secure, paginated API instead of injecting raw file bytes into the model prompt.

Key changes:

Attachment routing: Structured attachments (csv/tsv/json) are replaced with a compact manifest in the prompt text; non-structured attachments keep the existing multimodal file path
parse-file tool: Thin wrapper over a parser utility with format detection, column normalization, type inference, pagination, and output budgeting
Attachment-only messages: `message` may now be empty when `attachments` is non-empty; in that case a stub message is synthesized that directs the agent to inspect the first parseable file
Data-table agent: parse-file added to tool subset, max steps increased 15 → 35, prompt updated with import flow (preview → create table → paginate + insert)
Security guardrails: 512 KB decoded-size cap, 50 column max, 2000 cell budget, 40000 char budget, 5000 char cell limit, dangerous key rejection (__proto__, constructor, prototype)
Trace redaction: Raw structured attachment data is excluded from prompt-build trace outputs
I have seen this code, I have run this code, and I take responsibility for this code.
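
The guardrails listed above could look roughly like the sketch below. The numeric limits and the rejected keys come from the summary; the constant names, the `buildRow` helper, and the choice to truncate (rather than reject) over-long cells are assumptions made for illustration, not the PR's actual code.

```typescript
// Limits taken from the PR summary; all identifiers here are hypothetical.
const MAX_COLUMNS = 50;
const MAX_CELL_CHARS = 5000;
const DANGEROUS_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

type Row = Record<string, string>;

/** Throws if the header exceeds the column cap or contains a dangerous key. */
function assertSafeHeader(columns: string[]): void {
  if (columns.length > MAX_COLUMNS) {
    throw new Error(`Too many columns: ${columns.length} > ${MAX_COLUMNS}`);
  }
  for (const name of columns) {
    if (DANGEROUS_KEYS.has(name)) {
      throw new Error(`Rejected dangerous column name: ${name}`);
    }
  }
}

/** Assumption: over-long cells are truncated to the per-cell limit. */
function clampCell(value: string): string {
  return value.length > MAX_CELL_CHARS ? value.slice(0, MAX_CELL_CHARS) : value;
}

/** Builds one output row on a null-prototype object after validating the header. */
function buildRow(columns: string[], cells: string[]): Row {
  assertSafeHeader(columns);
  const row: Row = Object.create(null);
  columns.forEach((name, i) => {
    row[name] = clampCell(cells[i] ?? '');
  });
  return row;
}
```

Rejecting the key up front and writing into a null-prototype object are two independent layers; either one alone would already keep a header such as `__proto__` from touching `Object.prototype`.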

codecov · 2026-04-09T13:18:16Z

Bundle Report

Changes will increase total bundle size by 13.25kB (0.03%) ⬆️. This is within the configured threshold ✅

Detailed changes

Bundle name	Size	Change
editor-ui-esm	45.58MB	13.25kB (0.03%) ⬆️

Affected Assets, Files, and Routes:

view changes for bundle: editor-ui-esm

Assets Changed:

Asset Name	Size Change	Total Size	Change (%)
`assets/typescript.worker-*.js`	-59 bytes	10.88MB	-0.0%
`assets/worker-*.js`	3.14MB	3.16MB	17560.06% ⚠️
`assets/worker-*.js`	-3.14MB	17.9kB	-99.43%
`assets/constants-*.js`	10 bytes	3.14MB	0.0%
`assets/expressions-*.js`	-1.01kB	857.51kB	-0.12%
`assets/core-*.js`	2.14kB	616.17kB	0.35%
`assets/SettingsSso-*.js`	11.35kB	105.92kB	12.0% ⚠️
`assets/SettingsSso-*.css`	880 bytes	34.75kB	2.6%

codecov · 2026-04-09T13:21:07Z

Codecov Report

❌ Patch coverage is 0% with 20 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...cli/src/modules/instance-ai/instance-ai.service.ts	0.00%	18 Missing ⚠️
.../src/modules/instance-ai/instance-ai.controller.ts	0.00%	2 Missing ⚠️


github-actions · 2026-04-09T14:14:28Z

Performance Comparison

Comparing current → latest master → 14-day baseline

docker-stats

Metric	Current	Latest Master	Baseline (avg)	vs Master	vs Baseline	Status
docker-image-size-runners	386.00 MB	386.00 MB	387.50 MB (σ 3.00)	+0.0%	-0.4%	✅
docker-image-size-n8n	1269.76 MB	1269.76 MB	1269.76 MB (σ 0.00)	+0.0%	+0.0%	—

Memory consumption baseline with starter plan resources

Metric	Current	Latest Master	Baseline (avg)	vs Master	vs Baseline	Status
memory-heap-used-baseline	114.56 MB	114.53 MB	113.09 MB (σ 1.15)	+0.0%	+1.3%	⚠️
memory-rss-baseline	282.92 MB	287.07 MB	281.78 MB (σ 34.50)	-1.4%	+0.4%	✅

Idle baseline with Instance AI module loaded

Metric	Current	Latest Master	Baseline (avg)	vs Master	vs Baseline	Status
instance-ai-heap-used-baseline	185.95 MB	186.51 MB	186.46 MB (< 3 samples)	-0.3%	-0.3%	—
instance-ai-rss-baseline	382.41 MB	394.55 MB	369.15 MB (< 3 samples)	-3.1%	+3.6%	—

How to read this table

Current: This PR's value (or latest master if PR perf tests haven't run)
Latest Master: Most recent nightly master measurement
Baseline: Rolling 14-day average from master
vs Master: PR impact (current vs latest master)
vs Baseline: Drift from baseline (current vs rolling avg)
Status: ✅ within 1σ | ⚠️ 1-2σ | 🔴 >2σ regression
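
The status legend can be read as a simple z-score classification. The sketch below is an interpretation, not the workflow's actual script: the function name, the `samples` parameter, and the decision to treat values below the baseline as "ok" are assumptions. The "n/a" case mirrors the rows above that show "—" when σ is 0 or fewer than 3 samples exist.

```typescript
type Status = 'ok' | 'warn' | 'regression' | 'n/a';

/**
 * Classifies a metric by how many standard deviations it sits above the
 * rolling baseline mean. Assumption: only increases count against the PR.
 */
function classifyDrift(
  current: number,
  baselineMean: number,
  sigma: number,
  samples: number,
): Status {
  // No meaningful spread: too few samples or a perfectly flat baseline.
  if (samples < 3 || sigma === 0) return 'n/a';
  const z = (current - baselineMean) / sigma;
  if (z <= 1) return 'ok';
  if (z <= 2) return 'warn';
  return 'regression';
}
```

For example, memory-heap-used-baseline is 114.56 MB against a baseline of 113.09 MB with σ 1.15, which is (114.56 − 113.09) / 1.15 ≈ 1.28σ and therefore falls in the 1-2σ warning band, matching the ⚠️ in the table.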

cubic-dev-ai

3 issues found across 15 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts">

<violation number="1" location="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts:136">
P1: Column name deduplication can produce collisions. When a natural column name (e.g. `name_1`) matches a suffix-generated name, two columns get the same normalized name, and one column's data silently overwrites the other in output rows.

Fix by re-checking `seen` until the generated name is unique.</violation>
</file>
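
A minimal sketch of the suggested fix, assuming a hypothetical `dedupeColumnNames` helper and a `_<n>` suffix scheme (the parser's real function and suffix format are not shown in this thread). The loop keeps generating candidates until one is absent from `seen`, so a natural `name_1` can no longer collide with a generated one.

```typescript
/** Returns column names made unique by appending _1, _2, ... where needed. */
function dedupeColumnNames(names: string[]): string[] {
  const seen = new Set<string>();
  return names.map((name) => {
    let candidate = name;
    let suffix = 1;
    // Re-check `seen` on every iteration so generated names never collide
    // with names that occur naturally elsewhere in the header.
    while (seen.has(candidate)) {
      candidate = `${name}_${suffix}`;
      suffix += 1;
    }
    seen.add(candidate);
    return candidate;
  });
}

// dedupeColumnNames(['name', 'name', 'name_1']) → ['name', 'name_1', 'name_1_1']
// dedupeColumnNames(['name_1', 'name', 'name']) → ['name_1', 'name', 'name_2']
```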

<file name="packages/@n8n/instance-ai/src/tools/attachments/parse-file.tool.ts">

<violation number="1" location="packages/@n8n/instance-ai/src/tools/attachments/parse-file.tool.ts:34">
P3: Disallow empty delimiter strings. The schema currently allows "" but the parser throws when length !== 1, so this input passes validation then fails at runtime.</violation>
</file>
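
The mismatch can be closed by enforcing the parser's rule at validation time. The sketch below uses a plain function instead of the tool's real schema, which is not shown here; `validateDelimiter` and the assumption that the delimiter is optional are hypothetical.

```typescript
/**
 * Validates an optional delimiter before it reaches the parser.
 * Assumption: `undefined` means "auto-detect"; anything else must be
 * exactly one character, which is the rule the parser already enforces.
 */
function validateDelimiter(delimiter: string | undefined): string | undefined {
  if (delimiter === undefined) return undefined;
  if (delimiter.length !== 1) {
    throw new Error('delimiter must be exactly one character');
  }
  return delimiter;
}
```

With this check in the input schema, an empty string is rejected as invalid input instead of passing validation and failing later inside the parser.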

<file name="packages/cli/src/modules/instance-ai/instance-ai.service.ts">

<violation number="1" location="packages/cli/src/modules/instance-ai/instance-ai.service.ts:1652">
P2: Attachment-only parseable messages synthesize prompt text but still persist an empty `message`, causing empty thread titles and inconsistent recalled user input.</violation>
</file>
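
One way to address the third issue is to resolve the effective user message once and use that same value for both the prompt and the persisted thread entry. This is a sketch only: the `Attachment` shape, `resolveUserMessage`, and the stub wording are invented for illustration, and the actual service code is not shown in this thread.

```typescript
interface Attachment {
  fileName: string;
  parseable: boolean; // hypothetical flag: true for CSV/TSV/JSON attachments
}

/**
 * Returns the text that should be both sent to the agent and persisted.
 * If the user typed nothing and a parseable attachment exists, a stub
 * pointing at the first parseable file is synthesized.
 */
function resolveUserMessage(message: string, attachments: Attachment[]): string {
  if (message.trim().length > 0) return message;
  const first = attachments.find((a) => a.parseable);
  if (!first) return message; // nothing to synthesize from
  return `Inspect the attached file "${first.fileName}" with the parse-file tool.`;
}
```

Persisting the returned value rather than the raw input would give the thread a non-empty title and keep recalled user input consistent with what the agent actually saw.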

Architecture diagram

sequenceDiagram
    participant Client
    participant Controller as InstanceAiController
    participant AIService as InstanceAiService
    participant Parser as StructuredFileParser
    participant Agent as Data-Table Agent
    participant Tool as ParseFileTool

    Note over Client,Tool: Request Initialization
    Client->>Controller: POST /message { message, attachments }
    Controller->>Controller: CHANGED: Validate (Allow empty message if attachments exist)
    Controller->>AIService: sendMessage()

    Note over AIService,Parser: Attachment Processing & Routing
    AIService->>Parser: NEW: classifyAttachments()
    Parser-->>AIService: List of structured (CSV/TSV/JSON) vs non-structured files

    alt Has Structured Attachments
        AIService->>AIService: NEW: buildAttachmentManifest() (Text-only description)
        AIService->>AIService: NEW: Inject manifest into prompt + Register parse-file tool
    else Has Non-Structured Attachments
        AIService->>AIService: Keep raw file bytes in multimodal prompt
    end

    AIService->>Agent: Run Agent (DataTableAgent)
    
    Note over Agent,Tool: Tool Execution Loop (Happy Path)
    Agent->>Tool: NEW: execute(attachmentIndex, startRow, maxRows)
    Tool->>Parser: NEW: parseStructuredFile(base64Data)
    
    Parser->>Parser: Validate size (<512KB) & keys (__proto__)
    Parser->>Parser: Normalize columns & Infer types
    Parser->>Parser: Paginate rows (max 100/call)
    
    Parser-->>Tool: Return paginated JSON data + nextStartRow
    Tool-->>Agent: Return structured result
    
    alt Unhappy Path: Invalid File
        Tool->>Parser: parseStructuredFile()
        Parser-->>Tool: Throw Error (Size/Format/Security)
        Tool-->>Agent: Return { error: "reason" }
    end

    Agent-->>AIService: Final Summary
    AIService-->>Client: Stream Response
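
The pagination step in the diagram can be sketched as follows. The 100-row cap comes from the diagram; the zero-based `startRow`, the `Page` shape, and returning `null` for `nextStartRow` once the data is exhausted are assumptions, since the tool's real return type is not shown.

```typescript
const MAX_ROWS_PER_CALL = 100; // cap stated in the diagram

interface Page<T> {
  rows: T[];
  totalRows: number;
  nextStartRow: number | null; // null when there is nothing left to read
}

/** Returns one page of rows, clamping the requested size to the per-call cap. */
function paginate<T>(all: T[], startRow: number, maxRows: number): Page<T> {
  const start = Math.max(0, Math.floor(startRow));
  const limit = Math.min(Math.max(1, Math.floor(maxRows)), MAX_ROWS_PER_CALL);
  const rows = all.slice(start, start + limit);
  const end = start + rows.length;
  return {
    rows,
    totalRows: all.length,
    nextStartRow: end < all.length ? end : null,
  };
}
```

An agent importing a file would call the tool repeatedly, passing each `nextStartRow` back as `startRow` until it receives `null`, which is the "paginate + insert" loop described in the summary.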


…no-changelog)

Add a native parse-file tool that exposes structured file attachments (CSV, TSV, JSON) through a secure, paginated API instead of injecting raw file bytes into the model prompt.

- Allow attachment-only messages (empty message + attachments)
- Classify attachments: structured ones get a compact manifest, non-structured keep existing multimodal file path
- Parser utility with format detection, column normalization, type inference, pagination, and output budgeting
- Security guardrails: 512KB size cap, 50 column max, 2000 cell budget, 40000 char budget, 5000 char cell limit, dangerous key rejection
- Conditional tool registration (only when parseable attachments exist)
- Data-table agent: add parse-file to tool subset, bump max steps 15 → 35, add import flow instructions to prompt
- Redact structured attachment data from trace outputs
- 60 new tests covering parser, tool, and registration logic


Cadiac

left some optional comments but seems good

…log) Fix misleading test name and avoid double base64 decode in attachment classification by estimating decoded size from base64 string length.

aalises · 2026-04-10T15:35:43Z

> left some optional comments but seems good

addressed!

Cadiac

👍

cubic-dev-ai

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts">

<violation number="1" location="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts:482">
P2: The base64 size estimate can reject valid attachments near the 512KB limit due to rounding/padding error.</violation>
</file>


cubic-dev-ai · 2026-04-10T15:36:39Z

+
+		// Estimate decoded size from base64 length to avoid decoding the full payload here.
+		// The exact decode + size check happens later in parseStructuredFile.
+		const estimatedDecodedSize = Math.ceil((att.data.length * 3) / 4);


P2: The base64 size estimate can reject valid attachments near the 512KB limit due to rounding/padding error.


Suggested change

- const estimatedDecodedSize = Math.ceil((att.data.length * 3) / 4);
+ const estimatedDecodedSize = Buffer.byteLength(att.data, 'base64');
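
To see why the `Math.ceil` estimate can overshoot, the sketch below does the arithmetic by hand. It assumes standard padded base64 with no whitespace and a 512 KB limit of 512 × 1024 bytes; the function names are hypothetical and the value of `MAX_DECODED_SIZE_BYTES` is not shown in this thread.

```typescript
/** The estimate from the reviewed code: ignores trailing '=' padding. */
function naiveEstimate(b64: string): number {
  return Math.ceil((b64.length * 3) / 4);
}

/** Exact decoded size for padded base64: drop up to two '=' before scaling. */
function decodedBase64Size(b64: string): number {
  let len = b64.length;
  if (len > 0 && b64.charCodeAt(len - 1) === 0x3d) len -= 1;
  if (len > 0 && b64.charCodeAt(len - 1) === 0x3d) len -= 1;
  return Math.floor((len * 3) / 4);
}

// "aGVsbG8=" encodes the 5 bytes of "hello":
//   naiveEstimate      → 6
//   decodedBase64Size  → 5
```

A payload of exactly 524288 bytes encodes to 699052 base64 characters ending in one `=`. The naive estimate gives 524289, one byte over the limit, while the padding-aware calculation gives 524288, so a file that is exactly at the cap would be rejected only by the estimate.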

n8n-assistant · 2026-04-14T10:27:24Z

Got released with n8n@

…log) (n8n-io#28251)

n8n-assistant Bot added the core (Enhancement outside /nodes-base and /editor-ui) and n8n team (Authored by the n8n team) labels Apr 9, 2026

aalises force-pushed the aalises-add-file-parser branch from 3da482d to 8ce82b5 on April 9, 2026 14:11

aalises marked this pull request as ready for review April 9, 2026 14:14

cubic-dev-ai Bot reviewed Apr 9, 2026


Comment thread packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts

Comment thread packages/cli/src/modules/instance-ai/instance-ai.service.ts

Comment thread packages/@n8n/instance-ai/src/tools/attachments/parse-file.tool.ts Outdated

aalises force-pushed the aalises-add-file-parser branch from 06545f2 to 9451676 on April 9, 2026 14:24

aalises added 5 commits April 9, 2026 16:25

fix(instance-ai): Fix column name dedup collision and empty delimiter validation (no-changelog)

ea6c61c

Merge remote-tracking branch 'origin/master' into aalises-add-file-parser

# Conflicts: # packages/@n8n/instance-ai/src/tools/index.ts

47bf82b

feat(i18n): Add parse-file tool label for instance AI (no-changelog)

be4aeb2

fix(instance-ai): Harden parse-file against prompt injection, size abuse, and dangerous headers (no-changelog)

b4d1b62

Merge branch 'master' into aalises-add-file-parser

c789069

aalises requested review from Cadiac and r00gm April 10, 2026 11:56