feat(core): Add parse-file tool for structured attachments (no-changelog)#28251

Merged
aalises merged 7 commits into master from aalises-add-file-parser
Apr 10, 2026

@aalises aalises commented Apr 9, 2026

Summary

Add a native parse-file tool to Instance AI that exposes structured file attachments (CSV, TSV, JSON) through a secure, paginated API instead of injecting raw file bytes into the model prompt.

Key changes:

  • Attachment routing: Structured attachments (csv/tsv/json) are replaced with a compact manifest in the prompt text; non-structured attachments keep the existing multimodal file path

  • parse-file tool: Thin wrapper over a parser utility with format detection, column normalization, type inference, pagination, and output budgeting

  • Attachment-only messages: message may now be empty when attachments is non-empty; in that case the service synthesizes a stub prompt directing the agent to inspect the first parseable file

  • Data-table agent: parse-file added to tool subset, max steps increased 15 → 35, prompt updated with import flow (preview → create table → paginate + insert)

  • Security guardrails: 512 KB decoded-size cap, 50 column max, 2000 cell budget, 40000 char budget, 5000 char cell limit, dangerous key rejection (__proto__, constructor, prototype)

  • Trace redaction: Raw structured attachment data is excluded from prompt-build trace outputs

  • I have seen this code, I have run this code, and I take responsibility for this code.
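
A minimal sketch of how two of the guardrails listed above (the 50 column cap and dangerous key rejection) could be enforced. The limits are quoted from this description; the names `MAX_COLUMNS`, `DANGEROUS_KEYS`, and `assertSafeColumns` are hypothetical and are not taken from the PR's code.

```typescript
// Limits quoted from the PR description; all identifiers here are hypothetical.
const MAX_COLUMNS = 50;
const DANGEROUS_KEYS: ReadonlySet<string> = new Set(['__proto__', 'constructor', 'prototype']);

// Throws if the header row exceeds the column cap or contains a key that could
// pollute object prototypes when rows are materialized as plain objects.
function assertSafeColumns(columns: string[]): void {
	if (columns.length > MAX_COLUMNS) {
		throw new Error(`Too many columns: ${columns.length} (max ${MAX_COLUMNS})`);
	}
	for (const column of columns) {
		if (DANGEROUS_KEYS.has(column)) {
			throw new Error(`Rejected dangerous column name: ${column}`);
		}
	}
}
```

Running the check on the header row before any row objects are built means a hostile file is rejected before its keys are ever assigned to an object.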

codecov Bot commented Apr 9, 2026

Bundle Report

Changes will increase total bundle size by 13.25kB (0.03%) ⬆️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
| --- | --- | --- |
| editor-ui-esm | 45.58MB | 13.25kB (0.03%) ⬆️ |

Affected Assets, Files, and Routes:

view changes for bundle: editor-ui-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| assets/typescript.worker-*.js | -59 bytes | 10.88MB | -0.0% |
| assets/worker-*.js | 3.14MB | 3.16MB | 17560.06% ⚠️ |
| assets/worker-*.js | -3.14MB | 17.9kB | -99.43% |
| assets/constants-*.js | 10 bytes | 3.14MB | 0.0% |
| assets/expressions-*.js | -1.01kB | 857.51kB | -0.12% |
| assets/core-*.js | 2.14kB | 616.17kB | 0.35% |
| assets/SettingsSso-*.js | 11.35kB | 105.92kB | 12.0% ⚠️ |
| assets/SettingsSso-*.css | 880 bytes | 34.75kB | 2.6% |

codecov Bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 0% with 20 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...cli/src/modules/instance-ai/instance-ai.service.ts | 0.00% | 18 Missing ⚠️ |
| .../src/modules/instance-ai/instance-ai.controller.ts | 0.00% | 2 Missing ⚠️ |

@n8n-assistant n8n-assistant Bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Apr 9, 2026
@aalises aalises force-pushed the aalises-add-file-parser branch from 3da482d to 8ce82b5 Compare April 9, 2026 14:11
@aalises aalises marked this pull request as ready for review April 9, 2026 14:14

github-actions Bot commented Apr 9, 2026

Performance Comparison

Comparing current → latest master → 14-day baseline

docker-stats

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| docker-image-size-runners | 386.00 MB | 386.00 MB | 387.50 MB (σ 3.00) | +0.0% | -0.4% | |
| docker-image-size-n8n | 1269.76 MB | 1269.76 MB | 1269.76 MB (σ 0.00) | +0.0% | +0.0% | |

Memory consumption baseline with starter plan resources

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| memory-heap-used-baseline | 114.56 MB | 114.53 MB | 113.09 MB (σ 1.15) | +0.0% | +1.3% | ⚠️ |
| memory-rss-baseline | 282.92 MB | 287.07 MB | 281.78 MB (σ 34.50) | -1.4% | +0.4% | |

Idle baseline with Instance AI module loaded

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| instance-ai-heap-used-baseline | 185.95 MB | 186.51 MB | 186.46 MB (< 3 samples) | -0.3% | -0.3% | |
| instance-ai-rss-baseline | 382.41 MB | 394.55 MB | 369.15 MB (< 3 samples) | -3.1% | +3.6% | |

How to read this table
  • Current: This PR's value (or latest master if PR perf tests haven't run)
  • Latest Master: Most recent nightly master measurement
  • Baseline: Rolling 14-day average from master
  • vs Master: PR impact (current vs latest master)
  • vs Baseline: Drift from baseline (current vs rolling avg)
  • Status: ✅ within 1σ | ⚠️ 1-2σ | 🔴 >2σ regression

@cubic-dev-ai cubic-dev-ai Bot left a comment

3 issues found across 15 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts">

<violation number="1" location="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts:136">
P1: Column name deduplication can produce collisions. When a natural column name (e.g. `name_1`) matches a suffix-generated name, two columns get the same normalized name, and one column's data silently overwrites the other in output rows.

Fix by re-checking `seen` until the generated name is unique.</violation>
</file>

<file name="packages/@n8n/instance-ai/src/tools/attachments/parse-file.tool.ts">

<violation number="1" location="packages/@n8n/instance-ai/src/tools/attachments/parse-file.tool.ts:34">
P3: Disallow empty delimiter strings. The schema currently allows "" but the parser throws when length !== 1, so this input passes validation then fails at runtime.</violation>
</file>

<file name="packages/cli/src/modules/instance-ai/instance-ai.service.ts">

<violation number="1" location="packages/cli/src/modules/instance-ai/instance-ai.service.ts:1652">
P2: Attachment-only parseable messages synthesize prompt text but still persist an empty `message`, causing empty thread titles and inconsistent recalled user input.</violation>
</file>
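
The fix proposed for the column-name collision (re-checking `seen` until the generated name is unique) could look roughly like the sketch below. The function name `dedupeColumnNames` and the `_N` suffix scheme are assumptions for illustration; the parser's actual normalization code is not shown in this thread.

```typescript
// Hypothetical helper: returns column names with duplicates made unique.
// The loop keeps bumping the suffix until the candidate is not in `seen`,
// so a natural name such as `name_1` can never collide with a generated one.
function dedupeColumnNames(names: string[]): string[] {
	const seen = new Set<string>();
	return names.map((name) => {
		let candidate = name;
		let suffix = 1;
		while (seen.has(candidate)) {
			candidate = `${name}_${suffix}`;
			suffix += 1;
		}
		seen.add(candidate);
		return candidate;
	});
}

// A duplicated `name` next to a natural `name_1` no longer collides.
console.log(dedupeColumnNames(['name', 'name', 'name_1']));
```

A single-pass suffix (`name` → `name_1`) without the re-check is what allows two columns to end up with the same key, which is how one column's data can overwrite the other in the output rows.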
Architecture diagram
sequenceDiagram
    participant Client
    participant Controller as InstanceAiController
    participant AIService as InstanceAiService
    participant Parser as StructuredFileParser
    participant Agent as Data-Table Agent
    participant Tool as ParseFileTool

    Note over Client,Tool: Request Initialization
    Client->>Controller: POST /message { message, attachments }
    Controller->>Controller: CHANGED: Validate (Allow empty message if attachments exist)
    Controller->>AIService: sendMessage()

    Note over AIService,Parser: Attachment Processing & Routing
    AIService->>Parser: NEW: classifyAttachments()
    Parser-->>AIService: List of structured (CSV/TSV/JSON) vs non-structured files

    alt Has Structured Attachments
        AIService->>AIService: NEW: buildAttachmentManifest() (Text-only description)
        AIService->>AIService: NEW: Inject manifest into prompt + Register parse-file tool
    else Has Non-Structured Attachments
        AIService->>AIService: Keep raw file bytes in multimodal prompt
    end

    AIService->>Agent: Run Agent (DataTableAgent)
    
    Note over Agent,Tool: Tool Execution Loop (Happy Path)
    Agent->>Tool: NEW: execute(attachmentIndex, startRow, maxRows)
    Tool->>Parser: NEW: parseStructuredFile(base64Data)
    
    Parser->>Parser: Validate size (<512KB) & keys (__proto__)
    Parser->>Parser: Normalize columns & Infer types
    Parser->>Parser: Paginate rows (max 100/call)
    
    Parser-->>Tool: Return paginated JSON data + nextStartRow
    Tool-->>Agent: Return structured result
    
    alt Unhappy Path: Invalid File
        Tool->>Parser: parseStructuredFile()
        Parser-->>Tool: Throw Error (Size/Format/Security)
        Tool-->>Agent: Return { error: "reason" }
    end

    Agent-->>AIService: Final Summary
    AIService-->>Client: Stream Response
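
The pagination step in the diagram (at most 100 rows per call, with a `nextStartRow` cursor returned to the agent) can be sketched as follows. Only the 100-row cap and the `startRow`, `maxRows`, and `nextStartRow` names come from the diagram; the `Page` shape, the `paginateRows` name, and the use of `null` to signal the last page are assumptions.

```typescript
// Cap taken from the diagram ("max 100/call"); everything else is hypothetical.
const MAX_ROWS_PER_CALL = 100;

interface Page<T> {
	rows: T[];
	totalRows: number;
	// Row index to pass as `startRow` on the next call, or null on the last page.
	nextStartRow: number | null;
}

function paginateRows<T>(rows: T[], startRow: number, maxRows: number): Page<T> {
	// Clamp the requested window so a caller cannot exceed the per-call cap.
	const limit = Math.min(Math.max(1, maxRows), MAX_ROWS_PER_CALL);
	const start = Math.max(0, startRow);
	const slice = rows.slice(start, start + limit);
	const end = start + slice.length;
	return {
		rows: slice,
		totalRows: rows.length,
		nextStartRow: end < rows.length ? end : null,
	};
}
```

With this shape the agent can loop, feeding each `nextStartRow` back in as `startRow`, until it receives `null`.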

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts
Comment thread packages/cli/src/modules/instance-ai/instance-ai.service.ts
Comment thread packages/@n8n/instance-ai/src/tools/attachments/parse-file.tool.ts Outdated
…no-changelog)

Add a native parse-file tool that exposes structured file attachments
(CSV, TSV, JSON) through a secure, paginated API instead of injecting
raw file bytes into the model prompt.

- Allow attachment-only messages (empty message + attachments)
- Classify attachments: structured ones get a compact manifest,
  non-structured keep existing multimodal file path
- Parser utility with format detection, column normalization,
  type inference, pagination, and output budgeting
- Security guardrails: 512KB size cap, 50 column max, 2000 cell
  budget, 40000 char budget, 5000 char cell limit, dangerous key
  rejection
- Conditional tool registration (only when parseable attachments exist)
- Data-table agent: add parse-file to tool subset, bump max steps
  15 → 35, add import flow instructions to prompt
- Redact structured attachment data from trace outputs
- 60 new tests covering parser, tool, and registration logic
@aalises aalises force-pushed the aalises-add-file-parser branch from 06545f2 to 9451676 Compare April 9, 2026 14:24
@aalises aalises requested review from Cadiac and r00gm April 10, 2026 11:56
Comment thread packages/@n8n/api-types/src/schemas/instance-ai.schema.ts
Comment thread packages/cli/src/modules/instance-ai/instance-ai.service.ts
Comment thread packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts
Comment thread packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts Outdated
Cadiac previously approved these changes Apr 10, 2026

@Cadiac Cadiac left a comment

left some optional comments but seems good

…log)

Fix misleading test name and avoid double base64 decode in attachment
classification by estimating decoded size from base64 string length.

aalises commented Apr 10, 2026

> left some optional comments but seems good

addressed!

@aalises aalises requested a review from Cadiac April 10, 2026 15:35

@Cadiac Cadiac left a comment

👍

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts">

<violation number="1" location="packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts:482">
P2: The base64 size estimate can reject valid attachments near the 512KB limit due to rounding/padding error.</violation>
</file>


// Estimate decoded size from base64 length to avoid decoding the full payload here.
// The exact decode + size check happens later in parseStructuredFile.
const estimatedDecodedSize = Math.ceil((att.data.length * 3) / 4);

@cubic-dev-ai cubic-dev-ai Bot Apr 10, 2026

P2: The base64 size estimate can reject valid attachments near the 512KB limit due to rounding/padding error.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/@n8n/instance-ai/src/parsers/structured-file-parser.ts, line 482:

<comment>The base64 size estimate can reject valid attachments near the 512KB limit due to rounding/padding error.</comment>

<file context>
@@ -477,27 +477,16 @@ export function classifyAttachments(attachments: AttachmentInfo[]): ClassifiedAt
-		} catch {
+		// Estimate decoded size from base64 length to avoid decoding the full payload here.
+		// The exact decode + size check happens later in parseStructuredFile.
+		const estimatedDecodedSize = Math.ceil((att.data.length * 3) / 4);
+		if (estimatedDecodedSize > MAX_DECODED_SIZE_BYTES) {
 			return {
</file context>
Suggested change
- const estimatedDecodedSize = Math.ceil((att.data.length * 3) / 4);
+ const estimatedDecodedSize = Buffer.byteLength(att.data, 'base64');
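
To illustrate why the length-only estimate can reject a file that is exactly at the limit, here is a small sketch comparing it with a padding-aware count. It assumes standard base64 with no embedded whitespace; `naiveEstimate` and `exactDecodedSize` are hypothetical names, and `MAX_DECODED_SIZE_BYTES` is set to the 512 KB cap quoted in the PR description.

```typescript
const MAX_DECODED_SIZE_BYTES = 512 * 1024;

// The estimate from the reviewed line: ignores '=' padding, so it can
// overshoot the true decoded size by up to 2 bytes.
function naiveEstimate(b64: string): number {
	return Math.ceil((b64.length * 3) / 4);
}

// Padding-aware count: each trailing '=' stands for one byte that is not
// present in the decoded output.
function exactDecodedSize(b64: string): number {
	let padding = 0;
	if (b64.endsWith('==')) padding = 2;
	else if (b64.endsWith('=')) padding = 1;
	return Math.floor((b64.length * 3) / 4) - padding;
}

// 524288 zero bytes encode to 699051 'A' characters plus one '='.
const atLimit = 'A'.repeat(699051) + '=';
console.log(naiveEstimate(atLimit) > MAX_DECODED_SIZE_BYTES); // true: rejected
console.log(exactDecodedSize(atLimit) > MAX_DECODED_SIZE_BYTES); // false: accepted
```

Because 512 KB is not a multiple of 3, its encoding carries one padding character, which is exactly the case where the length-only estimate lands one byte over the cap.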

@aalises aalises added this pull request to the merge queue Apr 10, 2026
Merged via the queue into master with commit ff99c84 Apr 10, 2026
84 checks passed
@aalises aalises deleted the aalises-add-file-parser branch April 10, 2026 19:26

n8n-assistant Bot commented Apr 14, 2026

Got released with n8n@

Aijeyomah pushed a commit to Aijeyomah/n8n that referenced this pull request Apr 15, 2026
Labels

core (Enhancement outside /nodes-base and /editor-ui), n8n team (Authored by the n8n team), Released
