Skip to content

Commit 38f8a45

Browse files
danny-avilaPreciousOritsedere
authored andcommitted
🆔 fix: Atomic File Dedupe, Bedrock Tokens Fix, and Allowed MIME Types (danny-avila#11675)
* feat: Add support for Apache Parquet MIME types - Introduced 'application/x-parquet' to the full MIME types list and code interpreter MIME types list. - Updated application MIME types regex to include 'x-parquet' and 'vnd.apache.parquet'. - Added mapping for '.parquet' files to 'application/x-parquet' in code type mapping, enhancing file format support. * feat: Implement atomic file claiming for code execution outputs - Added a new `claimCodeFile` function to atomically claim a file_id for code execution outputs, preventing duplicates by using a compound key of filename and conversationId. - Updated `processCodeOutput` to utilize the new claiming mechanism, ensuring that concurrent calls for the same filename converge on a single record. - Refactored related tests to validate the new atomic claiming behavior and its impact on file usage tracking and versioning. * fix: Update image file handling to use cache-busting filepath - Modified the `processCodeOutput` function to generate a cache-busting filepath for updated image files, improving browser caching behavior. - Adjusted related tests to reflect the change from versioned filenames to cache-busted filepaths, ensuring accurate validation of image updates. * fix: Update step handler to prevent undefined content for non-tool call types - Modified the condition in useStepHandler to ensure that undefined content is only assigned for specific content types, enhancing the robustness of content handling. * fix: Update bedrockOutputParser to handle maxTokens for adaptive models - Modified the bedrockOutputParser logic to ensure that maxTokens is not set for adaptive models when neither maxTokens nor maxOutputTokens are provided, improving the handling of adaptive thinking configurations. - Updated related tests to reflect these changes, ensuring accurate validation of the output for adaptive models. * chore: Update @librechat/agents to version 3.1.38 in package.json and package-lock.json * fix: Enhance file claiming and error handling in code processing - Updated the `processCodeOutput` function to use a consistent file ID for claiming files, preventing duplicates and improving concurrency handling. - Refactored the `createFileMethods` to include error handling for failed file claims, ensuring robust behavior when claiming files for conversations. - These changes enhance the reliability of file management in the application. * fix: Update adaptive thinking test for Opus 4.6 model - Modified the test for configuring adaptive thinking to reflect that no default maxTokens should be set for the Opus 4.6 model. - Updated assertions to ensure that maxTokens is undefined, aligning with the expected behavior for adaptive models.
1 parent ae07bea commit 38f8a45

10 files changed

Lines changed: 87 additions & 123 deletions

File tree

‎api/package.json‎

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,7 @@
4646
"@inrupt/solid-client": "^1.30.2",
4747
"@keyv/redis": "^4.3.3",
4848
"@langchain/core": "^0.3.80",
49-
"@librechat/agents": "^3.1.37",
49+
"@librechat/agents": "^3.1.38",
5050
"@librechat/api": "*",
5151
"@librechat/data-schemas": "*",
5252
"@microsoft/microsoft-graph-client": "^3.0.7",

‎api/server/services/Files/Code/process.js‎

Lines changed: 21 additions & 57 deletions
Original file line numberDiff line numberDiff line change
@@ -56,50 +56,6 @@ const createDownloadFallback = ({
5656
};
5757
};
5858

59-
/**
60-
* Find an existing code-generated file by filename in the conversation.
61-
* Used to update existing files instead of creating duplicates.
62-
*
63-
* ## Deduplication Strategy
64-
*
65-
* Files are deduplicated by `(conversationId, filename)` - NOT including `messageId`.
66-
* This is an intentional design decision to handle iterative code development patterns:
67-
*
68-
* **Rationale:**
69-
* - When users iteratively refine code (e.g., "regenerate that chart with red bars"),
70-
* the same logical file (e.g., "chart.png") is produced multiple times
71-
* - Without deduplication, each iteration would create a new file, leading to storage bloat
72-
* - The latest version is what matters for re-upload to the code environment
73-
*
74-
* **Implications:**
75-
* - Different messages producing files with the same name will update the same file record
76-
* - The `messageId` field tracks which message last updated the file
77-
* - The `usage` counter tracks how many times the file has been generated
78-
*
79-
* **Future Considerations:**
80-
* - If file versioning is needed, consider adding a `versions` array or separate version collection
81-
* - The current approach prioritizes storage efficiency over history preservation
82-
*
83-
* @param {string} filename - The filename to search for.
84-
* @param {string} conversationId - The conversation ID.
85-
* @returns {Promise<MongoFile | null>} The existing file or null.
86-
*/
87-
const findExistingCodeFile = async (filename, conversationId) => {
88-
if (!filename || !conversationId) {
89-
return null;
90-
}
91-
const files = await getFiles(
92-
{
93-
filename,
94-
conversationId,
95-
context: FileContext.execute_code,
96-
},
97-
{ createdAt: -1 },
98-
{ text: 0 },
99-
);
100-
return files?.[0] ?? null;
101-
};
102-
10359
/**
10460
* Process code execution output files - downloads and saves both images and non-image files.
10561
* All files are saved to local storage with fileIdentifier metadata for code env re-upload.
@@ -170,12 +126,19 @@ const processCodeOutput = async ({
170126
const fileIdentifier = `${session_id}/${id}`;
171127

172128
/**
173-
* Check for existing file with same filename in this conversation.
174-
* If found, we'll update it instead of creating a duplicate.
129+
* Atomically claim a file_id for this (filename, conversationId, context) tuple.
130+
* Uses $setOnInsert so concurrent calls for the same filename converge on
131+
* a single record instead of creating duplicates (TOCTOU race fix).
175132
*/
176-
const existingFile = await findExistingCodeFile(name, conversationId);
177-
const file_id = existingFile?.file_id ?? v4();
178-
const isUpdate = !!existingFile;
133+
const newFileId = v4();
134+
const claimed = await claimCodeFile({
135+
filename: name,
136+
conversationId,
137+
file_id: newFileId,
138+
user: req.user.id,
139+
});
140+
const file_id = claimed.file_id;
141+
const isUpdate = file_id !== newFileId;
179142

180143
if (isUpdate) {
181144
logger.debug(
@@ -184,27 +147,29 @@ const processCodeOutput = async ({
184147
}
185148

186149
if (isImage) {
150+
const usage = isUpdate ? (claimed.usage ?? 0) + 1 : 1;
187151
const _file = await convertImage(req, buffer, 'high', `${file_id}${fileExt}`);
152+
const filepath = usage > 1 ? `${_file.filepath}?v=${Date.now()}` : _file.filepath;
188153
const file = {
189154
..._file,
155+
filepath,
190156
file_id,
191157
messageId,
192-
usage: isUpdate ? (existingFile.usage ?? 0) + 1 : 1,
158+
usage,
193159
filename: name,
194160
conversationId,
195161
user: req.user.id,
196162
type: `image/${appConfig.imageOutputType}`,
197-
createdAt: isUpdate ? existingFile.createdAt : formattedDate,
163+
createdAt: isUpdate ? claimed.createdAt : formattedDate,
198164
updatedAt: formattedDate,
199165
source: appConfig.fileStrategy,
200166
context: FileContext.execute_code,
201167
metadata: { fileIdentifier },
202168
};
203-
createFile(file, true);
169+
await createFile(file, true);
204170
return Object.assign(file, { messageId, toolCallId });
205171
}
206172

207-
// For non-image files, save to configured storage strategy
208173
const { saveBuffer } = getStrategyFunctions(appConfig.fileStrategy);
209174
if (!saveBuffer) {
210175
logger.warn(
@@ -221,7 +186,6 @@ const processCodeOutput = async ({
221186
});
222187
}
223188

224-
// Determine MIME type from buffer or extension
225189
const detectedType = await determineFileType(buffer, true);
226190
const mimeType = detectedType?.mime || inferMimeType(name, '') || 'application/octet-stream';
227191

@@ -259,11 +223,11 @@ const processCodeOutput = async ({
259223
metadata: { fileIdentifier },
260224
source: appConfig.fileStrategy,
261225
context: FileContext.execute_code,
262-
usage: isUpdate ? (existingFile.usage ?? 0) + 1 : 1,
263-
createdAt: isUpdate ? existingFile.createdAt : formattedDate,
226+
usage: isUpdate ? (claimed.usage ?? 0) + 1 : 1,
227+
createdAt: isUpdate ? claimed.createdAt : formattedDate,
264228
};
265229

266-
createFile(file, true);
230+
await createFile(file, true);
267231
return Object.assign(file, { messageId, toolCallId });
268232
} catch (error) {
269233
logAxiosError({

‎api/server/services/Files/Code/process.spec.js‎

Lines changed: 39 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -121,7 +121,11 @@ describe('Code Process', () => {
121121

122122
beforeEach(() => {
123123
jest.clearAllMocks();
124-
// Default mock implementations
124+
// Default mock: atomic claim returns a new file record (no existing file)
125+
mockClaimCodeFile.mockResolvedValue({
126+
file_id: 'mock-uuid-1234',
127+
user: 'user-123',
128+
});
125129
getFiles.mockResolvedValue(null);
126130
createFile.mockResolvedValue({});
127131
getStrategyFunctions.mockReturnValue({
@@ -130,67 +134,46 @@ describe('Code Process', () => {
130134
determineFileType.mockResolvedValue({ mime: 'text/plain' });
131135
});
132136

133-
describe('findExistingCodeFile (via processCodeOutput)', () => {
134-
it('should find existing file by filename and conversationId', async () => {
135-
const existingFile = {
137+
describe('atomic file claim (via processCodeOutput)', () => {
138+
it('should reuse file_id from existing record via atomic claim', async () => {
139+
mockClaimCodeFile.mockResolvedValue({
136140
file_id: 'existing-file-id',
137141
filename: 'test-file.txt',
138142
usage: 2,
139143
createdAt: '2024-01-01T00:00:00.000Z',
140-
};
141-
getFiles.mockResolvedValue([existingFile]);
144+
});
142145

143146
const smallBuffer = Buffer.alloc(100);
144147
axios.mockResolvedValue({ data: smallBuffer });
145148

146149
const result = await processCodeOutput(baseParams);
147150

148-
// Verify getFiles was called with correct deduplication query
149-
expect(getFiles).toHaveBeenCalledWith(
150-
{
151-
filename: 'test-file.txt',
152-
conversationId: 'conv-123',
153-
context: FileContext.execute_code,
154-
},
155-
{ createdAt: -1 },
156-
{ text: 0 },
157-
);
151+
expect(mockClaimCodeFile).toHaveBeenCalledWith({
152+
filename: 'test-file.txt',
153+
conversationId: 'conv-123',
154+
file_id: 'mock-uuid-1234',
155+
user: 'user-123',
156+
});
158157

159-
// Verify the existing file_id was reused
160158
expect(result.file_id).toBe('existing-file-id');
161-
// Verify usage was incremented
162159
expect(result.usage).toBe(3);
163-
// Verify original createdAt was preserved
164160
expect(result.createdAt).toBe('2024-01-01T00:00:00.000Z');
165161
});
166162

167163
it('should create new file when no existing file found', async () => {
168-
getFiles.mockResolvedValue(null);
164+
mockClaimCodeFile.mockResolvedValue({
165+
file_id: 'mock-uuid-1234',
166+
user: 'user-123',
167+
});
169168

170169
const smallBuffer = Buffer.alloc(100);
171170
axios.mockResolvedValue({ data: smallBuffer });
172171

173172
const result = await processCodeOutput(baseParams);
174173

175-
// Should use the mocked uuid
176174
expect(result.file_id).toBe('mock-uuid-1234');
177-
// Should have usage of 1 for new file
178175
expect(result.usage).toBe(1);
179176
});
180-
181-
it('should return null for invalid inputs (empty filename)', async () => {
182-
const smallBuffer = Buffer.alloc(100);
183-
axios.mockResolvedValue({ data: smallBuffer });
184-
185-
// The function handles this internally - with empty name
186-
// findExistingCodeFile returns null early for empty filename (guard clause)
187-
const result = await processCodeOutput({ ...baseParams, name: '' });
188-
189-
// getFiles should NOT be called due to early return in findExistingCodeFile
190-
expect(getFiles).not.toHaveBeenCalled();
191-
// A new file_id should be generated since no existing file was found
192-
expect(result.file_id).toBe('mock-uuid-1234');
193-
});
194177
});
195178

196179
describe('processCodeOutput', () => {
@@ -205,7 +188,6 @@ describe('Code Process', () => {
205188
bytes: 400,
206189
};
207190
convertImage.mockResolvedValue(convertedFile);
208-
getFiles.mockResolvedValue(null);
209191

210192
const result = await processCodeOutput(imageParams);
211193

@@ -220,23 +202,29 @@ describe('Code Process', () => {
220202
expect(result.filename).toBe('chart.png');
221203
});
222204

223-
it('should update existing image file and increment usage', async () => {
205+
it('should update existing image file with cache-busted filepath', async () => {
224206
const imageParams = { ...baseParams, name: 'chart.png' };
225-
const existingFile = {
207+
mockClaimCodeFile.mockResolvedValue({
226208
file_id: 'existing-img-id',
227209
usage: 1,
228210
createdAt: '2024-01-01T00:00:00.000Z',
229-
};
230-
getFiles.mockResolvedValue([existingFile]);
211+
});
231212

232213
const imageBuffer = Buffer.alloc(500);
233214
axios.mockResolvedValue({ data: imageBuffer });
234-
convertImage.mockResolvedValue({ filepath: '/uploads/img.webp' });
215+
convertImage.mockResolvedValue({ filepath: '/images/user-123/existing-img-id.webp' });
235216

236217
const result = await processCodeOutput(imageParams);
237218

219+
expect(convertImage).toHaveBeenCalledWith(
220+
mockReq,
221+
imageBuffer,
222+
'high',
223+
'existing-img-id.png',
224+
);
238225
expect(result.file_id).toBe('existing-img-id');
239226
expect(result.usage).toBe(2);
227+
expect(result.filepath).toMatch(/^\/images\/user-123\/existing-img-id\.webp\?v=\d+$/);
240228
expect(logger.debug).toHaveBeenCalledWith(
241229
expect.stringContaining('Updating existing file'),
242230
);
@@ -337,7 +325,6 @@ describe('Code Process', () => {
337325

338326
describe('usage counter increment', () => {
339327
it('should set usage to 1 for new files', async () => {
340-
getFiles.mockResolvedValue(null);
341328
const smallBuffer = Buffer.alloc(100);
342329
axios.mockResolvedValue({ data: smallBuffer });
343330

@@ -347,8 +334,11 @@ describe('Code Process', () => {
347334
});
348335

349336
it('should increment usage for existing files', async () => {
350-
const existingFile = { file_id: 'existing-id', usage: 5, createdAt: '2024-01-01' };
351-
getFiles.mockResolvedValue([existingFile]);
337+
mockClaimCodeFile.mockResolvedValue({
338+
file_id: 'existing-id',
339+
usage: 5,
340+
createdAt: '2024-01-01',
341+
});
352342
const smallBuffer = Buffer.alloc(100);
353343
axios.mockResolvedValue({ data: smallBuffer });
354344

@@ -358,14 +348,15 @@ describe('Code Process', () => {
358348
});
359349

360350
it('should handle existing file with undefined usage', async () => {
361-
const existingFile = { file_id: 'existing-id', createdAt: '2024-01-01' };
362-
getFiles.mockResolvedValue([existingFile]);
351+
mockClaimCodeFile.mockResolvedValue({
352+
file_id: 'existing-id',
353+
createdAt: '2024-01-01',
354+
});
363355
const smallBuffer = Buffer.alloc(100);
364356
axios.mockResolvedValue({ data: smallBuffer });
365357

366358
const result = await processCodeOutput(baseParams);
367359

368-
// (undefined ?? 0) + 1 = 1
369360
expect(result.usage).toBe(1);
370361
});
371362
});

‎package-lock.json‎

Lines changed: 5 additions & 5 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

‎packages/api/package.json‎

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -87,7 +87,7 @@
8787
"@google/genai": "^1.19.0",
8888
"@keyv/redis": "^4.3.3",
8989
"@langchain/core": "^0.3.80",
90-
"@librechat/agents": "^3.1.37",
90+
"@librechat/agents": "^3.1.38",
9191
"@librechat/data-schemas": "*",
9292
"@modelcontextprotocol/sdk": "^1.26.0",
9393
"@smithy/node-http-handler": "^4.4.5",

‎packages/api/src/endpoints/bedrock/initialize.spec.ts‎

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -615,7 +615,7 @@ describe('initializeBedrock', () => {
615615
});
616616

617617
describe('Opus 4.6 Adaptive Thinking', () => {
618-
it('should configure adaptive thinking with default maxTokens for Opus 4.6', async () => {
618+
it('should configure adaptive thinking with no default maxTokens for Opus 4.6', async () => {
619619
const params = createMockParams({
620620
model_parameters: {
621621
model: 'anthropic.claude-opus-4-6-v1',
@@ -626,7 +626,7 @@ describe('initializeBedrock', () => {
626626
const amrf = result.llmConfig.additionalModelRequestFields as Record<string, unknown>;
627627

628628
expect(amrf.thinking).toEqual({ type: 'adaptive' });
629-
expect(result.llmConfig.maxTokens).toBe(16000);
629+
expect(result.llmConfig.maxTokens).toBeUndefined();
630630
expect(amrf.anthropic_beta).toEqual(
631631
expect.arrayContaining(['output-128k-2025-02-19', 'context-1m-2025-08-07']),
632632
);

0 commit comments

Comments
 (0)