Caption Library Migration Summary

Date: January 20, 2026
Migration: youtube-transcript → youtube-caption-extractor

Why We Migrated

Problem with youtube-transcript

0% success rate across all tested videos
Returns empty arrays [] for all videos and languages
YouTube changed their API format, breaking the library
No fix available; the library appears unmaintained

Solution: youtube-caption-extractor

100% success rate (3/3 videos tested)
473ms average latency (2.7x faster than youtube-transcript-plus, the only other library that worked)
Uses modern YouTube Innertube API (future-proof)
Worked with every video type tested (short, long, educational)
Actively maintained

Performance Comparison

Library	Success Rate	Avg Speed	Result
youtube-caption-extractor	100%	473ms	✅ Winner
youtube-transcript-plus	100%	1285ms	✅ Works but slower
youtube-transcript (old)	0%	N/A	❌ Broken
youtube-captions-scraper	0%	N/A	❌ Failed
youtube-transcript-node	0%	N/A	❌ Import error

Changes Made

1. Dependencies

# Removed
npm uninstall youtube-transcript

# Kept (already installed during testing), so no install step is needed
# youtube-caption-extractor

2. Import Statement

Before:

import { YoutubeTranscript } from "youtube-transcript";

After:

import { getSubtitles } from "youtube-caption-extractor";

3. API Call

Before:

const transcript = await YoutubeTranscript.fetchTranscript(videoId, { lang });

After:

const transcript = await getSubtitles({ videoID: videoId, lang });
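Since the old library failed silently by returning empty arrays, it may help to treat an empty result the same way as a thrown error. The following is a minimal sketch under that assumption: `fetchCaptionsOrNull`, `SubtitleFetcher`, and the `Subtitle` interface are hypothetical names introduced here for illustration, and the fetcher is passed in as a parameter so the helper can be exercised without network access.

```typescript
// Shape of a caption entry as documented in this summary (seconds, as strings).
interface Subtitle {
  start: string;
  dur: string;
  text: string;
}

// Mirrors the call shown above: getSubtitles({ videoID, lang }).
// The exact library typings are an assumption here.
type SubtitleFetcher = (opts: { videoID: string; lang?: string }) => Promise<Subtitle[]>;

// Hypothetical helper: returns null for an empty result or a thrown error,
// so the caller can decide whether to try another transcription source.
async function fetchCaptionsOrNull(
  fetcher: SubtitleFetcher,
  videoId: string,
  lang = "en",
): Promise<Subtitle[] | null> {
  try {
    const subtitles = await fetcher({ videoID: videoId, lang });
    return Array.isArray(subtitles) && subtitles.length > 0 ? subtitles : null;
  } catch (err) {
    console.warn(`Caption fetch failed for ${videoId}:`, err);
    return null;
  }
}
```

In the route, this would presumably be called as `await fetchCaptionsOrNull(getSubtitles, videoId, lang)`, with a `null` result signalling that the next transcription source should be tried.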

4. Data Structure

Old format (youtube-transcript):

{
  offset: 9170,    // milliseconds
  duration: 7000,  // milliseconds
  text: "Hi lovely people..."
}

New format (youtube-caption-extractor):

{
  start: "9.17",   // seconds (string)
  dur: "7",        // seconds (string)
  text: "Hi lovely people..."
}

5. Timestamp Parsing

Updated to handle both formats for compatibility:

const startTime = Math.floor(Number(item.start || item.offset / 1000 || 0));
const duration = Number(item.dur || item.duration / 1000 || 0);
const endTime = Math.floor(startTime + duration);
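The same dual-format handling can also be written as a small typed helper, which makes the unit conversion explicit. This is a sketch only: `normalizeSegment`, `toSeconds`, and `NormalizedSegment` are hypothetical names, and the two input shapes are taken from the Data Structure section above rather than from the libraries' own typings.

```typescript
// Input shapes as documented in this summary.
interface OldSegment { offset: number; duration: number; text: string } // milliseconds
interface NewSegment { start: string; dur: string; text: string }       // seconds, as strings

// Hypothetical normalized shape; not part of either library.
interface NormalizedSegment { startTime: number; endTime: number; text: string }

// Prefer the seconds-as-string field; fall back to the millisecond field; else 0.
function toSeconds(secondsValue: string | undefined, msValue: number | undefined): number {
  if (secondsValue !== undefined && secondsValue !== "") {
    const parsed = Number(secondsValue);
    if (Number.isFinite(parsed)) return parsed;
  }
  if (typeof msValue === "number" && Number.isFinite(msValue)) return msValue / 1000;
  return 0;
}

function normalizeSegment(item: Partial<OldSegment & NewSegment>): NormalizedSegment {
  const start = toSeconds(item.start, item.offset);
  const duration = toSeconds(item.dur, item.duration);
  return {
    startTime: Math.floor(start),
    // The end is computed from the unrounded start, then floored.
    endTime: Math.floor(start + duration),
    text: item.text ?? "",
  };
}
```

One difference from the lines above: this sketch adds the duration to the unrounded start before flooring, so the end time is not shortened by the rounding of the start time.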

Testing Results

Integration Test

✅ Success! Fetched 49 segments in 1416ms
✅ Timestamp parsing verified
✅ Data structure compatible

Test Videos

Jamie Oliver Pizza (6:36) - ✅ 49 segments (351ms)
Benchmark Video 1 (33min+) - ✅ 4076 segments (695ms)
Benchmark Video 2 (15min) - ✅ 256 segments (372ms)
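The headline figures in this summary can be re-derived from the three per-video timings above. The sketch below assumes the 473ms average is the rounded mean of those timings, and takes the 1285ms and 30s comparison values from elsewhere in this summary.

```typescript
// Per-video latencies in milliseconds, copied from the Test Videos list above.
const latenciesMs = [351, 695, 372];

function mean(values: number[]): number {
  if (values.length === 0) throw new Error("mean of empty list");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// (351 + 695 + 372) / 3 = 472.67, which rounds to 473.
const averageMs = Math.round(mean(latenciesMs));

// Comparison values quoted in this summary.
const transcriptPlusMs = 1285; // youtube-transcript-plus average
const whisperMs = 30_000;      // Whisper total processing time

const speedupVsPlus = transcriptPlusMs / averageMs; // about 2.7
const speedupVsWhisper = whisperMs / averageMs;     // about 63

console.log({ averageMs, speedupVsPlus, speedupVsWhisper });
```

With only three samples, these averages are best read as indicative rather than as a benchmark.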

Benefits

Roughly 60x faster than Whisper - 473ms vs 30s total processing time
FREE for all videos - No API quota usage
Better UX - Instant captions instead of 30s wait
Saves resources - Preserves Groq/OpenAI quota for edge cases
Future-proof - Uses modern Innertube API

Fallback Chain

The system now uses a reliable 3-tier approach:

Primary: youtube-caption-extractor (473ms, FREE)
Fallback 1: Groq Whisper (<25MB files, FREE)
Fallback 2: OpenAI Whisper (>25MB files, $0.006/min)
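The three tiers above can be expressed as an ordered list of providers that are tried in turn. This is a sketch of the pattern, not the code in route.ts: `TranscriptProvider`, `TranscriptResult`, and `transcribeWithFallback` are hypothetical names, and each provider is assumed to return `null` when it cannot handle the video (for example, when a file exceeds a size limit).

```typescript
// A provider returns transcript text, or null if it cannot help with this video.
interface TranscriptProvider {
  name: string;
  run: (videoId: string) => Promise<string | null>;
}

interface TranscriptResult {
  provider: string;
  text: string;
}

// Tries each provider in order and returns the first non-empty result.
async function transcribeWithFallback(
  providers: TranscriptProvider[],
  videoId: string,
): Promise<TranscriptResult> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.run(videoId);
      if (text && text.trim().length > 0) {
        return { provider: provider.name, text };
      }
      failures.push(`${provider.name}: empty result`);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All transcript providers failed (${failures.join("; ")})`);
}
```

Keeping the order in a plain array means the cheapest source is always tried first, and a tier can be added or removed without touching the loop.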

Files Modified

✅ app/api/process-transcript/route.ts - Updated caption fetching
✅ package.json - Removed old library, kept new one
✅ TRANSCRIPTION-ANALYSIS.md - Updated with test results

Next Steps

Deploy to Vercel production
Monitor caption success rate
Verify with real user videos
Update any documentation

Rollback Plan

If issues occur (unlikely based on testing):

Re-install youtube-transcript: npm install youtube-transcript
Revert import and API calls in route.ts
Because youtube-transcript currently returns no captions, the system will then fall back to Whisper for every video (already working 100%)

Conclusion

The migration from youtube-transcript to youtube-caption-extractor is complete and tested. The new library provides:

Better reliability (100% vs 0% success rate)
Fast performance (473ms average; the old library returned nothing to time)
Future-proof implementation (modern Innertube API)

This change significantly improves user experience while reducing resource usage.
