AutoTranslation

Real-time AI subtitles for streaming and any audio playing on your Windows PC. Captures system audio, transcribes speech, translates it, and overlays subtitles in your browser.

How it works

System audio (WASAPI loopback)
        ↓
   Speech-to-text (Whisper)
        ↓
   Translation engine
        ↓
   WebSocket → Chrome extension overlay
        ↓ (optional)
   Edge TTS → local output device (speakers/headphones)

Requirements

Windows (uses WASAPI loopback audio capture)
Python 3.10+
Chrome or Edge (Chromium)

Quick start

1. Install the Python service

cd AutoTranslation
python -m venv .venv
.\.venv\Scripts\pip install -r requirements.txt

# For local Whisper (no OpenAI key needed):
.\.venv\Scripts\pip install "faster-whisper>=1.0.0" numpy

# Create your config from the example:
copy config.json.example config.json

2. Run the service

.\.venv\Scripts\python.exe run.py

You should see:

[WhisperLocal] model 'base' ready
[WS] listening on ws://127.0.0.1:8765
[AudioCapture] capturing from '...' at 48000Hz, 2ch

Important: Audio must play through your default Windows output device — the service captures loopback from that device, not the microphone.

Using VB-Audio Virtual Cable? See Audio routing with VB-Cable below.

3. Load the browser extension

Open chrome://extensions
Enable Developer mode
Click Load unpacked
Select the extension folder in this repo
Open a video page, click the extension icon, and click Apply settings

The status dot should turn green when connected to the service.

Configuration

Edit config.json (copy from config.json.example) or use the extension popup.

Speech-to-text (STT)

Engine	Description
`whisper_local`	Runs locally via faster-whisper. Free, offline after first model download (~150 MB for `base`).
`whisper_api`	OpenAI Whisper API. Requires an API key.

Translation

Engine	Description
`mymemory`	Free, no API key. Good default. ~500 requests/day.
`claude`	Anthropic API. Requires API key in `config.json`.
`openai`	OpenAI GPT. Requires API key.
`ollama`	Local Ollama instance.
`cursor`	Uses Cursor install auth (experimental).

Set target_language to any language name (e.g. Russian, Spanish, Japanese).

Voice synthesis (TTS)

Enable Voice synthesis in the popup to hear translated speech read aloud. Neural voices are auto-selected for your target language via Microsoft Edge TTS (no API key required).

Audio routing for TTS — if you use VB-Audio Cable, Chrome's audio output is routed through VB-Cable, which means TTS audio played inside the browser is also captured by the loopback, creating an infinite echo loop:

TTS plays in Chrome → Chrome output → VB-Cable → loopback capture → STT → translate → TTS → ...

The fix: set Play voice to in the extension popup to your speakers or headphones. The service then plays TTS directly via Python (bypassing VB-Cable entirely), so the loopback only captures the original source audio.

Audio routing with VB-Audio Virtual Cable

VB-Audio Virtual Cable lets you capture audio from a specific app (e.g. Chrome) rather than all system audio.

Setup:

Install VB-Cable and reboot if prompted
In Chrome settings (or Windows Volume Mixer), set Chrome's output device to CABLE Input (VB-Audio Virtual Cable)
In the extension popup → Audio routing:
- Capture from → CABLE Output (VB-Audio Virtual Cable) — selects the VB-Cable loopback as the capture source (requires service restart)
- Play voice to → your speakers or headphones — TTS plays through Python directly, not through Chrome, avoiding echo
Your regular system audio (other apps) continues to play normally on your default device

Note: After changing Capture from, restart the service (run.py) for it to take effect.
Capture from and Play voice to dropdowns are populated automatically from devices detected by the service.

Chunk boundary handling

Audio is processed in fixed-size chunks (default 4 seconds). A naive implementation cuts audio at hard time boundaries, which means words that happen to span two chunks get split mid-syllable — Whisper then transcribes garbled partial words at the end of one chunk and the start of the next.

AutoTranslation solves this with three layers working together:

1. Audio overlap

The last 0.8 seconds of each chunk is prepended to the beginning of the next chunk. This "tail buffer" ensures that any word spanning the boundary always appears complete in at least one chunk's audio.

Chunk N:  [-------- 4.0 s --------][tail]
Chunk N+1:              [tail][-------- 4.0 s --------][tail]
                         ↑
                    boundary word is now whole

The tail size is configurable (audio.overlap_seconds in config.json, default 0.8).

2. Context prompting

The last ~100 characters of the previous chunk's transcript are passed to Whisper as a prompt (Whisper API) or initial_prompt (local faster-whisper). This conditions the language model decoder to continue naturally from where the previous chunk ended, improving accuracy for context-dependent words and names.

3. Transcript deduplication

Because the overlap audio is transcribed twice (once in chunk N, once at the start of chunk N+1), the pipeline strips the repeated words before broadcasting. The deduplication uses exact word matching first, with a fuzzy SequenceMatcher fallback for cases where Whisper slightly rephrases the same audio on the second pass.

Mute window protection

When TTS voice synthesis is active, incoming audio is muted for the duration of playback to prevent echo. Whenever a chunk is dropped during the mute window, the overlap tail buffer and transcript context are both cleared — ensuring that TTS audio cannot contaminate the context for the first real speech chunk that follows.

Project structure

AutoTranslation/
├── run.py                 # Entry point
├── config.json.example    # Config template
├── requirements.txt
├── service/
│   ├── main.py            # Async service orchestration
│   ├── pipeline.py        # STT → translation → TTS pipeline
│   ├── audio_capture.py   # Windows WASAPI loopback
│   ├── websocket_server.py
│   └── engines/
│       ├── stt/           # Whisper API & local
│       ├── translation/   # MyMemory, Claude, OpenAI, Ollama, Cursor
│       └── tts/           # Edge TTS (local device playback)
└── extension/             # Chrome MV3 extension
    ├── manifest.json
    ├── background.js      # WebSocket client
    ├── content.js         # Subtitle overlay + browser-side TTS fallback
    └── popup.html/js      # Settings UI (incl. audio routing dropdowns)

Troubleshooting

Problem	Fix
Port already in use (`10048`)	Kill stale processes: `Get-Process python* \| Stop-Process -Force`
No subtitles	Ensure audio plays on your default output device; check green status dot in popup
No speech detected	Set the correct device as Windows default output; play audio through it
Settings not saving	Service must be running; popup shows "Not connected!" if WebSocket is down
Cyrillic / Unicode crash	Fixed in `run.py` via UTF-8 console reconfigure
TTS echo loop	Set Play voice to → your speakers in the popup so TTS plays via the service, not Chrome
Changed "Capture from" but still capturing old device	Requires service restart to take effect
Switched back from VB-Cable and lost system sound	Check Windows Volume Mixer — Chrome may still have VB-Cable set as its per-app output device

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
docs		docs
extension		extension
service		service
.gitignore		.gitignore
README.md		README.md
config.json.example		config.json.example
requirements.txt		requirements.txt
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AutoTranslation

How it works

Requirements

Quick start

1. Install the Python service

2. Run the service

3. Load the browser extension

Configuration

Speech-to-text (STT)

Translation

Voice synthesis (TTS)

Audio routing with VB-Audio Virtual Cable

Chunk boundary handling

1. Audio overlap

2. Context prompting

3. Transcript deduplication

Mute window protection

Project structure

Troubleshooting

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AutoTranslation

How it works

Requirements

Quick start

1. Install the Python service

2. Run the service

3. Load the browser extension

Configuration

Speech-to-text (STT)

Translation

Voice synthesis (TTS)

Audio routing with VB-Audio Virtual Cable

Chunk boundary handling

1. Audio overlap

2. Context prompting

3. Transcript deduplication

Mute window protection

Project structure

Troubleshooting

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages