I wanted to build a customizable CLI for Whisper.cpp that uses a local LLM to do post-processing on the transcripts, with modes, personal names, etc. So here it is.
- Make sure you have `uv`, `git`, and `ffmpeg` installed on your machine
- Install opencode with `npm install -g opencode` (or ollama with `brew install ollama`)
- If you are on Linux, install `arecord` with `sudo apt install alsa-utils`. If you are on Mac, `brew install sox`
- Clone this repo
- I have ONLY tested this out on a Mac with an M-series chip. I was not trying to make it a universal tool yet
- Once you have all this, run `./setup.sh` in this directory
  - Use `AI_BACKEND_TYPE=opencode ./setup.sh` for opencode (default)
  - Use `AI_BACKEND_TYPE=ollama ./setup.sh` for ollama
- If all is well, run `./run.sh` and you should be good to go
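Before running `./setup.sh`, a quick sanity check like this (a sketch, using only the tool names listed above) can confirm the prerequisites are on your PATH:

```shell
# Sketch: report any prerequisite missing from PATH.
# On Linux also add arecord; on Mac also add sox.
missing=0
for tool in uv git ffmpeg; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "Missing: $tool"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "All prerequisites found"
```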
By default, this project uses opencode for AI processing. You can configure the AI service during setup or by setting environment variables:
During setup:
```shell
AI_BACKEND_TYPE=opencode ./setup.sh   # Use opencode (default)
AI_BACKEND_TYPE=ollama ./setup.sh     # Use ollama
```

Environment variables:

- `AI_SERVICE`: Choose between `opencode` (default) or `ollama`
- `AI_MODEL`: The model to use (default: `zai-coding-plan/glm-4.7` for opencode, or `deepseek-r1:latest` for ollama)
For opencode (default):

```shell
AI_SERVICE=opencode
AI_MODEL=zai-coding-plan/glm-4.7
```

For ollama (optional):

```shell
AI_SERVICE=ollama
AI_BASE_URL=http://localhost:11434/v1
AI_MODEL=deepseek-r1:latest
```

Note: If using ollama, install the optional dependency with `pip install ollama` or `uv sync --extra ollama`.
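If you go the ollama route, it can save a failed run to confirm the server is up before setup. This is a sketch: `/api/tags` is ollama's native model-listing endpoint, and the default URL mirrors the `AI_BASE_URL` value above (the OpenAI-compatible `/v1` suffix is stripped for the native API):

```shell
# Sketch: probe the ollama server behind the configured base URL.
BASE="${AI_BASE_URL:-http://localhost:11434/v1}"
if curl -fsS "${BASE%/v1}/api/tags" >/dev/null 2>&1; then
  echo "ollama is reachable"
else
  echo "ollama is not reachable; start it with: ollama serve"
fi
```

You will likely also need the model locally first, e.g. `ollama pull deepseek-r1:latest`.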
The project includes a CLI for scripting and batch operations. Use `python cli.py --help` for full documentation.
Record audio from your microphone:
```shell
python cli.py record
python cli.py record grammar                  # Record and run AI grammar check
python cli.py record -d 10                    # Record for 10 seconds
python cli.py record --out /path/to/file.wav
```

Transcribe an existing audio/video file:
```shell
python cli.py transcribe /path/to/file.mp3
python cli.py transcribe /path/to/file.mp3 grammar          # Transcribe and run AI mode
python cli.py transcribe /path/to/file.mp3 --out result.txt
python cli.py transcribe /path/to/file.mp3 --names "john,mary"
```

Manage transcription jobs:
```shell
python cli.py jobs                      # List all jobs
python cli.py jobs --limit 50           # List last 50 jobs
python cli.py jobs --id <job-uuid>      # Show full details of a job
python cli.py jobs --delete <job-uuid>  # Delete a specific job
python cli.py jobs --clear              # Delete all completed/errored jobs
```

View and run AI modes:
```shell
python cli.py modes                    # List all available AI modes
python cli.py ai <job-uuid> grammar    # Run AI grammar check on existing job
python cli.py ai <job-uuid> summarize  # Run AI summarization on existing job
python cli.py ai <job-uuid> grammar --names "john,mary"
```

Personal names:
```shell
python cli.py names list             # Show configured personal names
python cli.py names add "Alice,Bob"  # Add new personal names (comma-separated)
```

Global flags:

- `--verbose` or `-v`: Enable debug logging (shows SQL queries, detailed progress)
- `--db`: Use SQLAlchemy job store instead of JSON (requires `DATABASE_URL` env var)
Record and auto-fix transcript in one command:
```shell
python cli.py record grammar
```

Transcribe a batch of files:

```shell
for file in recordings/*.mp3; do
  echo "Processing $file..."
  python cli.py transcribe "$file" grammar
done
```

Transcribe in verbose mode (see detailed logs):
```shell
python cli.py -v transcribe /path/to/file.mp3
```

Features:

- Records from microphone (fixed duration or manual stop)
- Transcribes audio/video files (mp3, wav, flac, m4a, mp4, mkv, mov)
- Stores jobs with progress tracking and history
- AI post-processing with configurable modes (grammar fix, summarization, custom)
- Supports custom names/words for better recognition
- Progress bars and real-time updates
- SQL or JSON job storage
The `prompts.toml` file controls the set of AI modes. Each mode corresponds to a top-level section under `[prompts]`. The CLI will use these modes.
Summarize Prompt:
```toml
[prompts.summarize]
instruction = "Your instructions here."
formatting_rules = [
    "Rule 1",
    "Rule 2",
]
input_placeholder = "Transcript: {text}"
```

Grammar/Auto-fix Prompt:
```toml
[prompts.grammar]
instruction = "Your instructions here."
rules = [
    "Rule 1",
    "Rule 2",
]
input_placeholder = "Text: {text}"
```

- `instruction`: Main task description for the AI
- `formatting_rules` (for summarize): List of bullet points for formatting
- `rules` (for grammar): Numbered list of processing rules
- `input_placeholder`: Template for the input text (`{text}` is replaced with the transcript)
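Putting those fields together, a new mode should just be another top-level section under `[prompts]`; this `action-items` mode is a hypothetical sketch, not something shipped with the repo:

```toml
# Hypothetical mode; if picked up by the CLI it would run as:
#   python cli.py ai <job-uuid> action-items
[prompts.action-items]
instruction = "Extract action items from the transcript."
rules = [
    "One action item per line, starting with a dash.",
    "Keep each item to a single sentence.",
]
input_placeholder = "Transcript: {text}"
```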
Spoken formatting commands:

- `begin list` / `end list` - Bullet lists
- `begin sublist` / `end sublist` - Indented sub-items
- `begin numbered list` / `end numbered list` - Ordered lists
- `new paragraph` - Paragraph break
- `line break` - Single line break
- `heading level one/two/three` - Markdown headings
- `begin quote` / `end quote` - Blockquotes
- `begin code` / `end code` - Fenced code blocks
Edit `prompts.toml` directly; changes apply on the next run.