
tehw0lf/markdown_transcription_system


Universal Markdown Audio Transcription System

A professional, completely local and private audio transcription system that works with any markdown-based note-taking application. Transform your audio recordings into searchable markdown transcripts without sending your data to the cloud.

🚀 Key Features

  • 🔒 100% Local & Private - Audio never leaves your machine
  • 💰 Zero Ongoing Costs - No API keys or subscription fees
  • 📱 Universal Compatibility - Works with Obsidian, Logseq, Foam, Zettlr, and any markdown system
  • 🛡️ Security-First - External script approach, no plugins required
  • 📄 Template-Driven - Customizable output formats
  • 🎯 Smart Linking - Automatically adds transcript links to existing notes
  • ⚡ Batch Processing - Handle multiple files efficiently
  • 🎨 Configurable - JSON/YAML configuration with multiple profiles

📋 Quick Start

1. Install Dependencies

# Install system dependencies (required: ffmpeg for audio processing)
# Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg

# macOS with Homebrew:
brew install ffmpeg

# Arch Linux:
sudo pacman -S ffmpeg

# Install Python dependencies using UV (recommended)
uv sync

# Alternative: Use the installation script
chmod +x scripts/install.sh && ./scripts/install.sh

2. Create Configuration

# Create example configuration for your markdown system
uv run python -m src.transcription_system --create-config config.yaml --config-type obsidian

# Or create for other systems
uv run python -m src.transcription_system --create-config config.yaml --config-type logseq
uv run python -m src.transcription_system --create-config config.yaml --config-type foam

3. Edit Configuration

Edit config.yaml to match your setup:

# Basic configuration
vault_path: "/path/to/your/notes"
audio_folder_name: "Audio"
transcripts_folder_name: "Audio-Transcripts"

# Whisper settings
whisper_model: "medium"  # tiny, base, small, medium, large
language: "auto"         # or specify: en, de, fr, es, etc.

# Link format (adjust for your markdown system)
link_format_style: "wikilink"  # wikilink, standard, or custom
link_format_prefix: "📝 **Transcript:**"
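For illustration, here is a minimal Python sketch of the kind of checks these settings imply. The functions `validate_config` and `whisper_language` are hypothetical (not this project's API), and the accepted values are copied from the comments above. The one library fact it relies on: openai-whisper auto-detects the spoken language when `transcribe()` receives `language=None`, which is presumably what `"auto"` maps to.

```python
from __future__ import annotations

# Hypothetical sketch: sanity-check a parsed config and derive Whisper's language option.
# Allowed values are copied from the comments in config.yaml above.
VALID_MODELS = {"tiny", "base", "small", "medium", "large"}
VALID_LINK_STYLES = {"wikilink", "standard", "custom"}


def validate_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = []
    if not config.get("vault_path"):
        problems.append("vault_path is required")
    model = config.get("whisper_model", "medium")
    if model not in VALID_MODELS:
        problems.append(f"unknown whisper_model: {model!r}")
    style = config.get("link_format_style", "wikilink")
    if style not in VALID_LINK_STYLES:
        problems.append(f"unknown link_format_style: {style!r}")
    return problems


def whisper_language(config: dict) -> str | None:
    """Map the config's language to openai-whisper's `language` argument.

    Whisper auto-detects the spoken language when `language` is None.
    """
    language = config.get("language", "auto")
    return None if language == "auto" else language


if __name__ == "__main__":
    example = {"vault_path": "/path/to/your/notes", "whisper_model": "medium", "language": "auto"}
    print(validate_config(example))   # []
    print(whisper_language(example))  # None
```

Collecting all problems into a list, rather than failing on the first, lets a user fix every typo in one pass.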

4. Test the System (Recommended)

# Run comprehensive test suite (safe - creates isolated test environment)
./test_system.sh

# The test will:
# - Create a temporary test environment using uv
# - Test all system components safely
# - Generate a detailed test report
# - Optionally test actual transcription with sample audio

5. Run Transcription

# Run with your configuration
uv run python -m src.transcription_system --config config.yaml

🧪 Testing

Before using the system on your actual files, it's highly recommended to run the test suite:

Comprehensive Test Suite

The included test script provides safe, isolated testing:

# Make the test script executable (if not already)
chmod +x test_system.sh

# Run the test suite
./test_system.sh

What the test does:

  • ✅ Creates isolated test directory with timestamp
  • ✅ Uses uv for clean virtual environment
  • ✅ Tests all system components without affecting your files
  • ✅ Creates sample audio and markdown files for testing
  • ✅ Generates detailed test report
  • ✅ Optional real transcription test with tiny model

Test Components:

  1. Import Tests - Verifies code loads correctly
  2. Configuration Tests - Tests config creation and validation
  3. System Tests - Tests main system initialization
  4. Template Tests - Tests template loading
  5. File Discovery - Tests finding audio files and notes
  6. Integration Tests - Tests complete workflow
  7. Optional Transcription - Real transcription with sample audio

Manual Testing

If you prefer manual testing:

# Test configuration creation
python -m src.transcription_system --create-config test-config.yaml --config-type obsidian

# Do a trial run on a copy of your vault so your real notes stay untouched
cp -r /path/to/your/vault /tmp/test-vault
# Edit test-config.yaml to point to /tmp/test-vault
python -m src.transcription_system --config test-config.yaml

🔧 Installation

Option 1: Direct Installation

# Clone the repository
git clone https://github.com/tehw0lf/markdown_transcription_system.git
cd markdown_transcription_system

# Install dependencies
pip install -r requirements.txt

# Install Whisper and system dependencies
# Ubuntu/Debian (recommended):
sudo apt install python-openai-whisper ffmpeg

# macOS:
brew install ffmpeg
pip install openai-whisper

# Windows:
# Download ffmpeg from https://ffmpeg.org/download.html
# pip install openai-whisper

Option 2: Using the Install Script

# Run the installation script
chmod +x scripts/install.sh
./scripts/install.sh

📊 Supported Markdown Systems

Obsidian

  • Link Format: [[transcript_name]]
  • Audio Folder: Audio/
  • Transcript Folder: Audio-Transcripts/

Logseq

  • Link Format: [[transcript_name]]
  • Audio Folder: assets/
  • Transcript Folder: transcripts/

Foam (VS Code)

  • Link Format: [transcript_name](transcript_name.md)
  • Audio Folder: attachments/
  • Transcript Folder: transcripts/

Zettlr

  • Link Format: [[transcript_name]]
  • Audio Folder: media/
  • Transcript Folder: transcripts/

Generic Markdown

  • Link Format: [transcript_name](transcript_name.md)
  • Audio Folder: media/
  • Transcript Folder: transcripts/
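A hypothetical helper showing how the two built-in link styles differ. The name `format_transcript_link` is illustrative, and `custom` styles are left out because their format is not documented here. The standard-link target is percent-encoded with `urllib.parse.quote` so that transcript names containing spaces still form a valid Markdown link; whether the project does the same is an assumption.

```python
from urllib.parse import quote

# Hypothetical helper mirroring the link styles listed above:
#   "wikilink" -> [[name]]          (Obsidian, Logseq, Zettlr)
#   "standard" -> [name](name.md)   (Foam, generic Markdown)
# "custom" styles are not modeled because their format is not documented here.


def format_transcript_link(name: str, style: str = "wikilink", prefix: str = "") -> str:
    if style == "wikilink":
        link = f"[[{name}]]"
    elif style == "standard":
        # Percent-encode the target so names with spaces still form a valid link.
        link = f"[{name}]({quote(name + '.md')})"
    else:
        raise ValueError(f"style not supported by this sketch: {style!r}")
    return f"{prefix} {link}" if prefix else link


print(format_transcript_link("standup_transcript", prefix="**Transcript:**"))
# **Transcript:** [[standup_transcript]]
print(format_transcript_link("team sync_transcript", style="standard"))
# [team sync_transcript](team%20sync_transcript.md)
```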

🎛️ Configuration Options

Basic Settings

# Path to your notes/vault
vault_path: "/home/user/Notes"

# Folder names (relative to vault_path)
audio_folder_name: "Audio"
transcripts_folder_name: "Audio-Transcripts"

# Whisper AI settings
whisper_model: "medium"    # Model size affects accuracy vs speed
language: "auto"           # Auto-detect or specify (en, de, fr, etc.)

# Processing options
auto_move_files: true      # Move processed files to audio folder
create_timestamps: true    # Include detailed timestamps
skip_existing_transcripts: true  # Skip files that already have transcripts
recursive_search: true     # Search subdirectories for audio files
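A sketch of the discovery step these options describe, using only `pathlib`. It assumes (hypothetically) that the transcript for `call.mp3` is stored as `call_transcript.md` in the transcripts folder, mirroring the `{audio_name}_transcript` link template further below; `find_pending_audio` is not the project's actual function, and video extensions are left out for brevity.

```python
from __future__ import annotations

from pathlib import Path

# Extensions from the audio_extensions setting shown in Advanced Settings.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac"}


def find_pending_audio(vault: Path, transcripts_dir: Path,
                       recursive: bool = True, skip_existing: bool = True) -> list[Path]:
    """Hypothetical discovery step: audio files that still need a transcript."""
    # "**/*" also matches files directly in `vault`; "*" stays at the top level.
    candidates = vault.glob("**/*" if recursive else "*")
    pending = []
    for path in sorted(candidates):
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        # Assumed naming scheme: call.mp3 -> call_transcript.md
        transcript = transcripts_dir / f"{path.stem}_transcript.md"
        if skip_existing and transcript.exists():
            continue
        pending.append(path)
    return pending
```

Lower-casing the suffix means `REC.WAV` is picked up just like `rec.wav`.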

Advanced Settings

# Link format customization
link_format_style: "wikilink"  # wikilink, standard, or custom
link_format_prefix: "📝 **Transcript:**"

# File extensions to process
audio_extensions: [".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac"]
video_extensions: [".mp4", ".mkv", ".avi", ".mov", ".wmv", ".webm"]

# Logging configuration
log_level: "INFO"          # DEBUG, INFO, WARNING, ERROR
console_logging: true      # Log to console
file_logging: true         # Log to file
log_file: "/var/log/markdown-transcription.log"

# System settings
temp_dir: "/tmp"
lock_file: "/var/lock/markdown-transcription.lock"
encoding: "utf-8"
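One plausible way to map the logging keys onto Python's standard `logging` module is sketched below; the function and the logger name are assumptions, not the project's code. Note that writing to `/var/log/...` normally requires root privileges, so a `log_file` under your home directory avoids a `PermissionError` when running as a normal user.

```python
import logging
import sys


def configure_logging(config: dict) -> logging.Logger:
    """Hypothetical mapping of the logging keys onto the standard logging module."""
    logger = logging.getLogger("markdown_transcription")  # logger name is an assumption
    logger.setLevel(config.get("log_level", "INFO"))  # setLevel accepts names like "DEBUG"
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    if config.get("console_logging", True):
        console = logging.StreamHandler(sys.stderr)
        console.setFormatter(formatter)
        logger.addHandler(console)

    # This sketch only logs to a file when explicitly enabled and a path is given.
    if config.get("file_logging", False) and config.get("log_file"):
        file_handler = logging.FileHandler(config["log_file"], encoding=config.get("encoding", "utf-8"))
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger
```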

🎨 Template Customization

Transcript Template

Edit templates/transcript-template.md:

# Transcription: {filename}

**File:** `{filename}`  
**Date:** {date}  
**Original Location:** `{audio_folder}/{filename}`

## Transcript

{transcript_content}

## Detailed Timestamps

{timestamp_content}
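A minimal sketch of filling this template, assuming the placeholders are substituted with Python's `str.format` (an assumption) and that the input is the dictionary returned by openai-whisper's `transcribe()`, whose `"text"` key holds the full transcript and whose `"segments"` entries carry `"start"` and `"end"` (in seconds) plus `"text"`. `render_transcript` and `fmt_ts` are hypothetical names.

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as HH:MM:SS (fractions are truncated)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(result: dict, template: str, filename: str, audio_folder: str, today: str) -> str:
    """Hypothetical renderer: `result` is shaped like openai-whisper's transcribe() output."""
    timestamp_lines = [
        f"- [{fmt_ts(seg['start'])} - {fmt_ts(seg['end'])}] {seg['text'].strip()}"
        for seg in result["segments"]
    ]
    return template.format(
        filename=filename,
        date=today,
        audio_folder=audio_folder,
        transcript_content=result["text"].strip(),
        timestamp_content="\n".join(timestamp_lines),
    )


if __name__ == "__main__":
    from datetime import date

    fake_result = {"text": " Hello everyone.", "segments": [{"start": 0.0, "end": 1.8, "text": " Hello everyone."}]}
    # In practice the template would be read from templates/transcript-template.md.
    demo_template = "# Transcription: {filename}\n\n{transcript_content}\n\n{timestamp_content}\n"
    print(render_transcript(fake_result, demo_template, "call.mp3", "Audio", date.today().isoformat()))
```

With `str.format`, braces inside the transcript text are harmless because substituted values are not re-parsed, but any literal `{` or `}` in the template itself would have to be doubled.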

Link Template

Edit templates/link-template.md:

📝 **Transcript:** [[{audio_name}_transcript]]
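The Smart Linking step can be sketched as an idempotent append: add the rendered link line to a note only if it is not already there, so re-running the tool never duplicates links. How the project decides which note an audio file belongs to is not described here, so this sketch simply takes the note path as given; `append_link_once` is a hypothetical name.

```python
from pathlib import Path


def append_link_once(note: Path, link_line: str, encoding: str = "utf-8") -> bool:
    """Hypothetical sketch: append `link_line` to `note` unless it is already present.

    Returns True if the note was modified, False if the link was already there.
    """
    text = note.read_text(encoding=encoding)
    if link_line in text:
        return False  # idempotent: re-running the tool adds nothing
    if text and not text.endswith("\n"):
        text += "\n"
    separator = "\n" if text else ""  # blank line between existing content and the link
    note.write_text(f"{text}{separator}{link_line}\n", encoding=encoding)
    return True
```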

🛠️ Usage Examples

Basic Usage

# Process all audio files in your vault
python -m src.transcription_system --config config.yaml

Create Configurations

# Create Obsidian configuration
python -m src.transcription_system --create-config obsidian-config.yaml --config-type obsidian

# Create Logseq configuration
python -m src.transcription_system --create-config logseq-config.yaml --config-type logseq

# Create generic markdown configuration
python -m src.transcription_system --create-config generic-config.yaml --config-type generic

Multiple Vaults

# Process different vaults with different configurations
python -m src.transcription_system --config work-vault.yaml
python -m src.transcription_system --config personal-vault.yaml

🔄 Automation

Systemd Service (Linux)

Set up automatic processing when files change:

# Set up systemd service
sudo chmod +x scripts/setup-systemd.sh
sudo ./scripts/setup-systemd.sh

# Enable and start service
sudo systemctl enable markdown-transcription
sudo systemctl start markdown-transcription

Cron Job

Process files periodically:

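# Add to crontab (process every 30 minutes).
# Runs the module from the repository root with the virtualenv that `uv sync` creates in .venv
*/30 * * * * cd /path/to/markdown_transcription_system && .venv/bin/python -m src.transcription_system --config /path/to/config.yaml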

Directory Watching

Use with file system watchers like inotify:

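# Watch the vault recursively for audio files that finish writing or are moved in, and process them.
# Run this from the repository root so `python -m src.transcription_system` resolves.
inotifywait -m -r -e close_write -e moved_to --format '%w%f' /path/to/vault/ | while IFS= read -r file; do
    if [[ $file =~ \.(mp3|wav|m4a)$ ]]; then
        python -m src.transcription_system --config config.yaml
    fi
done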

🆚 Comparison with Alternatives

| Feature | This System | Whisper.cpp | Commercial APIs |
|---|---|---|---|
| Privacy | ✅ 100% Local | ✅ 100% Local | ❌ Cloud-based |
| Cost | ✅ Free | ✅ Free | ❌ Pay-per-use |
| Markdown Integration | ✅ Native | ❌ Manual | ❌ Manual |
| Template System | ✅ Built-in | ❌ None | ❌ None |
| Auto-linking | ✅ Automatic | ❌ Manual | ❌ Manual |
| Multi-app Support | ✅ Universal | ❌ Generic | ❌ Generic |
| Batch Processing | ✅ Yes | ✅ Yes | ✅ Yes |
| Accuracy | ✅ High (Whisper) | ✅ High (Whisper) | ✅ High |

📈 Performance & Models

Whisper Model Comparison

| Model | Speed | Accuracy | VRAM Usage | Best For |
|---|---|---|---|---|
| tiny | Fastest | Lowest | ~1GB | Quick processing |
| base | Fast | Fair | ~1GB | Balanced performance |
| small | Medium | Good | ~2GB | Most use cases |
| medium | Slower | Better | ~5GB | High accuracy needed |
| large | Slowest | Best | ~10GB | Maximum accuracy |

Processing Times (Approximate; actual times depend heavily on your hardware, especially GPU vs. CPU-only)

  • 10-minute audio file:
    • tiny: ~30 seconds
    • base: ~1 minute
    • small: ~2 minutes
    • medium: ~4 minutes
    • large: ~8 minutes
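Assuming processing time scales roughly linearly with audio length on the same hardware (an assumption; the figures above are themselves approximate), the list converts into a quick estimator:

```python
# Approximate seconds per 10 minutes of audio, taken from the list above
# (hardware-dependent; linear scaling with duration is an assumption).
SECONDS_PER_10_MIN = {"tiny": 30, "base": 60, "small": 120, "medium": 240, "large": 480}


def estimate_seconds(model: str, audio_minutes: float) -> float:
    """Rough processing-time estimate for one file."""
    return SECONDS_PER_10_MIN[model] * audio_minutes / 10


def estimate_batch_seconds(model: str, durations_minutes: list) -> float:
    """Rough estimate for a batch processed one file at a time."""
    return sum(estimate_seconds(model, minutes) for minutes in durations_minutes)


print(estimate_seconds("medium", 45))             # 1080.0
print(estimate_batch_seconds("small", [10, 20]))  # 360.0
```

For example, a 45-minute recording with `medium` comes to about 240 × 45 / 10 = 1080 seconds, roughly 18 minutes.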

🔍 Troubleshooting

Common Issues

1. "Whisper is not installed" error

# Method 1 - Package manager (recommended for Ubuntu/Debian):
sudo apt install python-openai-whisper

# Method 2 - pip installation:
pip install openai-whisper

2. "ffmpeg not found" error

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows: Download from https://ffmpeg.org/

3. "Permission denied" errors

# Fix file permissions
chmod +x scripts/*.sh
sudo chown -R $USER:$USER /path/to/vault

4. "Another instance is already running"

# Remove lock file if stale
sudo rm /var/lock/markdown-transcription.lock
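A stale lock is one left behind by a run that crashed. The sketch below shows one common way to make that case self-healing: create the lock atomically with `O_CREAT | O_EXCL`, store the PID, and treat the lock as stale when that PID no longer exists. This is an assumption about the approach, not the project's actual locking code, and the `os.kill(pid, 0)` check is POSIX-only. An alternative is `fcntl.flock`, whose lock the kernel releases automatically when the process exits.

```python
import os
from pathlib import Path


def _holder_alive(lock_file: Path) -> bool:
    """True if the PID recorded in the lock file belongs to a running process (POSIX only)."""
    try:
        pid = int(lock_file.read_text().strip())
    except (OSError, ValueError):
        return False  # unreadable or garbled lock: treat as stale
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; on Windows this would kill the process
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_lock(lock_file: Path) -> bool:
    """Create the lock atomically; clear it once if stale. False if another run is active."""
    for _ in range(2):
        try:
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            if _holder_alive(lock_file):
                return False
            lock_file.unlink(missing_ok=True)  # stale lock from a crashed run
            continue
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        return True
    return False


def release_lock(lock_file: Path) -> None:
    lock_file.unlink(missing_ok=True)
```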

5. Template not found errors

# Ensure templates directory exists
mkdir -p templates/
# Copy default templates from repository

Performance Issues

Slow transcription:

  • Use a smaller model (tiny, base, small)
  • Close other applications to free up RAM/VRAM
  • Consider using CPU-only mode for older hardware

High memory usage:

  • Use smaller model
  • Process files one at a time
  • Increase system swap space

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

# Clone the repository
git clone https://github.com/tehw0lf/markdown_transcription_system.git
cd markdown_transcription_system

# Install development dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Run comprehensive test suite
./test_system.sh

# Run linting (flake8) and auto-formatting (black)
flake8 src/
black src/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI Whisper for the excellent speech recognition model
  • The markdown note-taking community for inspiration and feedback
  • All contributors who helped improve this system

📞 Support

  • Issues: Report bugs and request features on GitHub Issues
  • Discussions: Ask questions and share ideas in GitHub Discussions
  • Documentation: Find detailed guides in the docs/ directory

Made with ❤️ for the markdown note-taking community
