Bird Acoustics Classifier


Automatic bird species recognition from audio recordings using EfficientNet-B0 fine-tuning on mel spectrograms, with data sourced from the Xeno-canto API.

Target habitat: Alpine zone (Italy / Austria / Switzerland) — 20 characteristic species:

| Species | Common name | Habitat |
|---|---|---|
| Turdus torquatus | Ring ouzel | Rocky slopes, high-altitude forests |
| Phoenicurus ochruros | Black redstart | Rocky terrain, mountain villages |
| Prunella collaris | Alpine accentor | High rocky areas above treeline |
| Pyrrhocorax graculus | Yellow-billed chough | Alpine cliffs and glaciers |
| Pyrrhocorax pyrrhocorax | Red-billed chough | Alpine meadows and cliffs |
| Tichodroma muraria | Wallcreeper | Vertical rock faces |
| Anthus spinoletta | Water pipit | Alpine meadows and streams |
| Montifringilla nivalis | White-winged snowfinch | Above treeline, snowfields |
| Lagopus muta | Rock ptarmigan | High alpine tundra |
| Dryocopus martius | Black woodpecker | Subalpine conifer forests |
| Tetrao urogallus | Western capercaillie | Old-growth conifer forests |
| Picoides tridactylus | Three-toed woodpecker | Spruce forests |
| Loxia curvirostra | Common crossbill | Conifer forests |
| Nucifraga caryocatactes | Spotted nutcracker | Mountain conifer forests |
| Regulus ignicapilla | Firecrest | Mixed mountain forests |
| Cinclus cinclus | White-throated dipper | Alpine streams and torrents |
| Ficedula albicollis | Collared flycatcher | Deciduous mountain forests |
| Saxicola rubetra | Whinchat | Subalpine meadows |
| Emberiza cia | Rock bunting | Rocky slopes with sparse vegetation |
| Gypaetus barbatus | Bearded vulture | High alpine cliffs (reintroduced) |

Why 20 species — and how to add more

Rationale for the current selection

The 20 species were chosen based on two criteria:

  1. Ecological coherence — all are characteristic of the Alpine zone (Italy / Austria / Switzerland), making the classifier useful for a single, well-defined habitat.
  2. Compute budget — with max_per_species: 100 recordings and 30 training epochs on a single consumer GPU (or Google Colab free tier), the full pipeline completes in roughly 2–3 hours. Scaling to more species increases download size, preprocessing time, and training time roughly linearly.

On a machine without a GPU, training 20 species for 30 epochs already takes several hours. Adding more species without access to a dedicated GPU or cloud accelerator is feasible but slow.

Adding more species

The entire pipeline is driven by the species list in config/default.yaml — no code changes are needed.

Step 1 — Find valid species names

Use the scientific name exactly as it appears on Xeno-canto. Search the site to verify that enough recordings exist (aim for at least 30–50 per species).
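Before editing the config, you can check a candidate's recording count programmatically. A minimal sketch against the classic v2 JSON endpoint (the helper names here are illustrative; the project itself talks to API v3 with an API key, so treat the URL as an assumption and verify it against the current Xeno-canto docs):

```python
import json
import urllib.parse
import urllib.request

XC_API = "https://xeno-canto.org/api/2/recordings"  # classic public endpoint

def build_query_url(scientific_name: str) -> str:
    """URL that searches Xeno-canto for one species."""
    return f"{XC_API}?query={urllib.parse.quote(scientific_name)}"

def parse_recording_count(payload: dict) -> int:
    """The API reports the total as a string, e.g. {"numRecordings": "1234"}."""
    return int(payload["numRecordings"])

def recording_count(scientific_name: str) -> int:
    """Live lookup; aim for at least 30-50 recordings per species."""
    with urllib.request.urlopen(build_query_url(scientific_name)) as resp:
        return parse_recording_count(json.load(resp))
```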

Step 2 — Edit the config

# config/default.yaml
species:
  - Turdus torquatus
  - Cinclus cinclus
  # ... existing 18 species ...
  - Aquila chrysaetos       # Golden eagle  ← add new species here
  - Tetrao tetrix           # Black grouse

Step 3 — Re-run the pipeline

# Download recordings only for the new species (faster)
python scripts/download.py --species "Aquila chrysaetos" "Tetrao tetrix" --max 100

# Or re-download everything from scratch
python scripts/download.py

# Regenerate spectrograms (skip existing ones automatically)
python scripts/preprocess.py

# Retrain — the model head is rebuilt to match the new number of classes
python scripts/train.py

# Launch the updated demo
python app/app.py

train.py rebuilds the EfficientNet-B0 classification head automatically to match the number of species found in data/processed/. You do not need to edit any code — only the YAML.

Practical limits (rough estimates)

| Species | ~Audio files | ~Preprocessing | ~Training (GPU) | ~Training (CPU only) |
|---|---|---|---|---|
| 20 | 2 000 | 20 min | 1–2 h | 4–8 h |
| 50 | 5 000 | 45 min | 3–5 h | 12–20 h |
| 100 | 10 000 | 1.5 h | 6–10 h | 30–50 h |

For large expansions, consider reducing max_per_species (e.g. 50) or increasing batch_size and using a cloud GPU (Colab, Kaggle, Vast.ai).


Results (reference run)

Configuration: max_per_species: 20, 30 epochs, seed 42, EfficientNet-B0 fine-tuned on grade-A Xeno-canto recordings. Results will vary when retraining with a different dataset or additional species.

Test accuracy: 96.6% — macro F1: 0.956 — weighted F1: 0.965

| Species | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Anthus spinoletta | 1.000 | 0.667 | 0.800 | 6 |
| Cinclus cinclus | 1.000 | 0.932 | 0.965 | 59 |
| Dryocopus martius | 1.000 | 0.977 | 0.989 | 44 |
| Emberiza cia | 1.000 | 1.000 | 1.000 | 21 |
| Ficedula albicollis | 0.981 | 1.000 | 0.991 | 53 |
| Gypaetus barbatus | 1.000 | 1.000 | 1.000 | 11 |
| Lagopus muta | 0.939 | 1.000 | 0.969 | 31 |
| Loxia curvirostra | 0.871 | 0.964 | 0.915 | 28 |
| Montifringilla nivalis | 1.000 | 0.950 | 0.974 | 20 |
| Nucifraga caryocatactes | 1.000 | 0.909 | 0.952 | 22 |
| Phoenicurus ochruros | 0.982 | 0.965 | 0.973 | 226 |
| Picoides tridactylus | 1.000 | 1.000 | 1.000 | 42 |
| Prunella collaris | 0.928 | 0.928 | 0.928 | 69 |
| Pyrrhocorax graculus | 0.942 | 0.951 | 0.946 | 102 |
| Pyrrhocorax pyrrhocorax | 0.899 | 0.969 | 0.932 | 64 |
| Regulus ignicapilla | 0.957 | 0.957 | 0.957 | 23 |
| Saxicola rubetra | 0.973 | 0.935 | 0.954 | 77 |
| Tetrao urogallus | 0.992 | 0.992 | 0.992 | 122 |
| Tichodroma muraria | 0.923 | 0.923 | 0.923 | 13 |
| Turdus torquatus | 0.958 | 0.976 | 0.967 | 254 |

Notable confusions: Anthus spinoletta (only 6 test samples — low-support species are inherently noisier); Nucifraga caryocatactes occasionally confused with Pyrrhocorax species (similar alpine habitat); the two Pyrrhocorax species (graculus / pyrrhocorax) show minor cross-confusion as expected given acoustic similarity.

(Figures: confusion matrix and training curves, saved under outputs/.)


Pipeline

| Step | Module | Notebook | CLI script |
|---|---|---|---|
| 1. Download audio | src/download.py | pipeline.ipynb | scripts/download.py |
| 2. Audio → mel spectrograms | src/preprocessing.py | pipeline.ipynb | scripts/preprocess.py |
| 3. Train EfficientNet-B0 | src/model.py | pipeline.ipynb | scripts/train.py |
| 4. Evaluate & track metrics | src/model.py | pipeline.ipynb | scripts/evaluate.py |
| 5. Interactive demo | app/app.py | – | python app/app.py |

Project structure

bird-acoustics-classifier/
├── config/
│   └── default.yaml        # centralised config (species, audio, training, mlflow)
├── data/
│   ├── raw/                # .mp3 recordings from Xeno-canto (per species)
│   └── processed/          # mel spectrogram .png tiles (per species)
├── models/                 # saved model checkpoints (best_model.pt)
├── notebooks/
│   └── pipeline.ipynb      # full pipeline: download → preprocessing → training → evaluation
├── outputs/                # training artefacts (loss curves, confusion matrices)
├── reports/                # evaluation reports and plots
├── scripts/                # CLI entry points (terminal-friendly alternatives to notebook)
│   ├── download.py
│   ├── preprocess.py
│   ├── train.py
│   ├── evaluate.py
│   └── infer.py
├── src/                    # reusable Python modules
│   ├── download.py         # Xeno-canto API downloader
│   ├── preprocessing.py    # audio → mel spectrogram converter
│   └── model.py            # EfficientNet-B0, BirdTrainer, inference helpers
├── app/
│   └── app.py              # Gradio web interface
├── tests/
│   ├── test_download.py
│   └── test_preprocessing.py
└── requirements.txt

Installation

git clone https://github.com/danort92/bird-acoustics-classifier.git
cd bird-acoustics-classifier
pip install -r requirements.txt

Set your Xeno-canto API key (required since October 2025):

export XENO_CANTO_API_KEY="your_api_key_here"

Get a free key at https://xeno-canto.org/article/854 after registering. If not set, the downloader will prompt interactively.
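The key lookup follows the usual environment-variable-with-interactive-fallback pattern. A hypothetical sketch (`get_api_key` is not the project's actual function name):

```python
import os
from getpass import getpass

def get_api_key(env_var: str = "XENO_CANTO_API_KEY") -> str:
    """Read the Xeno-canto key from the environment, prompting if absent."""
    key = os.environ.get(env_var)
    if not key:
        key = getpass("Xeno-canto API key: ")  # interactive fallback
    return key
```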


Quick start

1 — Download recordings

from src.download import XenoCantoDownloader

dl = XenoCantoDownloader(output_dir="data/raw")

# Grade-A only (cleanest recordings)
dl.download_species(["Turdus torquatus", "Cinclus cinclus"], max_per_species=50)

# Mixed quality — improves robustness on real-world recordings
dl.download_species(
    ["Turdus torquatus", "Cinclus cinclus"],
    max_per_species=100,
    quality_mix={"A": 60, "B": 30, "C": 10},
)

Or via CLI:

# All species in config/default.yaml
python scripts/download.py

# Custom species list
python scripts/download.py --species "Turdus torquatus" "Cinclus cinclus" --max 50

2 — Generate mel spectrograms

from src.preprocessing import SpectrogramConverter, AudioConfig

conv = SpectrogramConverter(output_dir="data/processed")
conv.process_all(input_dir="data/raw")

Or via CLI:

python scripts/preprocess.py          # uses config/default.yaml
python scripts/preprocess.py --overwrite   # overwrite existing PNGs

3 — Train the model

from src.model import BirdTrainer, TrainingConfig

cfg     = TrainingConfig.from_yaml()   # reads config/default.yaml
trainer = BirdTrainer(cfg)
best_path, history = trainer.train()   # saves models/best_model.pt

Or via CLI:

python scripts/train.py
python scripts/train.py --epochs 50 --batch-size 64 --lr 5e-4

Training logs per-epoch loss/accuracy to the console and to MLflow. The best checkpoint (lowest val loss) is saved to models/best_model.pt.
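The checkpoint selection described above is the standard best-so-far pattern with patience-based early stopping. A simplified sketch of the logic (not the project's actual BirdTrainer code):

```python
def select_best_epoch(val_losses, patience: int = 7):
    """Return (best_epoch, stop_epoch): which checkpoint to keep
    and where early stopping would halt training."""
    best_loss, best_epoch, bad_epochs = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
            # here the real trainer saves models/best_model.pt
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

best, stop = select_best_epoch([1.0, 0.8, 0.7, 0.75, 0.74, 0.9], patience=3)
print(best, stop)  # 2 5
```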

4 — Evaluate

from src.model import BirdTrainer, TrainingConfig

cfg     = TrainingConfig.from_yaml()
trainer = BirdTrainer(cfg)
y_true, y_pred = trainer.evaluate("models/best_model.pt")

Or via CLI:

python scripts/evaluate.py --checkpoint models/best_model.pt
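The per-species table in the Results section can be regenerated from the evaluate output with scikit-learn; for example (toy labels shown in place of the real `y_true` / `y_pred`):

```python
from sklearn.metrics import classification_report, f1_score

# y_true, y_pred come from trainer.evaluate(...); toy labels for illustration
y_true = ["Cinclus cinclus", "Cinclus cinclus", "Turdus torquatus", "Turdus torquatus"]
y_pred = ["Cinclus cinclus", "Turdus torquatus", "Turdus torquatus", "Turdus torquatus"]

print(classification_report(y_true, y_pred, digits=3))  # per-species P/R/F1/support
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
```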

5 — Interactive demo (Gradio)

python app/app.py

Then open http://localhost:7860 in your browser.

Options:

python app/app.py --checkpoint models/best_model.pt   # custom checkpoint
python app/app.py --port 8080                         # custom port
python app/app.py --share                             # public Gradio link

The app accepts .mp3 or .wav files (or a .zip archive), slices them into 5-second clips, runs the model on each clip, and returns the best species prediction with confidence score, plus the mel spectrogram of the first clip.
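Slicing a recording into fixed 5-second windows is plain array chunking. A sketch of the idea (illustrative, not the app's exact code):

```python
import numpy as np

def slice_clips(y: np.ndarray, sr: int = 22050, clip_seconds: float = 5.0):
    """Split a waveform into full-length clips; the short tail is dropped."""
    samples = int(sr * clip_seconds)
    n_clips = len(y) // samples
    return [y[i * samples:(i + 1) * samples] for i in range(n_clips)]

y = np.zeros(22050 * 12)          # 12 s of audio -> two full 5 s clips
clips = slice_clips(y)
print(len(clips), len(clips[0]))  # 2 110250
```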

The Settings panel in the UI includes a Model checkpoint dropdown that automatically discovers all .pt files in the models/ directory — no restart needed to switch between checkpoints.
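Checkpoint discovery is a simple glob over the models/ directory; a sketch of the idea (function name illustrative):

```python
from pathlib import Path

def discover_checkpoints(models_dir: str = "models") -> list[str]:
    """All .pt files, newest first, for the UI dropdown."""
    paths = Path(models_dir).glob("*.pt")
    return [p.name for p in sorted(paths, key=lambda p: p.stat().st_mtime, reverse=True)]
```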


Training from scratch

No pre-trained checkpoint is required. The entire pipeline — from raw audio to a ready-to-use model — runs automatically with the commands above. There is nothing to upload manually.

The sequence is:

API key → download .mp3 → generate spectrograms → train → models/best_model.pt → Gradio app

scripts/train.py calls BirdTrainer.train(), which automatically saves the best checkpoint (lowest validation loss) to models/best_model.pt at the end of training. The Gradio app reads that file by default, and it will appear automatically in the UI checkpoint dropdown.

To retrain from scratch:

python scripts/train.py          # overwrites models/best_model.pt
python app/app.py                # now uses your freshly trained model

Notebook

| Notebook | Description | Colab |
|---|---|---|
| pipeline.ipynb | Full pipeline: download → preprocessing → training → evaluation | Open In Colab |

Running locally

pip install -r requirements.txt
jupyter notebook notebooks/pipeline.ipynb

Google Colab

Open the badge above, then Runtime → Run all. The setup cell clones the repo, installs dependencies, and symlinks data/raw and data/processed to Google Drive so files survive session restarts.


Configuration

All parameters live in config/default.yaml. Edit it to change species, audio settings, or training hyperparameters without touching the code:

species:
  - Turdus torquatus
  - Cinclus cinclus
  # ... 18 more

download:
  max_per_species: 100
  quality: "A"            # grade filter when quality_mix is not set (A–E)
  # quality_mix:          # blend of grades — weights are relative, not absolute counts
  #   A: 60               # ~60 % grade-A
  #   B: 30               # ~30 % grade-B
  #   C: 10               # ~10 % grade-C
  countries: []           # e.g. ["Italy", "Austria"] — empty = worldwide

audio:
  sample_rate: 22050
  clip_duration: 5.0      # seconds per spectrogram tile
  n_mels: 128
  n_fft: 2048
  hop_length: 512
  f_min: 500.0            # Hz — filters wind/traffic noise
  f_max: 15000.0          # Hz
  top_db: 80.0            # log-amplitude dynamic range
  img_size: [224, 224]    # matches EfficientNet input

training:
  model: efficientnet_b0
  batch_size: 32
  epochs: 30
  learning_rate: 0.001
  val_split: 0.15
  test_split: 0.15
  seed: 42
  patience: 7             # early stopping
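
The quality_mix weights are relative shares of max_per_species, not absolute counts. A sketch of how such weights could be normalised into per-grade download targets (illustrative; the project's actual allocation logic may differ):

```python
def grade_targets(quality_mix: dict[str, int], max_per_species: int) -> dict[str, int]:
    """Convert relative grade weights into per-grade download counts."""
    total = sum(quality_mix.values())
    return {g: (w * max_per_species) // total for g, w in quality_mix.items()}

print(grade_targets({"A": 60, "B": 30, "C": 10}, 100))  # {'A': 60, 'B': 30, 'C': 10}
print(grade_targets({"A": 60, "B": 30, "C": 10}, 50))   # {'A': 30, 'B': 15, 'C': 5}
```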

Experiment tracking (MLflow)

By default MLflow logs to a local mlruns/ folder. To use DagsHub (free remote tracking):

  1. Create a free account at https://dagshub.com and connect this repository.
  2. Export the following variables before training:
export MLFLOW_TRACKING_URI=https://dagshub.com/<username>/bird-acoustics-classifier.mlflow
export MLFLOW_TRACKING_USERNAME=<your-dagshub-username>
export MLFLOW_TRACKING_PASSWORD=<your-dagshub-token>

No code change needed — the env var overrides the tracking_uri in the config.
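MLflow reads MLFLOW_TRACKING_URI natively, so the precedence is simply: environment variable first, then the YAML value. A sketch of that fallback (hypothetical helper, not the project's code):

```python
import os

def resolve_tracking_uri(config_uri: str = "file:./mlruns") -> str:
    """Environment variable wins over the config value."""
    return os.environ.get("MLFLOW_TRACKING_URI", config_uri)
```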

To browse the local UI:

mlflow ui
# open http://localhost:5000

Technologies

| Library | Role |
|---|---|
| PyTorch / TorchVision | EfficientNet-B0 training and fine-tuning |
| Librosa | Audio loading and mel spectrogram computation |
| Gradio | Interactive web demo |
| MLflow | Experiment tracking and checkpoint logging |
| Xeno-canto API v3 | Bird song audio dataset |
| scikit-learn | Stratified splits, evaluation metrics |
| soundfile | Audio file I/O backend for Librosa (WAV/FLAC/OGG) |
| Pillow / NumPy | Image handling and array operations |
