This project focuses on developing and evaluating a robust CAPTCHA recognition model using CRNN (Convolutional Recurrent Neural Network) and CTC (Connectionist Temporal Classification) loss. The main objective is to evaluate different data augmentation strategies and hyperparameter configurations to improve the model's robustness and generalization.
- PyTorch for model implementation and training
- CTC Loss for sequence-level supervision without character alignment
- Data Augmentation with custom pipelines for realistic CAPTCHA distortions
- Grid Search for hyperparameter tuning across multiple training scenarios
- Modular augmentation system with full control over geometric, color, blur, noise, and distractor-line parameters
- YAML-based configuration for tuning multiple parameters at once
- Automated logging and checkpointing per trial
- Built-in analysis suite for visualizing learning curves, error samples, confidence scores, and character distribution
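As background for how a CTC-trained model's per-frame outputs become a predicted string, here is a minimal greedy CTC decoding sketch (collapse consecutive repeats, then drop blanks). The blank index corresponds to the `ctc_blank_index` config key; the charset and frame sequence are illustrative:

```python
def ctc_greedy_decode(indices, blank=0):
    """Collapse consecutive repeats, then remove blank tokens.

    `indices` is the per-timestep argmax of the network output
    (one class index per time step); `blank` is the CTC blank index.
    """
    decoded = []
    prev = None
    for idx in indices:
        if idx != prev:          # collapse runs of the same index
            if idx != blank:     # drop the blank token
                decoded.append(idx)
        prev = idx
    return decoded

# Illustrative charset: index 0 is reserved for the CTC blank.
charset = "-abc"  # '-' stands in for the blank
frames = [1, 1, 0, 2, 2, 2, 0, 2, 3]
print("".join(charset[i] for i in ctc_greedy_decode(frames)))  # "abbc"
```

Note that the blank between the two runs of `2` is what allows the decoder to emit a doubled character, which matters for CAPTCHAs with repeated letters.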
You can launch training or hyperparameter tuning with:
```
python main.py --config configs/tuning_M.yaml
```

- Replace `tuning_M.yaml` with `tuning_S.yaml` or `tuning_L.yaml` depending on the scale of your experiment.
- The results (checkpoints, logs, final models) will be saved under `outputs/`.
- The core training loop is implemented in `trainer/train.py`.
- Hyperparameter tuning is orchestrated by `trainer/tuner.py`, which performs a grid search:
  - All combinations of the specified parameters (`batch_size`, `learning_rate`, `epochs`, `optimizer`) are iterated automatically.
  - Each trial runs independently and logs its own history, checkpoints, and final model.
  - The best model is determined by validation LER (Levenshtein Error Rate).
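The grid search amounts to iterating the Cartesian product of the tuning lists; a minimal sketch of the idea (the `run_trial` function and the parameter values are hypothetical stand-ins for what the real tuner does):

```python
import itertools

# Tuning lists as they would appear in the YAML config (illustrative values).
grid = {
    "batch_size": [4, 8, 16],
    "learning_rate": [1e-3, 5e-4, 1e-4],
    "epochs": [20],
    "optimizer": ["adam", "adamw"],
}

def run_trial(params):
    # Hypothetical stand-in: train one model and return its validation LER.
    return params["learning_rate"] / params["batch_size"]

keys = list(grid)
trials = [dict(zip(keys, combo)) for combo in itertools.product(*grid.values())]
print(len(trials))  # 3 * 3 * 1 * 2 = 18 trials

# The best model is the trial with the lowest validation LER.
best = min(trials, key=run_trial)
```

Each element of `trials` is a complete hyperparameter dictionary, so every trial can be launched, logged, and checkpointed independently.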
All experiments are configured via YAML files under `configs/`.
Key parameters include:
- **Dataset paths**
  - `train_root`, `val_root`, `test_root`
  - `label_path` → JSON file with CAPTCHA labels
  - `original_image_dir` → un-augmented training images
- **Tuning settings**
  - `batch_size` → list of batch sizes to try (e.g. `[4, 8, 16]`)
  - `learning_rate` → learning rates in scientific notation (e.g. `[1e-3, 5e-4, 1e-4]`)
  - `epochs` → training epochs per trial
  - `optimizer` → choice of optimizer (`adam`, `adamw`, `sgd`)
  - `seed` → random seed for reproducibility
  - `ctc_blank_index` → blank token index for CTC loss
- **Augmentation scenarios**
  - Multiple named augmentation configurations can be defined.
  - Parameters: `angle_range`, `shear_range`, `brightness_range`, `contrast_range`, `noise_std`, `blur_probability`, `blur_radius`, `lines_probability`, `line_count`, `line_thickness`.
- **Output directories**
  - `checkpoint_dir`, `final_model_dir`, `history_dir`
  - `log_file_path`
- **Post-analysis**
  - `analysis_scripts` → list of visualization/analysis tasks to run automatically after tuning.
With this design, you can easily add new configs or augmentation scenarios without modifying the training code.
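One detail worth getting right is that `ctc_blank_index` must not collide with any real character index. A small sketch of building the label↔index mapping with index 0 reserved for the blank (the charset here is an illustrative assumption, not the project's actual one):

```python
import string

CTC_BLANK_INDEX = 0  # matches the ctc_blank_index config key

# Illustrative CAPTCHA charset: digits plus lowercase letters.
charset = string.digits + string.ascii_lowercase

# Reserve index 0 for the CTC blank; real characters start at 1.
char_to_idx = {c: i + 1 for i, c in enumerate(charset)}
idx_to_char = {i: c for c, i in char_to_idx.items()}

def encode(label):
    """Turn a label string (e.g. from the labels JSON) into CTC target indices."""
    return [char_to_idx[c] for c in label]

print(encode("a1"))  # [11, 2]
```

With this layout, the model's output layer has `len(charset) + 1` classes, and the blank never appears in any encoded target.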
After training and hyperparameter tuning, this project provides automatic post-analysis scripts to help evaluate and visualize the results.
These scripts run either manually (via `python analysis/run_all_analysis.py ...`) or automatically if specified in the config file.
- The config file (`configs/*.yaml`) can specify which analysis scripts to run under the `analysis_scripts` section.
- After tuning, `tuner.py` will call these scripts automatically if they are listed.
All analysis utilities are stored in the `analysis/` folder. Each script serves a specific purpose:
- `plot_curves.py`: Plots training loss and validation LER curves for each trial.
- `compare_trials.py`: Compares final LER across all trials in a bar chart.
- `plot_charset_freq.py`: Analyzes and plots character frequency in the training labels.
- `plot_prediction_dist.py`: Analyzes the distribution of predicted characters across the test set.
These tools help debug model behavior, compare augmentation effects, and ensure the model generalizes well.
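The LER metric used throughout is the Levenshtein (edit) distance between prediction and label, normalized by the label length; a self-contained sketch:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def ler(prediction, target):
    """Levenshtein Error Rate: edit distance normalized by target length."""
    return levenshtein(prediction, target) / max(len(target), 1)

print(ler("abed", "abcd"))  # 0.25: one substitution over four characters
```

Unlike exact-match accuracy, LER gives partial credit for mostly-correct predictions, which makes it a smoother signal for comparing trials.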
All experiment results are saved under the `outputs/` directory.
This folder is created automatically during training and organized into subfolders for clarity.
- `checkpoints/` → Per-epoch saved states for resuming training or debugging.
- `models/` → Final trained model and best-performing model per trial.
- `logs/` → Detailed training histories (`train_loss`, `val_ler`) in JSON format.
- `training_log.txt` → Human-readable log with trial hyperparameters and validation metrics.
- `predictions/` → Saved prediction results in JSON format for test evaluation.
- `plots/` → Generated analysis figures.
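Given per-trial histories like those stored as JSON under `logs/`, picking the best trial is a matter of comparing the final `val_ler` values. A sketch with in-memory histories (the trial names and numbers are illustrative assumptions, not real results):

```python
# Histories shaped like the per-trial JSON logs (illustrative values).
histories = {
    "trial_0": {"train_loss": [2.1, 1.0, 0.4], "val_ler": [0.9, 0.5, 0.21]},
    "trial_1": {"train_loss": [2.0, 0.8, 0.3], "val_ler": [0.8, 0.4, 0.14]},
}

def final_val_ler(history):
    """Validation LER after the last epoch of a trial."""
    return history["val_ler"][-1]

best_trial = min(histories, key=lambda name: final_val_ler(histories[name]))
print(best_trial)  # trial_1 has the lower final validation LER
```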
This project can also be run in Google Colab using `main.ipynb`, making it easy to experiment without setting up a local environment.
Two approaches are supported:
You can manually upload both the dataset and project folder as zip archives.
- Upload your dataset (`part2.zip`), which must contain:

  ```
  part2/
  ├── train/
  ├── val/
  └── test/
  ```

- Upload the project code (`captcha-cracker.zip`) containing:

  ```
  captcha-cracker/
  ├── main.py
  ├── configs/
  ├── trainer/
  └── ...
  ```

- Use the provided Colab setup script to extract the files:

  ```python
  import zipfile

  # Extract dataset
  with zipfile.ZipFile("part2.zip", 'r') as zip_ref:
      zip_ref.extractall("data")

  # Extract project
  with zipfile.ZipFile("captcha-cracker.zip", 'r') as zip_ref:
      zip_ref.extractall("captcha-cracker")

  %cd captcha-cracker
  ```

- Install dependencies and run training:

  ```
  !pip install -r requirements.txt
  !python main.py --config configs/tuning_M.yaml
  ```
Instead of uploading the zipped project folder, you can directly clone the repository:
```
!git clone https://github.com/captcha-cracker.git
%cd captcha-cracker
!pip install -r requirements.txt
!python main.py --config configs/tuning_M.yaml
```