This repository showcases a complete pipeline for high-quality Image Sharpening using Knowledge Distillation (KD). A pretrained Restormer model acts as the high-capacity teacher, while a lightweight Mini-UNet model is trained as the student to match the teacher’s performance.
The pipeline is designed for efficient deployment in real-world settings, where high-resolution image restoration is needed but hardware resources are limited.
Traditional motion deblurring models either provide high accuracy but are computationally heavy (like Restormer), or are lightweight but lose quality (like basic U-Nets). Our solution combines the best of both:
- A Restormer teacher model generates high-quality sharp outputs from blurry images.
- A compact Mini-UNet student learns not only from the ground truth (sharp images) but also from the teacher's output, using:
  - L1 Loss for pixel accuracy
  - Distillation Loss to mimic teacher behavior
  - VGG Perceptual Loss to preserve visual quality
The student is trained to match the teacher’s performance, while being up to 10× smaller and much faster — making it ideal for mobile and edge devices.
- Uses Restormer as a powerful pretrained teacher model
- Student model is a custom Mini-UNet, 3–4 layers deep, <30 MB
- Incorporates multi-loss training (L1 + KD + Perceptual)
- Based on DIV2K dataset with blurry/sharp pairs
- Full training, inference, and SSIM evaluation done in Google Colab
- Resumable training with checkpoint support
- Achieves strong SSIM (~0.90+) with high visual fidelity
A complete walkthrough video of our Google Colab project is available for a visual overview of the full pipeline, code, and results. It covers dataset setup, training, inference, and SSIM evaluation.
```
ImageSharpening-KD-Restormer-UNet/
├── models/                          # Student model architecture
│   ├── student_model_unet.py
│   └── README.md
├── pretrained_models/               # Pretrained Restormer model (teacher)
│   ├── motion_deblurring.pth
│   ├── restormer_arch.py
│   └── README.md
├── losses/                          # Custom loss functions
│   ├── vgg_loss.py
│   └── README.md
├── training/                        # Training pipeline
│   ├── train_student_kd.py
│   ├── Student-Model-Training-Process.png
│   └── README.md
├── checkpoints/                     # Trained weights and resume checkpoints
│   ├── student_model_v1.pth         # (L1 + KD)
│   ├── student_model_v2.pth         # (Final: L1 + KD + VGG)
│   ├── student_checkpoint.pth
│   └── README.md
├── results/                         # Benchmark evaluation results
│   ├── student_ssim_scores.csv
│   ├── highest.png
│   ├── midrange.png
│   └── README.md
├── data/                            # [Excluded from repo, see Drive link]
│   ├── whole_dataset/
│   ├── blurry/
│   │   ├── train/train/
│   │   ├── train/test/
│   │   └── benchmark/
│   ├── sharp/
│   │   ├── train/train/
│   │   ├── train/test/
│   │   └── benchmark/
│   └── README.md
├── ISKD - RESTORMER.ipynb           # Main project notebook (end-to-end)
├── LICENSE                          # MIT License
├── requirements.txt                 # Pip dependencies (optional)
└── README.md                        # Full documentation
```
We use the DIV2K dataset (high-resolution image dataset) as the base for training and evaluation. The dataset is organized into paired blurry–sharp images for both training and benchmarking, with a strict triplet alignment to support knowledge distillation:
- Patch Size: 512×512 (non-overlapping, center-cropped if needed)
- Total Training Images (Patches): ~20,000 triplets
- Benchmark Pairs: ~100 full-size images for final SSIM evaluation
- Sharpness Degradation: Blur is synthetically added using downscale-upscale + motion blur for realism
- Triplet Matching: All blurry/sharp/teacher images are aligned by filename
📎 More Details: Refer to the data/README.md for exact dataset preparation steps, patching logic, and source download references.
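The filename-based triplet alignment described above can be sketched as a small PyTorch `Dataset`. This is an illustrative sketch, not the project's actual loader (the class name `TripletDataset` and the numpy-based image conversion are assumptions; see `data/README.md` for the real preparation steps):

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class TripletDataset(Dataset):
    """Yields (blurry, sharp, teacher) tensors for filenames present in all three folders."""

    def __init__(self, blurry_dir, sharp_dir, teacher_dir):
        self.dirs = (blurry_dir, sharp_dir, teacher_dir)
        # Intersect filenames so every triplet stays aligned by name
        common = set.intersection(*(set(os.listdir(d)) for d in self.dirs))
        self.names = sorted(common)

    def __len__(self):
        return len(self.names)

    def _load(self, directory, name):
        img = np.asarray(Image.open(os.path.join(directory, name)).convert("RGB"))
        return torch.from_numpy(img).permute(2, 0, 1).float() / 255.0  # CHW in [0, 1]

    def __getitem__(self, idx):
        name = self.names[idx]
        return tuple(self._load(d, name) for d in self.dirs)
```

Filenames missing from any one of the three folders are silently dropped, which keeps the triplets consistent even if a teacher output failed to generate.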
This project focuses on a lightweight student network (Mini U-Net) trained using a knowledge distillation (KD) framework. We leverage a powerful Restormer model as the teacher to guide and supervise the training of the student. The student learns not just from the ground truth (sharp images), but also from the intermediate guidance of the teacher's outputs. This results in improved sharpness, structural accuracy, and generalization—all while reducing model size and inference cost.
| Component | Description |
|---|---|
| Teacher Model | Pretrained Restormer (Motion Deblurring model) |
| Student Model | Mini U-Net (2–3 encoding–decoding blocks with skip connections) |
| Distillation Type | Output-based regression (pixel-wise teacher output) |
| Input Patch Size | 512×512 patches |
| Training Loss | Total Loss = L1 + λ_kd * Distillation Loss + λ_vgg * Perceptual Loss |
| Loss Type | Description | Weight |
|---|---|---|
| L1 Loss | Measures pixel-wise difference between student output and ground truth sharp image | 1.0 |
| Distillation Loss | Measures difference between student output and teacher (Restormer) output | 1.0 |
| VGG Perceptual Loss | Computes high-level feature distance between student output and ground truth (using VGG16) | 0.1 |
All losses are computed on full-resolution patches and combined during training.
- Performance: Mimicking a strong teacher helps the student learn refined deblurring patterns
- Efficiency: Enables real-time image sharpening on resource-constrained devices (e.g., mobile)
- Generalization: The student learns smoother and more perceptually aligned reconstructions
📎 See: training/train_student_kd.py for full implementation of the KD + VGG training pipeline.
The student model in this project is a highly efficient, custom Mini U-Net, specifically crafted to mimic the output of a large teacher (Restormer) while remaining fast and lightweight. It uses a contracting path (encoder) to capture context and an expanding path (decoder) for precise localization — a hallmark of the U-Net architecture.
This student network was chosen because it:
- Supports sharpness recovery through skip connections
- Is fully convolutional (works on any image size)
- Can easily be scaled down or up for faster or more accurate variants
| Stage | Details |
|---|---|
| Input | 3-channel RGB blurry image |
| Encoder | 2 or 3 levels of: Conv → BN → ReLU → Conv → BN → ReLU → MaxPool |
| Bottleneck | Deepest layer with high-level feature representation |
| Decoder | Transposed Conv → Concatenate skip → Conv → BN → ReLU → Conv → BN → ReLU |
| Output | Final 1×1 Conv to restore 3-channel sharpened image |
The diagram below illustrates the full training flow:
| Component | Description |
|---|---|
| ConvBlock | Two convolutional layers, each followed by BatchNorm and ReLU activation. |
| Dropout (optional) | Configurable dropout between layers (disabled by default). |
| MaxPooling | Used for downsampling in the encoder. |
| Transposed Conv | Used for upsampling in the decoder (learnable). |
| Skip Connections | Preserve spatial detail by concatenating encoder features to decoder stages. |
| Final Layer | 1×1 convolution to map to RGB output. |
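As a sketch of the components table, the `ConvBlock` (two Conv→BN→ReLU layers with optional dropout) might look like the following. The actual implementation lives in `models/student_model_unet.py`; this version and the placement of the dropout layer are illustrative assumptions:

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Two Conv -> BatchNorm -> ReLU layers, with optional dropout in between."""

    def __init__(self, in_ch, out_ch, dropout=0.0):
        super().__init__()
        layers = [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        ]
        if dropout > 0:
            layers.insert(3, nn.Dropout2d(dropout))  # optional, between the two convs
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)
```

Because every convolution uses `padding=1`, spatial dimensions are preserved, which keeps the encoder/decoder skip connections shape-compatible.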
```python
model = UNet(
    base_filters=64,    # Controls width of feature maps
    use_dropout=False,  # Dropout not used in final model
    depth=4             # 3 or 4 levels depending on tradeoff
)
```

| Parameter Scope | Details |
|---|---|
| Total Params | ~1.1M for depth=3 |
| Student Size | ~30 MB (trained weights, final) |
| Checkpoints | ~88 MB (includes optimizer state) |
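The ~1.1M figure in the table can be verified with a one-line parameter count (a standard PyTorch idiom, not project-specific code):

```python
import torch.nn as nn


def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

For the depth=3 student, `count_params(model) / 1e6` should come out near 1.1.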
- Full model code: `models/student_model_unet.py` (written in PyTorch)
- Easily modifiable:
  - Dropout support
  - BatchNorm support
  - Optional residual connections
| Strength | Reason |
|---|---|
| Lightweight | Easily deployable on edge/CPU |
| Modular | Flexible depth and width control |
| Effective | High SSIM (>0.90) via knowledge distillation |
| Interpretable | Intuitive encoder-decoder design with skip connections |
| Aspect | Teacher (Restormer) | Student (Mini U-Net) |
|---|---|---|
| Model Type | Restormer (Transformer-based) | Mini U-Net (CNN-based) |
| Pretrained | Yes (pretrained on GoPro Motion Deblurring dataset) | No (trained from scratch with KD) |
| Parameter Count | ~26 Million | ~1.1 Million (depth=3) |
| File Size | ~105 MB (pretrained .pth) | ~30 MB (student_model_v2.pth) |
| Architecture | Multi-stage, self-attention, feed-forward | 3-level encoder-decoder with skip connections |
| Training Input | Full blurry DIV2K images | 512×512 patches |
| Output | Sharp image (same size as input) | Sharp image (same size as input) |
| Inference Speed | Slow (high compute cost, not real-time on CPU) | Fast (real-time capable on CPU) |
| Purpose | Acts as ground truth proxy for student training | Learns to mimic teacher using L1 + KD + VGG loss |
- Teacher Model Checkpoint: pretrained Restormer weights (`pretrained_models/motion_deblurring.pth`)
- GitHub Link: Restormer official repo
- Student Model File: `models/student_model_unet.py`
| Component | Details |
|---|---|
| Model | Mini U-Net (3-level) |
| Teacher | Pretrained Restormer (motion deblurring) |
| Patch Size | 512×512 |
| Loss Function | L1 Loss + KD Loss (to mimic teacher) + VGG Perceptual Loss (λ=0.1) |
| Epochs | 20 (or fewer, depending on GPU availability; resumed using checkpoints) |
| Batch Size | 8 |
| Training Style | Chunked training (2500 images/epoch) for efficiency on Colab |
| Script Used | training/train_student_kd.py |
| Checkpointing | Checkpoint saved after each epoch → resumable from last epoch |
Training was done in Google Colab using free-tier GPU and Drive integration.
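The resumable, checkpoint-per-epoch scheme described above can be sketched as follows. The dictionary keys are illustrative assumptions; the actual format is defined in `training/train_student_kd.py`:

```python
import torch


def save_checkpoint(path, model, optimizer, epoch):
    """Persist model + optimizer state after each epoch (enables Colab resumption)."""
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)


def load_checkpoint(path, model, optimizer):
    """Restore training state; returns the epoch to resume from."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1
```

Saving the optimizer state alongside the weights is what makes the ~88 MB checkpoint larger than the ~30 MB final model, and it is required for a seamless resume (momentum/Adam moments are restored too).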
| Step | Description |
|---|---|
| Model Used | Final trained `student_model_v2.pth` (L1 + KD + VGG) |
| Input Folder | /data/blurry/benchmark/ |
| Ground Truth Folder | /data/sharp/benchmark/ |
| Output Folder | /outputs/student_output/benchmark/ |
| Script Notebook | ISKD - RESTORMER.ipynb |
| Metric | SSIM computed with `skimage.metrics.structural_similarity` on the Y-channel (borders cropped) |
| Results Saved | CSV file: results/student_ssim_scores.csv |
| Visualization | Side-by-side images shown (blurry vs output vs sharp) for 3 samples |
Inference supports full-size images and automatically resumes if interrupted.
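The Y-channel SSIM metric used in the evaluation cell can be sketched like this. The function name `ssim_y` and the 4-pixel border crop are assumptions for illustration; the actual crop width is set in the notebook:

```python
import numpy as np
from skimage.color import rgb2ycbcr
from skimage.metrics import structural_similarity


def ssim_y(img_a, img_b, border=4):
    """SSIM on the luminance (Y) channel with image borders cropped.

    Inputs are HxWx3 RGB arrays (float in [0, 1] or uint8).
    """
    ya = rgb2ycbcr(img_a)[..., 0]
    yb = rgb2ycbcr(img_b)[..., 0]
    if border > 0:
        ya = ya[border:-border, border:-border]
        yb = yb[border:-border, border:-border]
    # rgb2ycbcr maps Y into [16, 235], so that is the data range
    return structural_similarity(ya, yb, data_range=235.0 - 16.0)
```

Evaluating on luminance only follows the common restoration-benchmark convention, since the human visual system is far more sensitive to luminance error than to chroma error.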
After training the Mini-UNet using L1 + KD + VGG loss, we evaluated the student model on benchmark patches from the DIV2K dataset.
- Input:
/data/blurry/benchmark/ - Ground Truth:
/data/sharp/benchmark/ - Student Output:
/outputs/student_output/benchmark/ - Notebook:
ISKD - RESTORMER.ipynb - Evaluation Script: Part of the Colab notebook (final SSIM evaluation cell)
- Metrics: SSIM (Structural Similarity Index) calculated on Y-channel (luminance)
| Metric | Score |
|---|---|
| Average SSIM (Blurry) | ~0.61 |
| Average SSIM (Student) | ~0.90 |
- Improvement: The student model shows a significant SSIM gain compared to the blurry input.
- High perceptual quality achieved with minimal model size.
The top and mid-level performing images were visually inspected and plotted using Matplotlib in the notebook.
- Top Images: SSIM ≥ 0.92
- Mid-Range: SSIM ≈ 0.85–0.89
- Poor Scores: Rare, only when input was severely degraded
A `.csv` file containing SSIM scores for all benchmark images is available at `results/student_ssim_scores.csv`.
- Columns: `Image`, `SSIM (Blurry)`, `SSIM (Student)`
- Useful for analysis, charting, or report submission
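A quick way to summarize the CSV with pandas (the column names below follow the description above; adjust them if the file differs):

```python
import pandas as pd


def summarize_ssim(csv_path):
    """Summarize the benchmark SSIM CSV: mean gains and the best-scoring image."""
    df = pd.read_csv(csv_path)
    return {
        "mean_blurry": df["SSIM (Blurry)"].mean(),
        "mean_student": df["SSIM (Student)"].mean(),
        "best_image": df.loc[df["SSIM (Student)"].idxmax(), "Image"],
    }
```

Comparing `mean_blurry` and `mean_student` reproduces the ~0.61 → ~0.90 improvement reported in the results table.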
To make the project fully accessible and reproducible, we’ve shared the entire project directory on Google Drive.
📎 🔗 Access Full Project Folder on Drive
This includes:
- All Python source files (`.py`) and scripts
- Trained student models (`.pth` files)
- Pretrained teacher model weights (Restormer)
- Outputs from student and teacher models
- Full DIV2K dataset used in training/testing:
  - `/data/whole_dataset/` (original HR images)
  - `/data/blurry/train/train/`, `/data/blurry/train/test/`, `/data/blurry/benchmark/`
  - `/data/sharp/train/train/`, `/data/sharp/train/test/`, `/data/sharp/benchmark/`
- Patch-based triplet dataset used in KD training
- SSIM evaluation outputs (`.csv`)
- External image test samples and visual comparisons
- Inference outputs and logs
This Drive folder mirrors everything used in the Colab pipeline and repository structure, including additional resources not hosted directly on GitHub due to file size limitations.
Use it to:
- Run or resume training/inference
- Review model outputs and evaluation results
- Explore or replicate experiments
This project, focused on Image Sharpening using Knowledge Distillation, was developed as part of the Intel® Unnati Industrial Training Program 2025.
Team Name: RestoraTech
| Member | Role | Contribution Summary |
|---|---|---|
| Dhruv Suthar | Team Lead & Primary Developer | Designed and implemented the full U-Net knowledge distillation pipeline with VGG perceptual loss, handled model training, evaluation, visualization, and GitHub structuring. |
| Pratham Patel | Evaluation & Dataset Lead | Led benchmark evaluation, SSIM analysis, visual output comparisons, and managed external image testing and dataset structuring. |
Alongside this main implementation using the Restormer teacher, we also explored a parallel approach using the SwinIR-M (x4 PSNR) model as the teacher network for knowledge distillation.
You can explore that version here:
🔗 ImageSharpening-KD-SwinIR-M-x4-PSNR
The Restormer-based approach, however, demonstrated stronger perceptual quality and faster inference, and is considered the finalized primary submission for this challenge.
This project presents a complete, lightweight, and high-quality image sharpening pipeline using Knowledge Distillation from a powerful Restormer teacher to a compact Mini-UNet student model. Despite being highly compressed (~1.1M parameters), the student model achieves SSIM ≥ 0.90, demonstrating strong perceptual performance and real-time usability.
With modular loss integration (L1, KD, VGG), checkpoint-based resumable training, and full inference/evaluation tools, this repository provides a fully reproducible and scalable framework for real-world sharpening tasks on blurred images.
- Restormer: Efficient Transformer for High-Resolution Image Restoration
- DIV2K Dataset – NTIRE Challenge
- SwinIR: Image Restoration Using Swin Transformer
- PyTorch Official Documentation
- Intel® Unnati Industrial Training Program
This project is released under the MIT License.
You are free to use, modify, and distribute this for academic and research purposes. Commercial use may require additional permission.