A comprehensive YOLOv8-based object detection pipeline for training custom models and automating game interaction via servo control. Built for efficient real-time inference on game footage with automated labeling workflows.
Complete ML Workflow
- Video-to-frames extraction and preprocessing
- Interactive web-based manual labeling (Streamlit)
- Automatic object classification using template matching (SSIM)
- Incremental YOLOv8 model training with pre-trained weights
- Dataset quality review and cleaning tools
Inference & Automation
- Real-time camera-based object detection
- Automated game control via servo/serial interface
- Manual servo control UI (Streamlit)
- Automatic Region of Interest (ROI) detection via pattern matching
Key Technical Highlights
- Robust background subtraction for contour detection
- SSIM-based template matching for class identification (handles lighting variations)
- Multithreaded batch processing for fast labeling
- Hardware-aware coordinate transformations
- GPU-optimized inference pipeline
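For intuition about the SSIM-based matching listed above, here is a minimal sketch of a single-window (global) SSIM score in pure Python. It is an illustration only: the scripts in this repository may use a library implementation with sliding windows instead, and the function name `global_ssim` and the flat-list image representation are assumptions made for this example.

```python
def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM between two equal-length lists of grayscale pixels.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Standard SSIM stabilizing constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


template = [10, 50, 200, 120]
brighter = [p + 10 for p in template]   # same structure, uniform brightness shift
inverted = [255 - p for p in template]  # opposite structure

score_same = global_ssim(template, template)
score_brighter = global_ssim(template, brighter)
score_inverted = global_ssim(template, inverted)
```

In this toy case the uniformly brighter patch still scores close to the identical one, while the inverted patch scores far lower, which is the property that makes SSIM attractive for matching under modest lighting changes.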
Codes/
├── README.md                     # This file
├── AGENTS.md                     # AI agent guide for codebase
├── requirements.txt              # Python dependencies
│
├── Training Pipeline
│   ├── yolov8-train.py           # YOLOv8 model training script
│   └── yolo_datasets/
│       └── yolomobileservo2.yaml # Dataset configuration
│
├── Labeling Pipeline
│   ├── video-to-frames.py        # Extract MP4 → PNG frames
│   ├── video-labeler.py          # Interactive labeling UI (Streamlit)
│   ├── auto-labeler.py           # Batch auto-labeling reference
│   ├── dataset-cleaner.py        # QA tool for labeled data
│   └── gray_roi.png              # ROI template for pattern matching
│
├── Inference & Control
│   ├── yolov8-camera-detect.py   # Real-time camera inference demo
│   ├── auto-gamer.py             # Full automation pipeline
│   └── serial-servo.py           # Manual servo control UI
│
└── Utilities
    ├── dataset-cleanup.py        # Dataset maintenance
    └── autodelete-unlabeled.py   # Batch cleanup utility
- Python 3.8+
- GPU with CUDA support (recommended for training; CPU inference supported)
- OpenCV 4.6.0+
- PyTorch with torchvision
- Clone the repository
  git clone https://github.com/yourusername/YoloMobileServo.git
  cd YoloMobileServo/Codes
- Create virtual environment
  python -m venv venv
  # On Windows:
  venv\Scripts\activate
  # On macOS/Linux:
  source venv/bin/activate
- Install dependencies
  pip install -r requirements.txt
  pip install ultralytics  # YOLOv8 (not in requirements.txt)
- Configure paths (Windows-specific in this version)
  - Paths are currently hardcoded in the scripts.
  - If you want a configurable setup, update the path constants directly in the relevant files.
# Extract video frames
python video-to-frames.py
# Launch interactive labeler
streamlit run video-labeler.py
# Review and clean labels
streamlit run dataset-cleaner.py
python yolov8-train.py
# Check results in runs/detect/train_<timestamp>/
# Demo: Live camera detection
python yolov8-camera-detect.py
# Automation: Detection + servo control
python auto-gamer.py
# Manual servo control
streamlit run serial-servo.py
["Player", "Heart", "SmallTree", "BigTree", "Rock", "SnowPile", "EndScreen"]
The "None" class (index 7) is assigned when a detected object does not match any class above the 85% confidence threshold.
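The fallback to the "None" class can be sketched as follows. This is a minimal illustration, not the code from the labeling scripts: `pick_class` and `class_label` are hypothetical names, the per-class scores are assumed to arrive as a list in class order, and the strict `>` comparison is one reading of "exceed".

```python
CLASS_NAMES = ["Player", "Heart", "SmallTree", "BigTree",
               "Rock", "SnowPile", "EndScreen"]
NONE_INDEX = len(CLASS_NAMES)  # index 7, the "None" class
MIN_ACCURACY = 0.85            # mirrors object_class_match_min_accuracy


def pick_class(scores, min_accuracy=MIN_ACCURACY):
    """Return the best-scoring class index, or NONE_INDEX if no score exceeds the threshold."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=scores.__getitem__)
    # Assumption: the threshold is exclusive (score must be strictly greater).
    return best if scores[best] > min_accuracy else NONE_INDEX


def class_label(index):
    """Map a class index to its name, with index 7 reported as "None"."""
    return CLASS_NAMES[index] if index < NONE_INDEX else "None"


confident = pick_class([0.20, 0.91, 0.30, 0.10, 0.05, 0.40, 0.50])
uncertain = pick_class([0.50, 0.60, 0.70, 0.30, 0.20, 0.10, 0.84])
```

Here `confident` resolves to the "Heart" class, while `uncertain` falls back to index 7 because no score clears 0.85.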
Uses standard YOLO format with normalized center coordinates:
<class_idx> <center_x> <center_y> <width> <height>
# Example: 0 0.5 0.5 0.2 0.3 (class 0 at center, 20% width, 30% height)
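To make the format concrete, the sketch below converts one label line into pixel corner coordinates. The function name `yolo_to_pixel_box` is hypothetical, and rounding to whole pixels is a choice made for this example rather than something the scripts are known to do.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert '<class_idx> <center_x> <center_y> <width> <height>' to pixel corners.

    Returns (class_idx, (x_min, y_min, x_max, y_max)), rounded to whole pixels.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError("expected 5 fields, got %d" % len(parts))
    class_idx = int(parts[0])
    cx, cy, w, h = (float(p) for p in parts[1:])
    x_min = round((cx - w / 2) * img_w)
    y_min = round((cy - h / 2) * img_h)
    x_max = round((cx + w / 2) * img_w)
    y_max = round((cy + h / 2) * img_h)
    return class_idx, (x_min, y_min, x_max, y_max)


# The example label above, on a 1920x1080 frame:
result = yolo_to_pixel_box("0 0.5 0.5 0.2 0.3", 1920, 1080)
# result == (0, (768, 378, 1152, 702))
```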
Key parameters in processing files (adjust for your setup):
- frame_delta_threshold - Foreground detection sensitivity
- contour_min_area / contour_max_area - Object size filtering
- object_class_match_min_accuracy - Classification confidence (default 0.85)
- thread_num - Batch processing parallelization
Note: These are tuned for 1920×1080 resolution. Scale proportionally for different resolutions.
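One way to read "scale proportionally" is sketched below. It assumes that area thresholds such as contour_min_area scale with the total pixel count, while intensity thresholds such as frame_delta_threshold are left unchanged; both points are assumptions, and `scale_area` is a hypothetical helper rather than part of the repository.

```python
BASE_WIDTH, BASE_HEIGHT = 1920, 1080  # resolution the parameters were tuned for


def scale_area(area_px, width, height):
    """Scale an area threshold (in px^2) from 1920x1080 to another resolution.

    Assumption: areas scale with the ratio of total pixel counts.
    """
    ratio = (width * height) / (BASE_WIDTH * BASE_HEIGHT)
    return round(area_px * ratio)


# contour_min_area is 1000 px^2 at 1920x1080; at 1280x720 this suggests:
min_area_720p = scale_area(1000, 1280, 720)
# min_area_720p == 444
```

Treat the result as a starting point and confirm it on a few frames at the new resolution.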
The system uses pattern matching against gray_roi.png to detect the playable game area, making it robust to screen position changes. If game graphics change significantly, regenerate this template.
- Training requires GPU (device=0 in yolov8-train.py)
- Inference works on GPU/CPU (specify in model.predict)
- Streamlit apps run on CPU (suitable for labeling, not real-time inference)
- Use auto-gamer.py for automated gameplay
- Uses baudrate 115200 for serial communication
- Servo state encoded as: left*2 + right*1, giving a value in {0,1,2,3}
- Verify COM port before running:
python -c "from serial.tools.list_ports import comports; print([cp.device for cp in comports()])"
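The state encoding can be sketched on the host side as follows. The helper names are hypothetical, and the trailing newline is an assumption (any whitespace terminator should be acceptable to a receiver that reads an ASCII integer); check it against what serial-servo.py actually sends.

```python
def encode_servo_state(left, right):
    """Encode two servo flags as left*2 + right*1, yielding 0..3."""
    return int(bool(left)) * 2 + int(bool(right))


def to_serial_bytes(state):
    """Format a state as an ASCII integer plus a newline terminator (assumed)."""
    if state not in (0, 1, 2, 3):
        raise ValueError("servo state must be in 0..3")
    return f"{state}\n".encode("ascii")


state = encode_servo_state(left=True, right=False)
payload = to_serial_bytes(state)

# With pyserial, sending could look like this (the port name is a placeholder):
#   import serial
#   with serial.Serial("COM3", 115200, timeout=1) as port:
#       port.write(payload)
```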
This repository includes firmware for an ESP8266-based controller located at arduino/MobileServo.ino.
Key points about MobileServo.ino:
- The sketch sets up a small WiFi web UI for initial servo calibration (servos attached to GPIO pins used by the board).
- After calibration you press "Start" in the web UI and the board switches to serial control mode.
- Serial parameters: 115200 baud, ASCII integer values representing state (0..3). The mapping is left*2 + right*1 (same as above).
- On serial input the board sets two servo positions according to the received integer:
  - 0 → left down, right down
  - 1 → left down, right up
  - 2 → left up, right down
  - 3 → left up, right up
Uploading the firmware
- Open arduino/MobileServo.ino in the Arduino IDE.
- Select the correct board (e.g. "NodeMCU 1.0 (ESP-12E Module)" or similar) and the correct COM port.
- Click Upload.
Replace COM3 with your device port and nodemcuv2 with the appropriate fqbn for your board.
Notes & Troubleshooting
- Ensure the correct USB-serial drivers are installed (e.g. CP210x, CH340, FTDI depending on your board).
- If using the WiFi calibration UI, open the serial monitor at 115200 baud to see the board IP address; then open the printed URL in your browser to access the UI.
- The firmware expects a short ASCII integer (0..3) followed by optional whitespace on serial; the included serial-servo.py Streamlit app sends the same format.
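For debugging without hardware, the receiving side of this protocol can be modeled in Python. This is a host-side sketch of the behavior described above, not a translation of MobileServo.ino; `decode_servo_command` is a hypothetical name.

```python
def decode_servo_command(raw):
    """Parse an ASCII integer (0..3) with optional surrounding whitespace.

    Returns (left, right) positions as "up"/"down", inverting left*2 + right*1.
    """
    state = int(raw.decode("ascii").strip())
    if not 0 <= state <= 3:
        raise ValueError("servo state must be in 0..3")
    left_up = bool(state >> 1)   # bit 1 carries the left servo
    right_up = bool(state & 1)   # bit 0 carries the right servo
    return ("up" if left_up else "down", "up" if right_up else "down")


positions = decode_servo_command(b"2\n")
# positions == ("up", "down")
```

Feeding this function the bytes your sender produces is a quick way to confirm that both ends agree on the mapping before connecting the board.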
After training, update model paths in inference scripts:
model_path = "runs/detect/train_XXXX_epoch_finished/weights/best.pt"
ROI Detection Fails
- Check that gray_roi.png exists in the working directory
- If game graphics changed, capture and regenerate the ROI template
Contours Too Noisy
- Increase frame_delta_threshold (current: 8)
- Increase contour_min_area (current: 1000 px²)
Serial Connection Not Found
- Verify device COM port with device manager
- Ensure USB drivers installed (CH340, FTDI, etc. depending on hardware)
Out of Memory During Training
- Reduce batch size or training dataset size
- Use smaller model variant
(Expected performance on typical hardware)
- Model Training: ~90 epochs on GPU, about 4-6 hours
- Inference: 30-60 FPS at 1920×1080 on NVIDIA GPU
- Labeling: 100 frames in ~5-10 minutes with 4 worker threads
This project is licensed under the MIT License - see LICENSE file for details.
- AGENTS.md - Comprehensive guide for AI agents and developers
- Ultralytics YOLOv8 Docs: https://docs.ultralytics.com/
- OpenCV Documentation: https://docs.opencv.org/
- Streamlit Docs: https://docs.streamlit.io/