# PipelineForge

This repository hosts PipelineForge, a production-grade data ingestion and observability backend written in Go. The system continuously scrapes GitHub Trending repositories, processes them asynchronously through RabbitMQ, persists structured records in PostgreSQL, and exposes real-time metrics, logs, and alerts via Prometheus and Grafana.

The project demonstrates how real-world data pipelines are built, monitored, and operated in production environments.
## Key Features

- Clean, modular Go architecture
- Decoupled producer–consumer pipeline built on RabbitMQ
- Asynchronous job processing with backpressure handling
- PostgreSQL persistence for processed records
- First-class observability: metrics, logs, and alerts
- Custom Prometheus metrics instrumentation
- Grafana dashboards for real-time visibility
- Alerting rules for abnormal system behavior
- Context-aware concurrency with goroutines
- Production-focused backend and SRE practices
## Table of Contents

- Architecture Diagram
- Core Design Principles
- Technology Stack
- System Components
- Observability
- Getting Started
- Author
- License
## Architecture Diagram

Below is a high-level overview of the system architecture:
## Core Design Principles

- **Separation of Concerns** – scraping, processing, and storage are isolated
- **Loose Coupling** – services communicate only via the message queue
- **Backpressure Safety** – the queue absorbs traffic spikes
- **Observability First** – metrics and alerts are built in
- **Fail-Safe Design** – graceful shutdown and error isolation
- **Scalable by Default** – components can scale independently
## Technology Stack

- Go – scraper and worker services
- RabbitMQ – message broker
- PostgreSQL – data store
- Prometheus – metrics collection and alerting
- Grafana – dashboards and visualization
## System Components

### Scraper Service

- Scrapes GitHub Trending repositories
- Extracts repository metadata (name, author, stars, language, URL)
- Publishes raw data to RabbitMQ
- Exposes Prometheus metrics
### RabbitMQ Message Queue

- Buffers scraped repository data
- Decouples ingestion from processing
- Enables asynchronous, scalable workflows
### Worker Service

- Consumes messages from RabbitMQ
- Validates and processes repository data
- Inserts structured records into PostgreSQL
- Exposes detailed processing metrics
### PostgreSQL

- Stores processed repository data
- Optimized for write-heavy workloads
- Ensures durability and consistency
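A hypothetical table shape for the processed records could look like the DDL below; the project's real schema is not shown here, so every column name is an assumption:

```sql
-- Illustrative schema only; the actual table may differ.
CREATE TABLE IF NOT EXISTS repositories (
    id          BIGSERIAL PRIMARY KEY,
    author      TEXT        NOT NULL,
    name        TEXT        NOT NULL,
    language    TEXT,
    stars       INTEGER     NOT NULL DEFAULT 0,
    url         TEXT        NOT NULL,
    scraped_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Supports lookups by repository while keeping inserts cheap,
-- which suits the write-heavy workload described above.
CREATE INDEX IF NOT EXISTS idx_repositories_author_name
    ON repositories (author, name);
```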
### Data Flow

1. The scraper fetches GitHub Trending repositories
2. Raw data is published to RabbitMQ
3. A worker consumes messages asynchronously
4. Processed records are stored in PostgreSQL
5. Prometheus scrapes metrics from each service
6. Grafana visualizes system health
## Observability

### Prometheus Metrics

- Total repositories scraped
- Messages published to RabbitMQ
- Messages consumed by workers
- Database insert count
- Processing latency
- Error and failure counts
### Alerting Rules

Alerts fire on conditions such as:

- High database insert rate
- PipelineForge worker is down
- No messages processed
- Database errors detected
- High message processing latency
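Expressed as Prometheus alerting rules, two of these conditions might look like the sketch below; every metric name, job label, and threshold here is an assumption, not the project's actual rule file:

```yaml
# Illustrative rules only; metric, job, and threshold values are assumptions.
groups:
  - name: pipelineforge
    rules:
      - alert: PipelineForgeWorkerDown
        expr: up{job="pipelineforge-worker"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PipelineForge worker is down"
      - alert: NoMessagesProcessed
        expr: rate(pipelineforge_messages_consumed_total[5m]) == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "No messages processed in the last 10 minutes"
```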
## Getting Started

### Prerequisites

- Go 1.21+
- RabbitMQ
- PostgreSQL
- Prometheus
- Grafana
- Git
### Clone and Run

```bash
git clone https://github.com/yourusername/PipelineForge.git
cd PipelineForge
go mod tidy
```

Run the scraper and the worker in separate terminals:

```bash
go run cmd/scraper/main.go
```

```bash
go run cmd/worker/main.go
```
Ensure RabbitMQ, PostgreSQL, Prometheus, and Grafana are running locally.
## Author

Hardik Borse | LinkedIn | Email
## License

This project is licensed under the Apache License 2.0.