# PipelineForge

This repository hosts PipelineForge, a production-grade data ingestion and observability backend written in Go. The system continuously scrapes GitHub Trending repositories, processes them asynchronously through RabbitMQ, persists structured records in PostgreSQL, and exposes real-time metrics, logs, and alerts via Prometheus and Grafana.

The project demonstrates how real-world data pipelines are built, monitored, and operated in production environments.
## Key Features

- Clean, modular Go architecture
- Decoupled producer–consumer pipeline built on RabbitMQ
- Asynchronous job processing with backpressure handling
- PostgreSQL persistence for processed records
- First-class observability: metrics, logs, and alerts
- Custom Prometheus metrics instrumentation
- Grafana dashboards for real-time visibility
- Alerting rules for abnormal system behavior
- Context-aware concurrency with goroutines
- Production-focused backend and SRE practices
## Table of Contents

- Architecture Diagram
- Core Design Principles
- Technology Stack
- System Components
- Observability
- Getting Started
- Author
- License
## Architecture Diagram

Below is a high-level overview of the system architecture:
## Core Design Principles

- **Separation of Concerns** – scraping, processing, and storage are isolated
- **Loose Coupling** – services communicate only via the message queue
- **Backpressure Safety** – the queue absorbs traffic spikes
- **Observability First** – metrics and alerts are built in
- **Fail-Safe Design** – graceful shutdown and error isolation
- **Scalable by Default** – components can scale independently
## Technology Stack

- Go – scraper and worker services
- RabbitMQ – message broker
- PostgreSQL – data store
- Prometheus – metrics collection and alerting
- Grafana – dashboards and visualization
## System Components

### Scraper Service

- Scrapes GitHub Trending repositories
- Extracts repository metadata (name, author, stars, language, URL)
- Publishes raw data to RabbitMQ
- Exposes Prometheus metrics
### RabbitMQ Message Queue

- Buffers scraped repository data
- Decouples ingestion from processing
- Enables asynchronous, scalable workflows
### Worker Service

- Consumes messages from RabbitMQ
- Validates and processes repository data
- Inserts structured records into PostgreSQL
- Exposes detailed processing metrics
### PostgreSQL

- Stores processed repository data
- Optimized for write-heavy workloads
- Ensures durability and consistency
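A hypothetical table shape for the processed records could look like the DDL below; the project's real schema is not shown here, so every column name is an assumption:

```sql
-- Illustrative schema only; the actual table may differ.
CREATE TABLE IF NOT EXISTS repositories (
    id          BIGSERIAL PRIMARY KEY,
    author      TEXT        NOT NULL,
    name        TEXT        NOT NULL,
    language    TEXT,
    stars       INTEGER     NOT NULL DEFAULT 0,
    url         TEXT        NOT NULL,
    scraped_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Supports lookups by repository while keeping inserts cheap,
-- which suits the write-heavy workload described above.
CREATE INDEX IF NOT EXISTS idx_repositories_author_name
    ON repositories (author, name);
```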
### Data Flow

1. The scraper fetches GitHub Trending repositories
2. Raw data is published to RabbitMQ
3. A worker consumes messages asynchronously
4. Processed records are stored in PostgreSQL
5. Prometheus scrapes metrics from each service
6. Grafana visualizes system health
## Observability

### Prometheus Metrics

- Total repositories scraped
- Messages published to RabbitMQ
- Messages consumed by workers
- Database insert count
- Processing latency
- Error and failure counts
### Alerting Rules

Alerts fire on conditions such as:

- High database insert rate
- PipelineForge worker is down
- No messages processed
- Database errors detected
- High message processing latency
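Expressed as Prometheus alerting rules, two of these conditions might look like the sketch below; every metric name, job label, and threshold here is an assumption, not the project's actual rule file:

```yaml
# Illustrative rules only; metric, job, and threshold values are assumptions.
groups:
  - name: pipelineforge
    rules:
      - alert: PipelineForgeWorkerDown
        expr: up{job="pipelineforge-worker"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PipelineForge worker is down"
      - alert: NoMessagesProcessed
        expr: rate(pipelineforge_messages_consumed_total[5m]) == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "No messages processed in the last 10 minutes"
```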
## Getting Started

### Prerequisites

- Go 1.21+
- RabbitMQ
- PostgreSQL
- Prometheus
- Grafana
- Git
### Clone and Run

```bash
git clone https://github.com/yourusername/PipelineForge.git
cd PipelineForge
go mod tidy
```

Run the scraper and the worker in separate terminals:

```bash
go run cmd/scraper/main.go
```

```bash
go run cmd/worker/main.go
```
Ensure RabbitMQ, PostgreSQL, Prometheus, and Grafana are running locally.
## Author

Hardik Borse | LinkedIn | Email
## License

This project is licensed under the Apache License 2.0.