the-onewho-knocks/PipelineForge

PipelineForge – GitHub Trending Observability Pipeline

This repository hosts PipelineForge, a production-grade data ingestion and observability backend built using Go. The system continuously scrapes GitHub Trending repositories, processes them asynchronously using RabbitMQ, persists structured data into PostgreSQL, and exposes real-time metrics, logs, and alerts via Prometheus and Grafana.

The project demonstrates how real-world data pipelines are built, monitored, and operated in production environments.

The Project Demonstrates

  1. Clean modular Go architecture
  2. Decoupled producer–consumer pipeline using RabbitMQ
  3. Asynchronous job processing with backpressure handling
  4. PostgreSQL persistence for processed records
  5. First-class observability with metrics, logs, and alerts
  6. Custom Prometheus metrics instrumentation
  7. Grafana dashboards for real-time visibility
  8. Alerting rules for abnormal system behavior
  9. Context-aware concurrency using goroutines
  10. Production-focused backend and SRE practices

Architecture Diagram

Below is a high-level overview of the system architecture:

[Figure: PipelineForge architecture diagram]

Core Design Principles

  1. Separation of Concerns – Scraping, processing, and storage are isolated
  2. Loose Coupling – Services communicate only via the message queue
  3. Backpressure Safety – Queue absorbs traffic spikes
  4. Observability First – Metrics and alerts are built-in
  5. Fail-Safe Design – Graceful shutdown and error isolation
  6. Scalable by Default – Components can scale independently
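
As an illustration of the graceful-shutdown and context-aware concurrency principles, here is a minimal sketch using only the standard library. The names `runWorkers` and `demo` are hypothetical and not taken from this repository; the real worker would consume from RabbitMQ rather than from a Go channel.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// runWorkers starts n goroutines that read jobs until the channel is closed
// or ctx is cancelled, and returns how many jobs were handled in total.
// (Hypothetical helper, not the repository's actual worker loop.)
func runWorkers(ctx context.Context, n int, jobs <-chan string) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		handled int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return // shutdown requested: stop picking up new work
				case job, ok := <-jobs:
					if !ok {
						return // producer closed the channel
					}
					_ = job // placeholder for real processing
					mu.Lock()
					handled++
					mu.Unlock()
				}
			}
		}()
	}
	wg.Wait() // every worker has finished its current job before we return
	return handled
}

// demo feeds three jobs through two workers and returns the handled count.
func demo() int {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	jobs := make(chan string, 3)
	for _, j := range []string{"a", "b", "c"} {
		jobs <- j
	}
	close(jobs)
	return runWorkers(ctx, 2, jobs)
}

func main() {
	fmt.Println("handled:", demo())
}
```

In a long-running service, the context would more likely come from `signal.NotifyContext` so that SIGINT or SIGTERM cancels it, letting in-flight jobs finish before the process exits.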

Technology Stack

  1. Go
  2. RabbitMQ
  3. PostgreSQL
  4. Prometheus
  5. Grafana

System Components

Scraper Service (Producer)

  1. Scrapes GitHub Trending repositories
  2. Extracts repository metadata (name, author, stars, language, URL)
  3. Publishes raw data to RabbitMQ
  4. Exposes Prometheus metrics
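
The publish step could be sketched as follows. The `Repo` struct mirrors the metadata fields listed above, but its field names, the JSON keys, and the `Publisher` interface are assumptions made for illustration; the in-memory publisher stands in for a real RabbitMQ client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Repo holds the scraped metadata. Field names and JSON keys are assumed.
type Repo struct {
	Name     string `json:"name"`
	Author   string `json:"author"`
	Stars    int    `json:"stars"`
	Language string `json:"language"`
	URL      string `json:"url"`
}

// Publisher abstracts the queue client (hypothetical interface).
type Publisher interface {
	Publish(body []byte) error
}

// memoryPublisher records messages in memory instead of sending them.
type memoryPublisher struct{ sent [][]byte }

func (m *memoryPublisher) Publish(body []byte) error {
	m.sent = append(m.sent, body)
	return nil
}

// publishRepos encodes each repository as JSON and hands it to the publisher.
// It returns the number of messages published before any error.
func publishRepos(p Publisher, repos []Repo) (int, error) {
	n := 0
	for _, r := range repos {
		body, err := json.Marshal(r)
		if err != nil {
			return n, fmt.Errorf("encode %s/%s: %w", r.Author, r.Name, err)
		}
		if err := p.Publish(body); err != nil {
			return n, fmt.Errorf("publish %s/%s: %w", r.Author, r.Name, err)
		}
		n++
	}
	return n, nil
}

func main() {
	p := &memoryPublisher{}
	repos := []Repo{{
		Name: "go", Author: "golang", Stars: 1,
		Language: "Go", URL: "https://github.com/golang/go",
	}}
	n, err := publishRepos(p, repos)
	fmt.Println(n, err, string(p.sent[0]))
}
```

Keeping the queue behind a small interface lets the scraper be tested without a running broker.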

Message Queue (RabbitMQ)

  1. Buffers scraped repository data
  2. Decouples ingestion from processing
  3. Enables asynchronous and scalable workflows
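
To make the buffering idea concrete, the sketch below uses a bounded Go channel as an in-memory stand-in for a length-limited queue. It is not RabbitMQ and does not use a RabbitMQ client; `boundedQueue` and its methods are hypothetical names. It only shows how a full buffer signals the producer to slow down.

```go
package main

import "fmt"

// boundedQueue is an in-memory stand-in for a broker queue with a size limit.
type boundedQueue struct{ ch chan []byte }

func newBoundedQueue(capacity int) *boundedQueue {
	return &boundedQueue{ch: make(chan []byte, capacity)}
}

// tryPublish enqueues without blocking. A false result means the buffer is
// full and the producer should back off or retry later.
func (q *boundedQueue) tryPublish(msg []byte) bool {
	select {
	case q.ch <- msg:
		return true
	default:
		return false
	}
}

// consume removes one message, or reports false if the queue is empty.
func (q *boundedQueue) consume() ([]byte, bool) {
	select {
	case m := <-q.ch:
		return m, true
	default:
		return nil, false
	}
}

func main() {
	q := newBoundedQueue(2)
	for _, m := range []string{"a", "b", "c"} {
		fmt.Printf("publish %q accepted: %v\n", m, q.tryPublish([]byte(m)))
	}
	q.consume() // a consumer frees one slot
	fmt.Println("retry accepted:", q.tryPublish([]byte("c")))
}
```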

Worker Service (Consumer)

  1. Consumes messages from RabbitMQ
  2. Validates and processes repository data
  3. Inserts structured records into PostgreSQL
  4. Exposes detailed processing metrics
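
The validation step might look roughly like this. The specific rules (required name and author, non-negative stars, a GitHub URL prefix) are assumptions chosen for illustration, as are the `Repo` field names and the `decodeRepo` helper; the repository's actual checks may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Repo holds one scraped record. Field names and JSON keys are assumed.
type Repo struct {
	Name     string `json:"name"`
	Author   string `json:"author"`
	Stars    int    `json:"stars"`
	Language string `json:"language"`
	URL      string `json:"url"`
}

// ErrInvalid marks messages that should be rejected rather than retried.
var ErrInvalid = errors.New("invalid repository record")

// decodeRepo parses a message body and applies basic sanity checks.
func decodeRepo(body []byte) (Repo, error) {
	var r Repo
	if err := json.Unmarshal(body, &r); err != nil {
		return Repo{}, fmt.Errorf("%w: %v", ErrInvalid, err)
	}
	r.Name = strings.TrimSpace(r.Name)
	r.Author = strings.TrimSpace(r.Author)
	switch {
	case r.Name == "" || r.Author == "":
		return Repo{}, fmt.Errorf("%w: name and author are required", ErrInvalid)
	case r.Stars < 0:
		return Repo{}, fmt.Errorf("%w: negative star count %d", ErrInvalid, r.Stars)
	case !strings.HasPrefix(r.URL, "https://github.com/"):
		return Repo{}, fmt.Errorf("%w: unexpected URL %q", ErrInvalid, r.URL)
	}
	return r, nil
}

func main() {
	good := []byte(`{"name":" go ","author":"golang","stars":5,"language":"Go","url":"https://github.com/golang/go"}`)
	fmt.Println(decodeRepo(good))

	bad := []byte(`{"name":"","author":"x","url":"https://github.com/x/y"}`)
	_, err := decodeRepo(bad)
	fmt.Println(err, errors.Is(err, ErrInvalid))
}
```

Wrapping every rejection in a single sentinel error lets the consumer tell malformed messages apart from transient failures such as a database outage.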

Database (PostgreSQL)

  1. Stores processed repository data
  2. Optimized for write-heavy workloads
  3. Ensures durability and consistency
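
A persistence call could be sketched as below. The table name `repositories`, its columns, and the unique constraint on `url` that the `ON CONFLICT` clause relies on are all assumptions, not the repository's actual schema. The `execer` interface is satisfied by `*sql.DB`, and the recording fake lets the sketch run without a database.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// Repo holds one processed record. Field names are assumed.
type Repo struct {
	Name, Author, Language, URL string
	Stars                       int
}

// upsertRepoSQL assumes a hypothetical table with a unique constraint on url.
const upsertRepoSQL = `
INSERT INTO repositories (name, author, stars, language, url)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (url) DO UPDATE SET stars = EXCLUDED.stars`

// execer is the subset of *sql.DB the insert needs.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// insertRepo writes one record, honouring the caller's context deadline.
func insertRepo(ctx context.Context, db execer, r Repo) error {
	_, err := db.ExecContext(ctx, upsertRepoSQL, r.Name, r.Author, r.Stars, r.Language, r.URL)
	if err != nil {
		return fmt.Errorf("insert %s/%s: %w", r.Author, r.Name, err)
	}
	return nil
}

// recordingDB is a fake that remembers the last call instead of using a database.
type recordingDB struct {
	calls    int
	lastArgs []any
}

func (d *recordingDB) ExecContext(_ context.Context, _ string, args ...any) (sql.Result, error) {
	d.calls++
	d.lastArgs = args
	return nil, nil
}

// demo runs one insert against the fake and returns the bound argument count.
func demo() (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	db := &recordingDB{}
	err := insertRepo(ctx, db, Repo{
		Name: "go", Author: "golang", Stars: 1,
		Language: "Go", URL: "https://github.com/golang/go",
	})
	return len(db.lastArgs), err
}

func main() {
	n, err := demo()
	fmt.Println("bound arguments:", n, "error:", err)
}
```

Using an upsert keeps inserts idempotent if the same trending repository is scraped more than once, which is one possible design choice among several.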

Data Flow

  1. Scraper fetches GitHub Trending repositories
  2. Data is published to RabbitMQ
  3. Worker consumes messages asynchronously
  4. Processed data is stored in PostgreSQL
  5. Metrics are scraped by Prometheus
  6. Grafana visualizes system health

Observability

Dashboards

[Screenshots: two dashboard views, captured 2026-01-13]

Metrics

  1. Total repositories scraped
  2. Messages published to RabbitMQ
  3. Messages consumed by workers
  4. Database insert count
  5. Processing latency
  6. Error and failure counts
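
For a sense of what a counter exposes, here is a dependency-free sketch that renders counters in the Prometheus text exposition format. A project like this would more typically use the official Prometheus Go client; the standard library is used here only to keep the example self-contained. The metric names and the `counter` type are hypothetical, and latency (which would normally be a histogram) is not covered.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"sync/atomic"
)

// counter is a minimal monotonically increasing metric (hypothetical type).
type counter struct {
	name, help string
	value      atomic.Int64
}

// Inc adds one to the counter; safe for concurrent use.
func (c *counter) Inc() { c.value.Add(1) }

// writeTo emits the counter in Prometheus text exposition format.
func (c *counter) writeTo(w io.Writer) {
	fmt.Fprintf(w, "# HELP %s %s\n# TYPE %s counter\n%s %d\n",
		c.name, c.help, c.name, c.name, c.value.Load())
}

// render returns the exposition text for all given counters.
func render(counters ...*counter) string {
	var b strings.Builder
	for _, c := range counters {
		c.writeTo(&b)
	}
	return b.String()
}

// metricsHandler serves the counters over HTTP, e.g. mounted at /metrics.
func metricsHandler(counters ...*counter) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		io.WriteString(w, render(counters...))
	})
}

func main() {
	// Metric names below are assumptions, not the repository's actual names.
	scraped := &counter{name: "pipelineforge_repos_scraped_total", help: "Repositories scraped."}
	inserts := &counter{name: "pipelineforge_db_inserts_total", help: "Rows inserted into PostgreSQL."}

	scraped.Inc()
	scraped.Inc()
	inserts.Inc()

	// In a service: http.Handle("/metrics", metricsHandler(scraped, inserts))
	fmt.Print(render(scraped, inserts))
}
```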

Alerts

[Screenshot: alerts view, captured 2026-01-12]

Alert Types

  1. High database insert rate
  2. PipelineForge worker is down
  3. No messages processed
  4. Database errors detected
  5. High message processing latency

Getting Started

Prerequisites

  1. Go 1.21+
  2. RabbitMQ
  3. PostgreSQL
  4. Prometheus
  5. Grafana
  6. Git

Clone the Repository

git clone https://github.com/the-onewho-knocks/PipelineForge.git
cd PipelineForge

Run the Services

go mod tidy
# Run each service in its own terminal; both keep running until stopped.
go run cmd/scraper/main.go
go run cmd/worker/main.go

Before starting the services, make sure RabbitMQ, PostgreSQL, Prometheus, and Grafana are running locally.

Author

Hardik Borse | LinkedIn | Email

License

This project is licensed under the Apache License 2.0.
