Skip to content

DhanushN2005/crypto-price-streaming-kafka-snowflake-ml-pipeline

Repository files navigation

🚀 Real-Time Crypto Streaming & ML Pipeline

An end-to-end real-time data engineering and machine learning project that streams live crypto price data from an external API, processes it using Kafka, stores it in Snowflake, trains ML models for prediction and anomaly detection, and visualizes results using a dashboard.


🧠 Project Overview

This project demonstrates how modern data systems handle real-time streaming data, perform analytics and ML model training, and serve insights through a simple application.

🔄 Data Flow

Finnhub API → Kafka (Docker) → Snowflake → ML Models → Dashboard


🛠️ Tech Stack

  • API: Finnhub (Live Crypto Prices)
  • Streaming: Apache Kafka (Dockerized)
  • Data Warehouse: Snowflake
  • Machine Learning:
    • Regression (Price Prediction)
    • Isolation Forest (Anomaly Detection)
  • Visualization: Streamlit
  • Language: Python

Sample output; image

⚙️ Setup & Installation

1️⃣ Prerequisites

  • Python 3.9+
  • Docker Desktop
  • Snowflake account
  • Finnhub API key

2️⃣ Install Python Dependencies

pip install -r requirements.txt

3️⃣ Start Kafka (Docker)

docker-compose up -d

Verify:

docker ps

4️⃣ Create Snowflake Table Run in Snowflake worksheet:

CREATE DATABASE IF NOT EXISTS CRYPTO_DB;
USE DATABASE CRYPTO_DB;
CREATE SCHEMA IF NOT EXISTS PUBLIC;

CREATE TABLE IF NOT EXISTS CRYPTO_PRICES (
    SYMBOL STRING,
    PRICE FLOAT,
    EVENT_TIME TIMESTAMP
);

▶️ Running the Pipeline 🔹 Start Producer (Finnhub → Kafka) python finnhub_producer.py 🔹 Start Consumer (Kafka → Snowflake) python kafka_to_snowflake.py Data will start streaming into Snowflake in real time.

🤖 Machine Learning Run the ML pipeline:

python ml_pipeline.py

This performs:

Feature engineering

-Price prediction (regression)

-Anomaly detection

-Saves outputs to CSV files

📊 Dashboard Launch the Streamlit app:

streamlit run app.py

Dashboard Features -Actual vs Predicted price trend

-Highlighted anomaly points

-Symbol-based filtering

[Note: You need config.py tht contains the requeried API's and snowflake credentials.]

Author: Dhanush N

About

A real world crypto currencies streaming and analysing using kafka through snowflake.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages