An end-to-end real-time data engineering and machine learning project that streams live crypto price data from an external API, processes it using Kafka, stores it in Snowflake, trains ML models for prediction and anomaly detection, and visualizes results using a dashboard.
This project demonstrates how modern data systems handle real-time streaming data, perform analytics and ML model training, and serve insights through a simple application.
Finnhub API → Kafka (Docker) → Snowflake → ML Models → Dashboard
- API: Finnhub (Live Crypto Prices)
- Streaming: Apache Kafka (Dockerized)
- Data Warehouse: Snowflake
- Machine Learning:
  - Regression (Price Prediction)
  - Isolation Forest (Anomaly Detection)
- Visualization: Streamlit
- Language: Python
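The first hop of the pipeline (Finnhub → Kafka) can be sketched as a small polling producer. Everything below is illustrative, not taken from the project code: the topic name `crypto_prices`, the symbol, and the use of Finnhub's REST quote endpoint (where field `c` is the current price) are assumptions.

```python
# Hypothetical producer sketch: poll Finnhub, publish JSON ticks to Kafka.
import json
import time
import urllib.request

FINNHUB_QUOTE_URL = "https://finnhub.io/api/v1/quote?symbol={symbol}&token={token}"

def fetch_quote(symbol: str, token: str) -> dict:
    """Poll Finnhub's REST quote endpoint for the latest price data."""
    url = FINNHUB_QUOTE_URL.format(symbol=symbol, token=token)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def build_message(symbol: str, price: float, event_time: float) -> bytes:
    """Serialize one tick to the JSON shape the CRYPTO_PRICES table expects."""
    record = {"symbol": symbol, "price": price, "event_time": event_time}
    return json.dumps(record).encode("utf-8")

if __name__ == "__main__":
    # kafka-python; assumes the broker started by docker-compose on localhost:9092.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    while True:
        quote = fetch_quote("BINANCE:BTCUSDT", "YOUR_FINNHUB_KEY")
        msg = build_message("BINANCE:BTCUSDT", quote["c"], time.time())
        producer.send("crypto_prices", msg)
        time.sleep(5)
```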
- Python 3.9+
- Docker Desktop
- Snowflake account
- Finnhub API key
```
pip install -r requirements.txt
```

3️⃣ Start Kafka (Docker)
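The repository's `docker-compose.yml` is not shown here; a minimal single-broker setup might look like the following (image tags, ports, and listener settings are illustrative):

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on: [zookeeper]
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```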
```
docker-compose up -d
```

Verify:
```
docker ps
```

4️⃣ Create Snowflake Table

Run in Snowflake worksheet:
```sql
CREATE DATABASE IF NOT EXISTS CRYPTO_DB;
USE DATABASE CRYPTO_DB;
CREATE SCHEMA IF NOT EXISTS PUBLIC;

CREATE TABLE IF NOT EXISTS CRYPTO_PRICES (
    SYMBOL STRING,
    PRICE FLOAT,
    EVENT_TIME TIMESTAMP
);
```

🤖 Machine Learning

Run the ML pipeline:
```
python ml_pipeline.py
```

This performs:
- Feature engineering
- Price prediction (regression)
- Anomaly detection
- Saving outputs to CSV files
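The anomaly-detection step can be sketched with scikit-learn's Isolation Forest. The feature choice (raw price), the synthetic data, and the `contamination` value below are illustrative assumptions, not taken from `ml_pipeline.py`:

```python
# Hedged sketch of Isolation Forest anomaly detection on price ticks.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic prices around 65,000 with one obvious spike injected at index 50.
prices = rng.normal(65000, 50, size=100)
prices[50] = 80000  # injected anomaly

X = prices.reshape(-1, 1)
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal
anomaly_indices = np.where(labels == -1)[0]
```

In the real pipeline the anomaly labels would be joined back onto the tick data and written to CSV for the dashboard to highlight.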
📊 Dashboard

Launch the Streamlit app:

```
streamlit run app.py
```

Dashboard Features
- Actual vs. predicted price trend
- Highlighted anomaly points
- Symbol-based filtering
Note: You need a `config.py` that contains the required API keys and Snowflake credentials.
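`config.py` is not included in the repository; a template might look like the following, where every variable name and value is a placeholder assumption:

```python
# config.py — template only; replace the placeholders with your own credentials.
FINNHUB_API_KEY = "your-finnhub-api-key"

SNOWFLAKE_USER = "your-user"
SNOWFLAKE_PASSWORD = "your-password"
SNOWFLAKE_ACCOUNT = "your-account-identifier"
SNOWFLAKE_WAREHOUSE = "COMPUTE_WH"
SNOWFLAKE_DATABASE = "CRYPTO_DB"
SNOWFLAKE_SCHEMA = "PUBLIC"

KAFKA_BOOTSTRAP_SERVERS = "localhost:9092"
KAFKA_TOPIC = "crypto_prices"
```

Keep this file out of version control (e.g., add it to `.gitignore`) since it holds secrets.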
Author: Dhanush N
