Kallistina/distributed-SQL-query-engine

Distributed Columnar SQL Engine

A distributed SQL query engine with columnar storage and parallel query execution. This project demonstrates a coordinator-worker architecture that parses and plans SQL queries, distributes the work across multiple nodes, executes it in parallel, and aggregates the partial results.


Features

  • Columnar storage, so a query reads only the columns it references.
  • SQL support for SELECT, WHERE, aggregates (SUM, COUNT, AVG, MIN, MAX), and GROUP BY.
  • Query planner for distributing tasks among workers.
  • Fault tolerance with automatic redistribution of failed workers' segments.
  • Aggregation of partial results from workers.
  • Extensible design for future SQL features.
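
How columnar scans, GROUP BY, and partial-result aggregation fit together can be illustrated with a minimal sketch. It is not the project's actual code: the segment layout (one Python list per column), the function names `partial_aggregate` and `merge_partials`, and the `region`/`amount` columns are all assumptions made for illustration. The key point is that AVG cannot be merged directly, so each worker returns SUM and COUNT and the average is derived only after merging.

```python
# Hypothetical columnar segments: one list per column, all the same length.
segment_a = {
    "region": ["east", "west", "east"],
    "amount": [10.0, 20.0, 30.0],
}
segment_b = {
    "region": ["west", "east"],
    "amount": [5.0, 15.0],
}


def partial_aggregate(segment, group_col, value_col, predicate=None):
    """Worker side: scan only the two needed columns and build per-group state."""
    state = {}
    for key, value in zip(segment[group_col], segment[value_col]):
        if predicate is not None and not predicate(value):
            continue  # row filtered out by the WHERE clause
        s = state.get(key)
        if s is None:
            state[key] = {"sum": value, "count": 1, "min": value, "max": value}
        else:
            s["sum"] += value
            s["count"] += 1
            s["min"] = min(s["min"], value)
            s["max"] = max(s["max"], value)
    return state


def merge_partials(partials):
    """Coordinator side: combine per-worker states into final aggregates."""
    merged = {}
    for partial in partials:
        for key, s in partial.items():
            m = merged.get(key)
            if m is None:
                merged[key] = dict(s)
            else:
                m["sum"] += s["sum"]
                m["count"] += s["count"]
                m["min"] = min(m["min"], s["min"])
                m["max"] = max(m["max"], s["max"])
    # AVG is derived last, from the merged SUM and COUNT.
    for m in merged.values():
        m["avg"] = m["sum"] / m["count"]
    return merged


partials = [
    partial_aggregate(segment_a, "region", "amount"),
    partial_aggregate(segment_b, "region", "amount"),
]
result = merge_partials(partials)
```

Returning mergeable state (SUM, COUNT, MIN, MAX) rather than finished values keeps the coordinator's merge step independent of how many segments each worker processed.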

Architecture

  +----------------+
  |   Coordinator  |
  +----------------+
          |
          | HTTP /query
          v
  +----------------+
  |   Query Parser |
  +----------------+
          |
          v
  +----------------+
  |   Planner      |
  +----------------+
          |
          v
  +----------------+        +----------------+
  | Dispatcher     |------->| Worker 1       |
  +----------------+        +----------------+
          |                 | Segments + Task
          |                 v
          |             Executes Task
          |                 |
          |                 v
          |             Returns Result
          |
          v
  +----------------+
  | Aggregator     |
  +----------------+
          |
          v
  Query Result (HTTP Response)
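
The planner and dispatcher stages, together with the redistribution of a failed worker's segments, might look roughly like the sketch below. Everything here is an assumption for illustration: the round-robin assignment, the function names `plan` and `dispatch`, the use of `ConnectionError` to signal a dead worker, and the `fake_run_task` stand-in that replaces the real HTTP call to a worker.

```python
def plan(segments, workers):
    """Assign segments to workers round-robin (a stand-in for the planner)."""
    assignment = {w: [] for w in workers}
    for i, seg in enumerate(segments):
        assignment[workers[i % len(workers)]].append(seg)
    return assignment


def dispatch(segments, workers, run_task):
    """Run tasks per worker; hand a failed worker's segments to the survivors."""
    alive = list(workers)
    pending = plan(segments, alive)
    results = []
    while pending:
        worker, segs = pending.popitem()
        if not segs:
            continue  # nothing assigned to this worker
        try:
            results.append(run_task(worker, segs))
        except ConnectionError:
            alive.remove(worker)
            if not alive:
                raise RuntimeError("no workers left to run the query")
            # Re-plan the failed worker's segments across surviving workers.
            for w, extra in plan(segs, alive).items():
                pending.setdefault(w, []).extend(extra)
    return results


def fake_run_task(worker, segs):
    # Hypothetical stand-in for an HTTP call to a worker; port 9002 is "down".
    if worker == 9002:
        raise ConnectionError(f"worker {worker} unreachable")
    return (worker, sorted(segs))


results = dispatch(["seg0", "seg1", "seg2", "seg3"], [9001, 9002], fake_run_task)
covered = sorted(s for _, segs in results for s in segs)
```

In this run every segment ends up processed by the surviving worker on port 9001, and the partial results in `results` would then go to the aggregator.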

Setup

  1. Clone the repository:
git clone https://github.com/Kallistina/distributed-SQL-query-engine.git
cd distributed-SQL-query-engine/minidist
  2. Initialize a new table with a schema:
minidist init data/<table_name> --schema <schema_file>
  3. Load CSV data into the table, sorted by a key column and split into segments:
minidist load data/<table_name> --csv <csv_file> --sort-key <column> --segments <n>
  4. Show the table schema and metadata:
minidist show schema data/<table_name>
minidist show info data/<table_name>
  5. Start the workers, one per port (this example serves the sales table):
python3 -m worker --port 9001 --data_dir data/sales
python3 -m worker --port 9002 --data_dir data/sales
  6. Start the coordinator, passing the worker ports:
python3 -m coordinator --workers 9001 9002
  7. Send a query to the coordinator (set QUERY to your SQL statement first):
curl -X POST localhost:8080/query -d "query=$QUERY"
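
The query step can also be scripted. The sketch below builds a form-encoded POST to the coordinator's `/query` endpoint on port 8080, as in the curl command, using only the Python standard library. It assumes the coordinator accepts a standard URL-encoded form body. The helper names `build_query_request` and `run_query` and the `region` and `amount` columns in the sample query are hypothetical. The response format is not documented here, so the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_request(sql, host="localhost", port=8080):
    """Build a form-encoded POST for /query, with the SQL URL-encoded."""
    body = urlencode({"query": sql}).encode("utf-8")
    return Request(f"http://{host}:{port}/query", data=body, method="POST")


def run_query(sql, **kwargs):
    """Send the query and return the raw response body as text."""
    with urlopen(build_query_request(sql, **kwargs), timeout=30) as resp:
        return resp.read().decode("utf-8")


# Sample query against the sales table; column names are illustrative only.
sql = "SELECT region, SUM(amount) FROM sales GROUP BY region"
req = build_query_request(sql)
# With the coordinator and workers running, run_query(sql) would send it.
```

Unlike `curl -d`, `urlencode` escapes spaces, commas, and parentheses, so queries containing characters such as `&` or `+` survive the trip intact.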

About

big data management project
