Distributed Columnar SQL Engine

A distributed SQL query engine with columnar storage and parallel query execution. This project demonstrates a coordinator-worker architecture that parses and plans SQL queries, distributes the resulting tasks across multiple worker nodes, executes them in parallel, and aggregates the partial results.

Features

Columnar storage for efficient reading of large datasets.
Supports SELECT, WHERE, aggregates (SUM, COUNT, AVG, MIN, MAX), and GROUP BY.
Query planner for distributing tasks among workers.
Fault tolerance with automatic redistribution of failed workers' segments.
Aggregation of partial results from workers.
Extensible design for future SQL features.
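
The aggregation of partial results can be illustrated with a minimal sketch. It assumes each worker returns, per GROUP BY key, a small state holding count, sum, min and max. The `Partial` class and `merge_partials` function are hypothetical names used for illustration, not the project's actual API. AVG is derived from the merged SUM and COUNT, because averaging the per-worker averages would weight workers incorrectly.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Per-group aggregate state from one worker (hypothetical structure)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def merge(self, other: "Partial") -> None:
        """Fold another worker's state for the same group into this one."""
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)

    @property
    def avg(self):
        """AVG is derived from SUM and COUNT; None for an empty group."""
        return self.total / self.count if self.count else None


def merge_partials(worker_results):
    """Combine per-worker {group_key: Partial} dicts into a single dict."""
    merged = {}
    for result in worker_results:
        for key, partial in result.items():
            merged.setdefault(key, Partial()).merge(partial)
    return merged


# Two workers report partial state for overlapping groups.
worker_a = {"EU": Partial(2, 30.0, 10.0, 20.0)}
worker_b = {"EU": Partial(1, 30.0, 30.0, 30.0), "US": Partial(1, 5.0, 5.0, 5.0)}
merged = merge_partials([worker_a, worker_b])
print(merged["EU"].avg)  # 20.0
```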

Architecture

  +----------------+
  |   Coordinator  |
  +----------------+
          |
          | HTTP /query
          v
  +----------------+
  |   Query Parser |
  +----------------+
          |
          v
  +----------------+
  |   Planner      |
  +----------------+
          |
          v
  +----------------+        +----------------+
  | Dispatcher     |------->| Worker 1       |
  +----------------+        +----------------+
          |                 | Segments + Task
          |                 v
          |             Executes Task
          |                 |
          |                 v
          |             Returns Result
          |
          v
  +----------------+
  | Aggregator     |
  +----------------+
          |
          v
  Query Result (HTTP Response)
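
The dispatch step, together with the redistribution of a failed worker's segments, could look roughly like the sketch below. This is an assumption-laden illustration and not the project's dispatcher: `assign_segments`, `run_with_redistribution` and the `execute` callback are hypothetical, segments are assigned round-robin, and tasks run sequentially here, whereas the real engine presumably sends them to workers in parallel over HTTP.

```python
def assign_segments(segments, workers):
    """Round-robin assignment of segments to workers (assumed strategy)."""
    plan = {worker: [] for worker in workers}
    for i, segment in enumerate(segments):
        plan[workers[i % len(workers)]].append(segment)
    return plan


def run_with_redistribution(segments, workers, execute):
    """Run every segment once, reassigning work when a worker fails.

    execute(worker, segment) is a hypothetical callback that returns a
    partial result or raises ConnectionError if the worker is unreachable.
    """
    alive = list(workers)
    pending = list(segments)
    results = {}
    while pending:
        if not alive:
            raise RuntimeError("no workers left to execute pending segments")
        plan = assign_segments(pending, alive)
        pending = []
        for worker, assigned in plan.items():
            for i, segment in enumerate(assigned):
                try:
                    results[segment] = execute(worker, segment)
                except ConnectionError:
                    # Drop the failed worker and requeue its unfinished segments.
                    alive.remove(worker)
                    pending.extend(assigned[i:])
                    break
    return results
```

The per-segment results returned here are what an aggregator would then merge into the final query result.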

Setup

Clone the repository:

git clone https://github.com/Kallistina/distributed-SQL-query-engine.git
cd distributed-SQL-query-engine/minidist

Initialize a new table with schema:

minidist init data/<table_name> --schema <schema_file>

Load CSV data into the table, sorted by a key column and split into segments:

minidist load data/<table_name> --csv <csv_file> --sort-key <column> --segments <n>
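
To make the sort-and-segment idea concrete, here is a minimal in-memory sketch of what a load step might do. It is not the `minidist load` implementation: the `load_columnar` function, the schema format (column name mapped to a type constructor) and the in-memory segment layout are all assumptions made for illustration.

```python
import csv
import io


def load_columnar(csv_text, schema, sort_key, n_segments):
    """Parse CSV text, sort by sort_key, and split into columnar segments.

    schema maps column name -> type constructor, e.g. {"amount": int}.
    Returns a list of segments, each shaped as {column: [values...]}.
    """
    rows = [
        {col: cast(raw[col]) for col, cast in schema.items()}
        for raw in csv.DictReader(io.StringIO(csv_text))
    ]
    rows.sort(key=lambda row: row[sort_key])

    # Ceiling division so that at most n_segments contiguous chunks result.
    size = max(1, -(-len(rows) // n_segments))
    segments = []
    for start in range(0, len(rows), size):
        chunk = rows[start:start + size]
        # Store each column as its own list instead of storing whole rows.
        segments.append({col: [row[col] for row in chunk] for col in schema})
    return segments
```

Because rows are sorted before they are split, each segment covers a contiguous range of the sort key, and a query that needs only one column reads only that column's list.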

Show table schema and metadata:

minidist show schema data/<table_name>
minidist show info data/<table_name>

Start workers:

python3 -m worker --port 9001 --data_dir data/sales
python3 -m worker --port 9002 --data_dir data/sales

Start coordinator:

python3 -m coordinator --workers 9001 9002

Query:

curl -X POST localhost:8080/query -d "query=$QUERY"
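
The same request can be issued from Python with the standard library. This is a sketch under assumptions: the endpoint and the `query` form field are taken from the curl command above, while the helper names, the example column `amount`, and the assumption that the coordinator decodes URL-encoded form data are not confirmed by the project. The response is returned as raw text because its format is not documented here.

```python
import urllib.parse
import urllib.request


def build_query_request(sql, base_url="http://localhost:8080"):
    """Build a form-encoded POST to /query carrying the SQL text."""
    body = urllib.parse.urlencode({"query": sql}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/query", data=body, method="POST")


def run_query(sql, base_url="http://localhost:8080"):
    """Send the query to the coordinator and return the raw response text."""
    with urllib.request.urlopen(build_query_request(sql, base_url)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires a running coordinator; table and column names are examples.
    print(run_query("SELECT SUM(amount) FROM sales"))
```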
