# Atlas Project

An AI-powered personal study and productivity backend — built to learn, built to grow.



## What this is

Atlas Project is a REST API that helps me track, analyse, and predict my own study habits. It started as a Pomodoro timer. It became a machine learning project. The system logs every study session to a SQLite database, exposes a FastAPI backend for CRUD operations and analytics, and serves a trained classification model that predicts — given a subject, duration, and time of day — whether a session will be completed or interrupted. I built this to learn Python engineering properly: no vibe-coding, no AI-generated functions. Every line is mine.


## Live features

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/sessions` | GET | List all study sessions |
| `/sessions` | POST | Log a new session |
| `/sessions/{id}` | GET | Retrieve one session |
| `/sessions/{id}/complete` | PATCH | Mark a session as completed |
| `/sessions/{id}` | DELETE | Remove a session |
| `/predict` | POST | ML prediction: will this session be completed? |
| `/analytics` | GET | Aggregated stats: time by subject, completion rate, trends |
| `/docs` | GET | Auto-generated Swagger UI (FastAPI built-in) |

## Stack

| Layer | Technology | Why |
|-------|------------|-----|
| Language | Python 3.11 | Primary language for AI/DS engineering roles |
| API framework | FastAPI + Uvicorn | Async, typed, auto-documented, industry standard |
| Database | SQLite + SQLAlchemy | Simple persistence with a real ORM |
| Data analysis | Pandas + Matplotlib | Aggregation and visualisation of session data |
| Machine learning | scikit-learn | Classification model (RandomForestClassifier) |
| Model serialisation | joblib | Save and reload the trained model between restarts |
| Validation | Pydantic v2 | Request/response schemas, type safety at the boundary |

## Project structure

```text
atlas-dev-os/
├── backend/
│   ├── api/
│   │   ├── routes.py          # all FastAPI endpoints
│   │   └── schemas.py         # Pydantic request/response models
│   ├── core/
│   │   ├── database.py        # SQLAlchemy engine, session, Base
│   │   ├── models.py          # ORM table definitions
│   │   ├── crud.py            # create / read / update / delete
│   │   └── ml.py              # model loading, feature engineering, predict()
│   └── main.py                # app factory, router registration
├── ml/
│   ├── train.py               # training script — run once to produce model
│   ├── evaluate.py            # accuracy, classification report, feature importance
│   └── completion_model.pkl   # serialised RandomForestClassifier
├── notebooks/
│   ├── 01_eda.ipynb           # exploratory data analysis on session CSV
│   └── 02_model_selection.ipynb  # comparing Logistic Regression vs Random Forest
├── scripts/
│   └── study_timer.py         # CLI Pomodoro timer — the data source
├── tests/
│   ├── test_crud.py
│   ├── test_routes.py
│   └── test_ml.py
├── .env.example               # copy to .env and fill in your keys
├── requirements.txt
└── README.md
```

## Getting started

Requirements: Python 3.11+, Git

```bash
# 1. Clone
git clone https://github.com/xdmanflow/atlas-project.git
cd atlas-project

# 2. Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set up environment variables
cp .env.example .env
# Edit .env and add your keys if needed

# 5. Run the API
uvicorn backend.main:app --reload

# 6. Open the interactive docs
# http://localhost:8000/docs
```

Train the ML model (required before using `/predict`):

```bash
# First generate some sessions with the timer
python scripts/study_timer.py --sessions 4

# Then train the model on your data
python ml/train.py
# Outputs: ml/completion_model.pkl
```

## The ML model

Problem: binary classification — will a study session be completed?

Features used:

- `duration_minutes` — longer sessions correlate with lower completion
- `start_hour` — time of day affects focus (encoded from `start_time`)
- `subject_encoded` — some subjects are harder to stay focused on (`LabelEncoder`)
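A sketch of how these three features could be derived with Pandas and scikit-learn. The raw column names (`subject`, `start_time`, `duration_minutes`, `completed`) are assumptions, not necessarily the repo's exact schema:

```python
# Derive model features from raw session rows (assumed column names).
import pandas as pd
from sklearn.preprocessing import LabelEncoder

raw = pd.DataFrame({
    "subject": ["Python", "FastAPI", "Python"],
    "start_time": ["2025-01-10 09:05", "2025-01-10 14:30", "2025-01-11 21:00"],
    "duration_minutes": [25, 50, 90],
    "completed": [1, 1, 0],
})

features = pd.DataFrame()
features["duration_minutes"] = raw["duration_minutes"]
# start_hour: extract the hour component from the timestamp
features["start_hour"] = pd.to_datetime(raw["start_time"]).dt.hour
# subject_encoded: LabelEncoder maps each subject name to an integer
encoder = LabelEncoder()
features["subject_encoded"] = encoder.fit_transform(raw["subject"])
```

Note that `LabelEncoder` assigns integers in alphabetical order of the class names, so the mapping must be saved alongside the model to encode prediction requests consistently.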

Training:

```text
Dataset size     : 200+ logged sessions
Train/test split : 80% / 20% (random_state=42)
Best model       : RandomForestClassifier(n_estimators=100)
Test accuracy    : ~78% (improves with more logged data)
```
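The core of the training step can be sketched as follows. The synthetic data and the toy labelling rule are stand-ins for the real session log, not the actual `ml/train.py`:

```python
# Sketch of a train-and-serialise pipeline on synthetic data.
import numpy as np
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.integers(0, 60, size=(200, 3))   # duration, start_hour, subject_encoded
y = (X[:, 0] < 40).astype(int)           # toy rule: shorter sessions complete

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")

# Persist the fitted model so the API can reload it after a restart
joblib.dump(model, "completion_model.pkl")
```

The `random_state` arguments make the split and the forest reproducible, which is what lets the reported accuracy be compared run to run.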

Example prediction request:

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"subject": "FastAPI", "duration_minutes": 25, "start_hour": 9}'
```

Response:

```json
{
  "will_complete": true,
  "confidence": 0.84
}
```

## Example analytics response

```json
{
  "total_sessions": 47,
  "completed_sessions": 38,
  "completion_rate": 0.81,
  "total_minutes": 1025,
  "avg_duration_minutes": 26.4,
  "top_subjects": [
    { "subject": "Python", "total_minutes": 325 },
    { "subject": "FastAPI", "total_minutes": 200 },
    { "subject": "ML basics", "total_minutes": 175 }
  ]
}
```
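Aggregates like these are straightforward to compute once the sessions table is loaded into a DataFrame. A sketch with assumed column names:

```python
# Compute the analytics aggregates with Pandas (illustrative data).
import pandas as pd

df = pd.DataFrame({
    "subject": ["Python", "Python", "FastAPI"],
    "duration_minutes": [25, 50, 25],
    "completed": [True, False, True],
})

analytics = {
    "total_sessions": len(df),
    "completed_sessions": int(df["completed"].sum()),
    "completion_rate": round(df["completed"].mean(), 2),
    "total_minutes": int(df["duration_minutes"].sum()),
    "avg_duration_minutes": round(df["duration_minutes"].mean(), 1),
    # total minutes per subject, largest first
    "top_subjects": (
        df.groupby("subject")["duration_minutes"].sum()
          .sort_values(ascending=False)
          .rename("total_minutes")
          .reset_index()
          .to_dict(orient="records")
    ),
}
```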

## What I learned building this

This project was a deliberate practice run across the full Python engineering stack.

Starting from a simple CLI script (study_timer.py), I progressively added:

1. **OOP layer** — refactored the timer into `StudySession`, `DeepWorkSession`, and `BreakSession` classes with proper inheritance, `__str__`/`__repr__`, and `to_csv_row()` serialisation.
2. **HTTP layer** — built a `requests`-based API client before writing my own server, so I understood what "an endpoint" actually is from the client side first.
3. **Functional layer** — rewrote data-processing pipelines using `map`, `filter`, `functools.reduce`, and `itertools.groupby`, learning lazy evaluation in the process.
4. **API layer** — built a FastAPI backend with Pydantic v2 validation, proper HTTP status codes, and path/query parameter handling.
5. **Persistence layer** — replaced in-memory storage with SQLAlchemy + SQLite, implementing full CRUD with proper session management.
6. **Analytics layer** — loaded the database into Pandas DataFrames to compute aggregates and generate charts.
7. **ML layer** — engineered features from raw session data, compared two classifiers, evaluated with `classification_report`, and serialised the best model with joblib.
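Step 3 in miniature: grouping session rows by subject and reducing each group to a total, using only `itertools` and `functools` (the sample data is illustrative):

```python
# Functional-style aggregation: sort, group, then reduce each group.
from functools import reduce
from itertools import groupby
from operator import itemgetter

rows = [
    {"subject": "Python", "minutes": 25},
    {"subject": "FastAPI", "minutes": 50},
    {"subject": "Python", "minutes": 30},
]

# groupby only merges adjacent items, so the input must be sorted
# by the same key first
by_subject = groupby(sorted(rows, key=itemgetter("subject")),
                     key=itemgetter("subject"))

totals = {
    subject: reduce(lambda acc, row: acc + row["minutes"], group, 0)
    for subject, group in by_subject
}
```

The groups yielded by `groupby` are lazy iterators: each one is valid only until the next group is requested, which is exactly the kind of lazy-evaluation gotcha step 3 refers to.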

The hardest part was Day 7: understanding why `SessionLocal` is a factory, not a session, and why you always call `.close()` in a `finally` block or use a context manager.
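That distinction in one self-contained sketch (the in-memory SQLite URL is just for illustration):

```python
# sessionmaker returns a factory; calling the factory creates a Session.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite:///:memory:")
SessionLocal = sessionmaker(bind=engine)   # a factory, not a session

def get_db():
    db = SessionLocal()    # each call produces a fresh, independent session
    try:
        yield db
    finally:
        db.close()         # always runs, even if the request handler raises
```

Handing `get_db` to FastAPI's `Depends` gives every request its own session and guarantees the `finally` block releases the connection afterwards.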


## Roadmap

- Add JWT authentication (FastAPI + OAuth2 + password hashing)
- Docker + docker-compose for one-command local setup
- GitHub Actions CI pipeline (pytest on every push)
- Deploy to Railway (free tier)
- React frontend — study dashboard with real-time analytics charts
- Improve the ML model — add `day_of_week` and `previous_session_completed` as features

## Author

Manil Doudou — Computer Engineering student, CESI Engineering School, Toulouse, France

Specialising in AI and Data Science · Looking for a 3–4 month internship from September 2026

This is Atlas Project. Mind the gap between the train and the platform. The next station is Destiny.
