TaxaFormer — Decode the Ocean with AI

TaxaFormer

AI-Powered eDNA Classification Platform for Marine Biodiversity

Transform environmental DNA sequences into biodiversity insights using the Nucleotide Transformer


About

TaxaFormer is a web-based platform built for the OceanEYE initiative that leverages the Nucleotide Transformer — a state-of-the-art genomic foundation model — to classify environmental DNA (eDNA) sequences sampled from marine ecosystems. The platform provides taxonomic classification from phylum to genus level, novelty detection for potentially undiscovered species, and rich interactive visualizations for biodiversity analysis.

Built as part of the Smart India Hackathon (SIH), TaxaFormer bridges the gap between raw eDNA sequencing data and actionable marine biodiversity insights.

Platform Highlights

| Metric | Value |
| --- | --- |
| Sequences Processed | 1.2M+ |
| Classification Accuracy | 99.8% |
| Sampling Locations | 47 across 23 countries |
| Species Identified | 1,284 across 23 phyla |
| Samples Analyzed | 661 from 18 research projects |
| Reference Database | PR2 + SILVA |

Key Features

Genomic Transformer Core

A fine-tuned Nucleotide Transformer model performs context-aware analysis of DNA sequences. It classifies eDNA sequences across the taxonomic hierarchy, from Kingdom to Genus, and reports a confidence score for each prediction.

High-Speed Inference

An optimized inference pipeline uses smart compression and efficient sorting to process DNA datasets containing thousands of sequences in seconds.
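The README does not spell out what "smart compression" and "efficient sorting" mean. One common reading, shown here purely as an assumption, is collapsing duplicate reads before inference and batching sequences of similar length so that less padding is wasted. The helper names `deduplicate` and `length_sorted_batches` are hypothetical and are not taken from `pipeline.py`.

```python
from typing import Dict, Iterator, List, Tuple


def deduplicate(sequences: List[str]) -> Tuple[List[str], List[int]]:
    """Collapse identical sequences (case-insensitive).

    Returns the unique sequences in first-seen order and, for every input
    position, the index of its unique representative.
    """
    index_of: Dict[str, int] = {}
    mapping: List[int] = []
    for seq in sequences:
        key = seq.upper()
        if key not in index_of:
            index_of[key] = len(index_of)
        mapping.append(index_of[key])
    return list(index_of.keys()), mapping


def length_sorted_batches(unique: List[str], batch_size: int) -> Iterator[List[int]]:
    """Yield batches of indices into `unique`, grouped by similar length."""
    order = sorted(range(len(unique)), key=lambda i: len(unique[i]))
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


# Illustrative input only: five reads, two of which are duplicates.
reads = ["ACGT", "acgt", "ACGTACGTAC", "AC", "ACGTAC"]
unique, mapping = deduplicate(reads)
batches = list(length_sorted_batches(unique, batch_size=2))
```

After inference on `unique`, `mapping` lets each original read look up the prediction of its representative, so duplicates cost nothing extra.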

Novelty Detection

Automated embedding distance metrics flag unknown variants and potentially undiscovered species. Sequences exceeding the novelty threshold are tagged as POTENTIALLY NOVEL for further investigation.
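A minimal sketch of how an embedding-distance novelty check might look, assuming cosine distance to a set of reference embeddings and a fixed threshold. The threshold value, the reference vectors, and the function names are all illustrative assumptions; the README does not state which metric or cutoff the project uses.

```python
import math
from typing import Dict, Sequence, Tuple

NOVELTY_THRESHOLD = 0.30  # assumed value, not taken from the project


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def novelty_score(
    embedding: Sequence[float], references: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the nearest reference taxon and the distance to it."""
    nearest = min(references, key=lambda name: cosine_distance(embedding, references[name]))
    return nearest, cosine_distance(embedding, references[nearest])


def label(
    embedding: Sequence[float],
    references: Dict[str, Sequence[float]],
    threshold: float = NOVELTY_THRESHOLD,
) -> str:
    """Tag the sequence as POTENTIALLY NOVEL when it is far from every reference."""
    taxon, score = novelty_score(embedding, references)
    return "POTENTIALLY NOVEL" if score > threshold else taxon


# Toy 3-dimensional reference embeddings, made up for illustration.
references = {"Dinophyceae": [1.0, 0.0, 0.0], "Bacillariophyta": [0.0, 1.0, 0.0]}
known = label([0.9, 0.1, 0.0], references)
novel = label([0.0, 0.0, 1.0], references)
```

In this toy setup, an embedding close to a reference keeps that reference's name, while one orthogonal to every reference exceeds the threshold and is flagged.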

Interactive Global Map

Leaflet-powered interactive map with satellite and ocean tile layers. Visualize and compare eDNA findings against global areas of high ecological importance, with custom markers for each sampling location and depth-based color coding.

Rich Analytics Dashboard

10+ interactive chart types for deep biodiversity analysis:

| Chart Type | Purpose |
| --- | --- |
| Taxonomy Pie Chart | Interactive species composition breakdown |
| Taxonomy Sankey | Hierarchical taxonomy flow visualization |
| Taxonomy Sunburst | Multi-level radial taxonomy drill-down |
| Taxonomy Rainbow | Color-coded taxonomic distribution |
| Taxa Abundance | Relative abundance across groups |
| Novelty Histogram | Distribution of novelty scores |
| Area Gradient Chart | Temporal trends in sequence data |
| Radar Chart | Multi-dimensional quality metrics |
| Bar Chart | Comparative taxonomy counts |
| Taxonomy Composition | Stacked compositional analysis |

PDF Report Generation

One-click downloadable PDF reports with complete analysis summaries, taxonomy tables, charts, and metadata — ready for publication or academic submission.

Drag-and-Drop Upload

Intuitive file upload with drag-and-drop support for .fasta, .fa, and .fna formats. Includes sample metadata input (GPS coordinates, environmental parameters like temperature, salinity, pH, dissolved oxygen) and real-time queue management.
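As a rough illustration of the checks an upload handler could run, the sketch below validates the file extension against the three formats listed above and sanity-checks a few metadata fields. The `SampleMetadata` fields, their units, and `validate_upload` are hypothetical names chosen for this example, not the project's actual schema.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

ALLOWED_EXTENSIONS = {".fasta", ".fa", ".fna"}  # formats listed in the README


@dataclass
class SampleMetadata:
    # Field names and units are assumptions made for this sketch.
    latitude: float
    longitude: float
    temperature_c: Optional[float] = None
    salinity_psu: Optional[float] = None
    ph: Optional[float] = None
    dissolved_oxygen_mg_l: Optional[float] = None


def validate_upload(filename: str, meta: SampleMetadata) -> List[str]:
    """Return a list of human-readable problems; an empty list means the upload looks fine."""
    errors: List[str] = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported file type: {filename}")
    if not -90.0 <= meta.latitude <= 90.0:
        errors.append("latitude must be between -90 and 90")
    if not -180.0 <= meta.longitude <= 180.0:
        errors.append("longitude must be between -180 and 180")
    if meta.ph is not None and not 0.0 <= meta.ph <= 14.0:
        errors.append("pH must be between 0 and 14")
    if meta.salinity_psu is not None and meta.salinity_psu < 0:
        errors.append("salinity cannot be negative")
    return errors


ok = validate_upload("sample.FNA", SampleMetadata(latitude=12.9, longitude=74.8, ph=8.1))
bad = validate_upload("reads.txt", SampleMetadata(latitude=95.0, longitude=74.8))
```

Returning every problem at once, rather than raising on the first, lets a form show all validation messages in a single pass.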


Live Demo

taxaformer-sih-oceaneye.vercel.app

Try it out — upload a FASTA file or explore existing sample analyses to see TaxaFormer in action.


Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        FRONTEND (Vercel)                        │
│   Next.js 16 • React 19 • Tailwind CSS • Recharts • Leaflet     │
│                                                                 │
│  ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────────────┐   │
│  │  Upload  │ │  Results  │ │   Map    │ │    Analytics     │   │
│  │  Page    │ │  & Output │ │   View   │ │    Dashboard     │   │
│  └────┬─────┘ └─────┬─────┘ └────┬─────┘ └─────────┬────────┘   │
│       │             │            │                 │            │
│       └─────────────┴────────────┴─────────────────┘            │
│                              │                                  │
└──────────────────────────────┼──────────────────────────────────┘
                               │  REST API
┌──────────────────────────────┼──────────────────────────────────┐
│                        BACKEND (FastAPI)                        │
│                              │                                  │
│  ┌──────────────────┐  ┌─────────────┐  ┌───────────────────┐   │
│  │  Queue System    │  │  ML Pipeline│  │   Analytics API   │   │
│  │  (Job Management)│  │  (Classify) │  │     (Metrics)     │   │
│  └──────────────────┘  └──────┬──────┘  └───────────────────┘   │
│                               │                                 │
│                    ┌──────────┴──────────┐                      │
│                    │    Nucleotide       │                      │
│                    │    Transformer      │                      │
│                    │    (Fine-tuned)     │                      │
│                    └─────────────────────┘                      │
└──────────────────────────────┬──────────────────────────────────┘
                               │
┌──────────────────────────────┼──────────────────────────────────┐
│                     DATABASE (Supabase)                         │
│                                                                 │
│  ┌────────────────┐  ┌────────────────┐  ┌──────────────────┐   │
│  │  Analysis Jobs │  │  Cached Results│  │  Analytics Data  │   │
│  └────────────────┘  └────────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
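The diagram places a queue system in front of the ML pipeline. As a rough illustration of that pattern only, the sketch below uses `asyncio.Queue` with a single worker. The `Job` class, its status strings, and the `classify` stub are hypothetical and do not reflect the contents of `queue_system.py`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    job_id: str
    sequences: List[str]
    status: str = "queued"
    result: List[str] = field(default_factory=list)


async def classify(sequences: List[str]) -> List[str]:
    """Stand-in for model inference; returns a placeholder label per sequence."""
    await asyncio.sleep(0)  # yield control, as a real inference call would
    return ["unclassified" for _ in sequences]


async def worker(queue: "asyncio.Queue[Job]") -> None:
    """Pull jobs off the queue one at a time and record their outcome."""
    while True:
        job = await queue.get()
        job.status = "processing"
        try:
            job.result = await classify(job.sequences)
            job.status = "completed"
        except Exception:
            job.status = "failed"
        finally:
            queue.task_done()


async def run_jobs(jobs: List[Job]) -> Dict[str, str]:
    queue: "asyncio.Queue[Job]" = asyncio.Queue()
    task = asyncio.create_task(worker(queue))
    for job in jobs:
        queue.put_nowait(job)
    await queue.join()  # wait until every job has been handled
    task.cancel()       # stop the now-idle worker
    await asyncio.gather(task, return_exceptions=True)
    return {job.job_id: job.status for job in jobs}


statuses = asyncio.run(run_jobs([Job("a", ["ACGT"]), Job("b", ["GGCC", "TTAA"])]))
```

A frontend component that polls for job status would read the `status` field; persisting jobs to the database instead of memory is left out of this sketch.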

Project Structure

OceanEYE-TaxaFormer/
├── src/                          # Frontend (Next.js 16)
│   ├── app/                      # Next.js App Router
│   │   ├── page.tsx              # Main SPA application
│   │   ├── layout.tsx            # Root layout with metadata
│   │   └── globals.css           # Global styles & design tokens
│   ├── components/               # React components
│   │   ├── HomePage.tsx          # Landing page with hero section
│   │   ├── UploadPage.tsx        # Drag-and-drop file upload
│   │   ├── OutputPage.tsx        # Taxonomy results & tables
│   │   ├── ResultsPage.tsx       # Analysis summary view
│   │   ├── MapPage.tsx           # Interactive Leaflet map
│   │   ├── ReportPage.tsx        # PDF report generator
│   │   ├── AnalyticsDashboard.tsx # Analytics overview
│   │   ├── ContactPage.tsx       # Contact & support
│   │   ├── FAQPage.tsx           # Frequently asked questions
│   │   ├── ModernNav.tsx         # Navigation bar
│   │   ├── QueueStatus.tsx       # Real-time job queue status
│   │   ├── charts/               # 10 chart components
│   │   │   ├── ChartPieInteractive.tsx
│   │   │   ├── ChartTaxonomySankey.tsx
│   │   │   ├── ChartTaxonomySunburst.tsx
│   │   │   ├── ChartTaxonomyRainbow.tsx
│   │   │   ├── ChartTaxaAbundance.tsx
│   │   │   ├── ChartNoveltyHistogram.tsx
│   │   │   ├── ChartAreaGradient.tsx
│   │   │   ├── ChartRadarDots.tsx
│   │   │   ├── ChartBarDefault.tsx
│   │   │   └── ChartTaxonomyComposition.tsx
│   │   └── ui/                  # shadcn/ui primitives (56 components)
│   └── utils/                   # Utility functions
├── backend/                      # Python FastAPI backend
│   ├── main.py                  # Core API server
│   ├── pipeline.py              # ML classification pipeline
│   ├── queue_system.py          # Async job queue management
│   ├── analytics_api.py         # Analytics endpoints
│   ├── main_cached.py           # Cached inference server
│   ├── main_with_db.py          # DB-integrated server
│   └── requirements.txt         # Python dependencies
├── db/                           # Database layer
│   ├── supabase_db.py           # Supabase client & queries
│   ├── supabase_schema.sql      # Core database schema
│   ├── analytics_schema.sql     # Analytics tables
│   └── migration__add_analysis_jobs.sql
├── notebooks/                    # Model training
│   └── taxaformer_model.ipynb   # Nucleotide Transformer fine-tuning
├── scripts/                      # Utility scripts
│   ├── kaggle_backend_complete.py  # Kaggle GPU deployment
│   ├── setup_database.py        # DB initialization
│   └── test_*.py                # Integration tests
├── results/                      # Sample analysis outputs
└── public/                       # Static assets & icons

Getting Started

Prerequisites

  • Node.js ≥ 20 (see .nvmrc)
  • Python ≥ 3.10
  • Supabase account (for database)

1. Clone the Repository

git clone https://github.com/Rishabh1925/OceanEYE-TaxaFormer.git
cd OceanEYE-TaxaFormer

2. Frontend Setup

# Install dependencies
npm install --legacy-peer-deps

# Start development server
npm run dev

The frontend will be available at http://localhost:3000.

3. Backend Setup

cd backend

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Start API server
python main.py

The API server will start at http://localhost:8000.

4. Environment Variables

Create a .env file in the project root:

# Supabase Configuration
SUPABASE_URL=your_supabase_project_url
SUPABASE_KEY=your_supabase_anon_key

# Optional: Ngrok tunneling (for Kaggle-hosted backend)
NGROK_TOKEN=your_ngrok_auth_token
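A minimal sketch of reading these variables in Python, assuming they have already been exported into the process environment (the README does not say how the `.env` file is loaded). The `Settings` class and `load_settings` function are hypothetical names for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    supabase_url: str
    supabase_key: str
    ngrok_token: Optional[str] = None  # only needed for a Kaggle-hosted backend


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required and optional variables, failing early if a required one is absent."""
    missing = [name for name in ("SUPABASE_URL", "SUPABASE_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        supabase_url=env["SUPABASE_URL"],
        supabase_key=env["SUPABASE_KEY"],
        ngrok_token=env.get("NGROK_TOKEN") or None,
    )


# Placeholder values for illustration; real values come from your Supabase project.
example = load_settings(
    {"SUPABASE_URL": "https://example.supabase.co", "SUPABASE_KEY": "anon-key"}
)
```

Failing at startup with a list of the missing names tends to be easier to debug than a connection error later in a request.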

5. Database Setup

# Initialize Supabase tables (run from the project root, not from backend/)
python scripts/setup_database.py

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | Next.js 16 | React framework with App Router |
| Frontend | React 19 | UI library with React Compiler |
| Frontend | Tailwind CSS 4 | Utility-first styling |
| Frontend | Recharts | Composable chart library |
| Frontend | Leaflet | Interactive mapping |
| Frontend | shadcn/ui + Radix | Accessible UI primitives |
| Backend | FastAPI | Async Python API framework |
| Backend | PyTorch + Transformers | Deep learning inference |
| Backend | NumPy | Numerical computation |
| Database | Supabase (PostgreSQL) | Managed database with caching |
| ML Model | Nucleotide Transformer | Genomic foundation model (fine-tuned) |
| Deployment | Vercel | Edge-optimized hosting |
| Animations | GSAP + Three.js | Smooth transitions & 3D effects |

How It Works

graph LR
    A[Upload FASTA File] --> B[FastAPI Backend]
    B --> C[Parse Sequences]
    C --> D[Nucleotide Transformer]
    D --> E[Taxonomy Classification]
    E --> F{Novelty Check}
    F -->|Known Species| G[Classification Result]
    F -->|Novel Variant| H[Flag as Potentially Novel]
    G --> I[Interactive Dashboard]
    H --> I
    I --> J[Global Map]
    I --> K[Analytics Charts]
    I --> L[PDF Report]
  1. Upload — Drag and drop .fasta, .fa, or .fna files with optional sample metadata (GPS, environmental parameters)
  2. Process — The backend parses FASTA sequences and feeds them through the fine-tuned Nucleotide Transformer
  3. Classify — Each sequence receives a taxonomic classification (Phylum → Genus) with a confidence score
  4. Detect — Sequences with high novelty scores are flagged as potentially novel species
  5. Visualize — Results are presented through interactive charts, maps, and downloadable PDF reports
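The parsing part of step 2 can be sketched with the standard library alone. This is an illustrative parser, not the code in `pipeline.py`; it assumes plain multi-line FASTA input and, as a simplification, accepts only `A`, `C`, `G`, `T`, and `N` as valid bases (other IUPAC ambiguity codes are rejected).

```python
from typing import Iterable, Iterator, List, Optional, Tuple

VALID_BASES = set("ACGTN")  # assumed alphabet for this sketch


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def is_valid_dna(sequence: str) -> bool:
    """True when the sequence is non-empty and uses only the assumed alphabet."""
    return bool(sequence) and set(sequence) <= VALID_BASES


text = """>seq1 sample A
ACGTAC
GTTN
>seq2
acgu
"""
records = list(parse_fasta(text.splitlines()))
valid = {header: is_valid_dna(seq) for header, seq in records}
```

Sequences that fail the check (here `seq2`, which contains an RNA base) could be reported back to the user instead of being sent to the model.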

Deployment

Vercel (Frontend)

The frontend is deployed on Vercel and automatically builds from the main branch.

# Production build
npm run build

# Preview locally
npm run start

Backend (Kaggle / Cloud)

The backend can be deployed on Kaggle (for free GPU access) or any cloud provider:

# Kaggle deployment (with ngrok tunneling)
python scripts/kaggle_backend_complete.py

# OR run locally
cd backend && python main.py

Contributing

Contributions are welcome! Here's how to get started:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.


Built by Team OceanEYE for the Smart India Hackathon

