AI-Powered eDNA Classification Platform for Marine Biodiversity
Transform environmental DNA sequences into biodiversity insights using the Nucleotide Transformer
TaxaFormer is a web-based platform built for the OceanEYE initiative that leverages the Nucleotide Transformer — a state-of-the-art genomic foundation model — to classify environmental DNA (eDNA) sequences sampled from marine ecosystems. The platform provides taxonomic classification from phylum to genus level, novelty detection for potentially undiscovered species, and rich interactive visualizations for biodiversity analysis.
Built for the Smart India Hackathon (SIH), TaxaFormer bridges the gap between raw eDNA sequencing data and actionable marine biodiversity insights.
| Metric | Value |
|---|---|
| Sequences Processed | 1.2M+ |
| Classification Accuracy | 99.8% |
| Sampling Locations | 47 across 23 countries |
| Species Identified | 1,284 across 23 phyla |
| Samples Analyzed | 661 from 18 research projects |
| Reference Database | PR2 + SILVA |
Fine-tuned Nucleotide Transformer model for context-aware analysis of DNA sequence syntax. Classifies eDNA sequences across the full taxonomic hierarchy, from Kingdom to Genus, and reports a confidence score for each assignment.
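The model's output head and label set are not described here. As a minimal sketch, assume the classifier emits one probability per genus and that a lineage table maps each genus to its higher ranks. A confidence score at every rank can then be obtained by summing the probabilities of all genera under a taxon and walking down the hierarchy greedily. The function name `classify_hierarchy`, the rank list, and the placeholder taxa are hypothetical.

```python
from collections import defaultdict

# Rank order assumed for the hypothetical lineage table (Kingdom to Genus).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def classify_hierarchy(genus_probs, lineage):
    """Return {rank: (taxon, confidence)} along the most probable path.

    genus_probs: dict genus -> probability (assumed to sum to about 1).
    lineage: dict genus -> list of taxon names, one per entry in RANKS.
    """
    result = {}
    candidates = list(genus_probs)  # genera consistent with choices made so far
    for depth, rank in enumerate(RANKS):
        # Marginal probability of each taxon at this rank.
        totals = defaultdict(float)
        for genus in candidates:
            totals[lineage[genus][depth]] += genus_probs[genus]
        best, score = max(totals.items(), key=lambda kv: kv[1])
        result[rank] = (best, score)
        # Keep only genera that sit under the chosen taxon.
        candidates = [g for g in candidates if lineage[g][depth] == best]
    return result
```

Because each rank's score is the summed probability of every genus beneath the chosen taxon, confidence can only stay the same or drop as the assignment becomes more specific.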
Optimized inference pipeline using smart compression and efficient sorting. Processes DNA datasets containing thousands of sequences in seconds.
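The README does not spell out what "smart compression and efficient sorting" refer to. One common interpretation is to collapse identical reads so each unique sequence is embedded only once, then sort the unique sequences by length so each batch needs as little padding as possible. This is shown here as an assumption rather than the project's actual code, and the helper names `plan_batches` and `expand_results` are hypothetical.

```python
def plan_batches(sequences, batch_size):
    """Deduplicate sequences and group them into length-sorted batches.

    Returns (batches, index): batches is a list of lists of unique sequences,
    and index maps each unique sequence to its positions in the input list.
    """
    index = {}
    for pos, seq in enumerate(sequences):
        index.setdefault(seq, []).append(pos)
    # Stable sort by length so sequences of similar length share a batch.
    unique = sorted(index, key=len)
    batches = [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]
    return batches, index


def expand_results(unique_results, index, n):
    """Copy per-unique-sequence results back onto the original input order."""
    out = [None] * n
    for seq, result in unique_results.items():
        for pos in index[seq]:
            out[pos] = result
    return out
```

After the model has run on each batch, `expand_results` copies each unique sequence's result back to every position where that sequence occurred in the upload.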
Automated embedding distance metrics flag unknown variants and potentially undiscovered species. Sequences exceeding the novelty threshold are tagged as POTENTIALLY NOVEL for further investigation.
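The distance metric and threshold TaxaFormer uses are not stated. The sketch below illustrates the idea. It assumes cosine distance between a query embedding and a set of reference embeddings (for example, one centroid per known taxon) and a hypothetical threshold of 0.3. The function names and the "KNOWN" label are also illustrative.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def novelty_check(query, references, threshold=0.3):
    """Return (nearest_label, distance, status) for one query embedding.

    references: dict label -> reference embedding.
    threshold: hypothetical cut-off; a real value would be tuned on held-out data.
    """
    label, dist = min(
        ((name, cosine_distance(query, vec)) for name, vec in references.items()),
        key=lambda item: item[1],
    )
    status = "POTENTIALLY NOVEL" if dist > threshold else "KNOWN"
    return label, dist, status
```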
Leaflet-powered interactive map with satellite and ocean tile layers. Visualize and compare eDNA findings against global areas of high ecological importance, with custom markers for each sampling location and depth-based color coding.
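The depth bins behind the map's color coding are not documented. The sketch below assumes the conventional pelagic zone boundaries (200 m, 1,000 m, 4,000 m and 6,000 m) and uses a hypothetical palette. It is written in Python for consistency with the other examples, although the real markers are rendered by the Leaflet frontend.

```python
from bisect import bisect_left

# Inclusive upper depth bounds (metres) of the conventional pelagic zones.
ZONE_BOUNDS = [200, 1000, 4000, 6000]
ZONE_NAMES = ["epipelagic", "mesopelagic", "bathypelagic", "abyssopelagic", "hadopelagic"]
# Hypothetical marker colors, lighter for shallow samples and darker for deep ones.
ZONE_COLORS = ["#7fdbff", "#0074d9", "#001f8f", "#3d1a78", "#111111"]


def depth_style(depth_m):
    """Return (zone_name, hex_color) for a sampling depth in metres."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    i = bisect_left(ZONE_BOUNDS, depth_m)  # first bound >= depth
    return ZONE_NAMES[i], ZONE_COLORS[i]
```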
Ten interactive chart types for in-depth biodiversity analysis:
| Chart Type | Purpose |
|---|---|
| Taxonomy Pie Chart | Interactive species composition breakdown |
| Taxonomy Sankey | Hierarchical taxonomy flow visualization |
| Taxonomy Sunburst | Multi-level radial taxonomy drill-down |
| Taxonomy Rainbow | Color-coded taxonomic distribution |
| Taxa Abundance | Relative abundance across groups |
| Novelty Histogram | Distribution of novelty scores |
| Area Gradient Chart | Temporal trends in sequence data |
| Radar Chart | Multi-dimensional quality metrics |
| Bar Chart | Comparative taxonomy counts |
| Taxonomy Composition | Stacked compositional analysis |
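Charts such as Taxa Abundance and Taxonomy Composition summarize relative abundance. The sketch below shows that computation. It assumes each classified sequence is a record with one key per rank, which is a hypothetical shape. It counts sequences per taxon at a chosen rank and converts the counts to percentages.

```python
from collections import Counter


def relative_abundance(assignments, rank="phylum"):
    """Percentage of sequences assigned to each taxon at `rank`.

    assignments: list of dicts such as {"phylum": "...", "genus": "..."}.
    Records with no value for the rank are counted as "Unassigned".
    """
    counts = Counter(rec.get(rank) or "Unassigned" for rec in assignments)
    total = sum(counts.values())
    # most_common() orders taxa from most to least abundant.
    return {taxon: 100.0 * n / total for taxon, n in counts.most_common()}
```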
One-click downloadable PDF reports with complete analysis summaries, taxonomy tables, charts, and metadata — ready for publication or academic submission.
Intuitive file upload with drag-and-drop support for .fasta, .fa, and .fna formats. Includes sample metadata input (GPS coordinates, environmental parameters like temperature, salinity, pH, dissolved oxygen) and real-time queue management.
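The README states that the backend parses FASTA input but does not show the parser. The following standard-library sketch covers two upload checks: it accepts only the listed extensions and reads multi-line FASTA records. The function names are hypothetical. A production parser could instead use Biopython's `SeqIO.parse(handle, "fasta")`.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".fasta", ".fa", ".fna"}


def check_upload_name(filename):
    """Reject files whose extension is not one of the accepted FASTA suffixes."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return suffix


def parse_fasta(text):
    """Parse FASTA text into (header, sequence) tuples.

    Multi-line sequences are joined and upper-cased; blank lines are skipped.
    """
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records
```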
taxaformer-sih-oceaneye.vercel.app
Try it out — upload a FASTA file or explore existing sample analyses to see TaxaFormer in action.
```
┌─────────────────────────────────────────────────────────────────┐
│                        FRONTEND (Vercel)                        │
│    Next.js 16 • React 19 • Tailwind CSS • Recharts • Leaflet    │
│                                                                 │
│  ┌──────────┐  ┌────────────┐  ┌──────────┐  ┌───────────────┐  │
│  │  Upload  │  │  Results   │  │   Map    │  │   Analytics   │  │
│  │   Page   │  │  & Output  │  │   View   │  │   Dashboard   │  │
│  └────┬─────┘  └─────┬──────┘  └────┬─────┘  └───────┬───────┘  │
│       │              │              │                │          │
│       └──────────────┴─────────┬────┴────────────────┘          │
│                                │                                │
└────────────────────────────────┼────────────────────────────────┘
                                 │ REST API
┌────────────────────────────────┼────────────────────────────────┐
│                        BACKEND (FastAPI)                        │
│                                │                                │
│  ┌──────────────────┐   ┌──────┴──────┐   ┌──────────────────┐  │
│  │   Queue System   │   │ ML Pipeline │   │  Analytics API   │  │
│  │ (Job Management) │   │ (Classify)  │   │    (Metrics)     │  │
│  └──────────────────┘   └──────┬──────┘   └──────────────────┘  │
│                                │                                │
│                     ┌──────────┴──────────┐                     │
│                     │     Nucleotide      │                     │
│                     │     Transformer     │                     │
│                     │    (Fine-tuned)     │                     │
│                     └─────────────────────┘                     │
│                                                                 │
└────────────────────────────────┬────────────────────────────────┘
                                 │
┌────────────────────────────────┴────────────────────────────────┐
│                       DATABASE (Supabase)                       │
│                                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │  Analysis Jobs  │  │ Cached Results  │  │ Analytics Data  │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
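The internals of queue_system.py are not shown. As an illustration only, here is an in-memory asyncio job queue that follows the submit-then-poll pattern a queue-status widget needs. The class and method names are hypothetical. A real deployment would more likely persist job state in the Supabase Analysis Jobs table shown in the diagram.

```python
import asyncio
import uuid


class JobQueue:
    """In-memory job queue: submit() returns an id, worker() runs jobs in FIFO order."""

    def __init__(self, handler):
        self.handler = handler       # async callable that does the actual work
        self.queue = asyncio.Queue()
        self.jobs = {}               # job_id -> {"status": ..., "result": ...}

    def submit(self, payload):
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"status": "queued", "result": None}
        self.queue.put_nowait((job_id, payload))
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]["status"]

    async def worker(self):
        while True:
            job_id, payload = await self.queue.get()
            self.jobs[job_id]["status"] = "running"
            try:
                self.jobs[job_id]["result"] = await self.handler(payload)
                self.jobs[job_id]["status"] = "done"
            except Exception as exc:  # record the failure instead of killing the worker
                self.jobs[job_id]["status"] = "failed"
                self.jobs[job_id]["result"] = str(exc)
            finally:
                self.queue.task_done()
```

In a FastAPI app, one route could call `submit` and return the job id while another reports `status`, with the worker task started once when the application starts.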
```
OceanEYE-TaxaFormer/
├── src/                              # Frontend (Next.js 16)
│   ├── app/                          # Next.js App Router
│   │   ├── page.tsx                  # Main SPA application
│   │   ├── layout.tsx                # Root layout with metadata
│   │   └── globals.css               # Global styles & design tokens
│   ├── components/                   # React components
│   │   ├── HomePage.tsx              # Landing page with hero section
│   │   ├── UploadPage.tsx            # Drag-and-drop file upload
│   │   ├── OutputPage.tsx            # Taxonomy results & tables
│   │   ├── ResultsPage.tsx           # Analysis summary view
│   │   ├── MapPage.tsx               # Interactive Leaflet map
│   │   ├── ReportPage.tsx            # PDF report generator
│   │   ├── AnalyticsDashboard.tsx    # Analytics overview
│   │   ├── ContactPage.tsx           # Contact & support
│   │   ├── FAQPage.tsx               # Frequently asked questions
│   │   ├── ModernNav.tsx             # Navigation bar
│   │   ├── QueueStatus.tsx           # Real-time job queue status
│   │   ├── charts/                   # 10 chart components
│   │   │   ├── ChartPieInteractive.tsx
│   │   │   ├── ChartTaxonomySankey.tsx
│   │   │   ├── ChartTaxonomySunburst.tsx
│   │   │   ├── ChartTaxonomyRainbow.tsx
│   │   │   ├── ChartTaxaAbundance.tsx
│   │   │   ├── ChartNoveltyHistogram.tsx
│   │   │   ├── ChartAreaGradient.tsx
│   │   │   ├── ChartRadarDots.tsx
│   │   │   ├── ChartBarDefault.tsx
│   │   │   └── ChartTaxonomyComposition.tsx
│   │   └── ui/                       # shadcn/ui primitives (56 components)
│   └── utils/                        # Utility functions
├── backend/                          # Python FastAPI backend
│   ├── main.py                       # Core API server
│   ├── pipeline.py                   # ML classification pipeline
│   ├── queue_system.py               # Async job queue management
│   ├── analytics_api.py              # Analytics endpoints
│   ├── main_cached.py                # Cached inference server
│   ├── main_with_db.py               # DB-integrated server
│   └── requirements.txt              # Python dependencies
├── db/                               # Database layer
│   ├── supabase_db.py                # Supabase client & queries
│   ├── supabase_schema.sql           # Core database schema
│   ├── analytics_schema.sql          # Analytics tables
│   └── migration__add_analysis_jobs.sql
├── notebooks/                        # Model training
│   └── taxaformer_model.ipynb        # Nucleotide Transformer fine-tuning
├── scripts/                          # Utility scripts
│   ├── kaggle_backend_complete.py    # Kaggle GPU deployment
│   ├── setup_database.py             # DB initialization
│   └── test_*.py                     # Integration tests
├── results/                          # Sample analysis outputs
└── public/                           # Static assets & icons
```
- Node.js ≥ 20 (see .nvmrc)
- Python ≥ 3.10
- Supabase account (for database)
```bash
git clone https://github.com/Rishabh1925/OceanEYE-TaxaFormer.git
cd OceanEYE-TaxaFormer
# Install dependencies
npm install --legacy-peer-deps
# Start development server
npm run dev
```

The frontend will be available at http://localhost:3000.
```bash
cd backend
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate    # macOS/Linux
# venv\Scripts\activate     # Windows
# Install dependencies
pip install -r requirements.txt
# Start API server
python main.py
```

The API server will start at http://localhost:8000.
Create a .env file in the project root:

```bash
# Supabase Configuration
SUPABASE_URL=your_supabase_project_url
SUPABASE_KEY=your_supabase_anon_key
# Optional: Ngrok tunneling (for Kaggle-hosted backend)
NGROK_TOKEN=your_ngrok_auth_token
```

```bash
# Initialize Supabase tables
python scripts/setup_database.py
```

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 16 | React framework with App Router |
| | React 19 | UI library with React Compiler |
| | Tailwind CSS 4 | Utility-first styling |
| | Recharts | Composable chart library |
| | Leaflet | Interactive mapping |
| | shadcn/ui + Radix | Accessible UI primitives |
| Backend | FastAPI | Async Python API framework |
| | PyTorch + Transformers | Deep learning inference |
| | NumPy | Numerical computation |
| Database | Supabase (PostgreSQL) | Managed database with caching |
| ML Model | Nucleotide Transformer | Genomic foundation model (fine-tuned) |
| Deployment | Vercel | Edge-optimized hosting |
| Animations | GSAP + Three.js | Smooth transitions & 3D effects |
```mermaid
graph LR
    A[Upload FASTA File] --> B[FastAPI Backend]
    B --> C[Parse Sequences]
    C --> D[Nucleotide Transformer]
    D --> E[Taxonomy Classification]
    E --> F{Novelty Check}
    F -->|Known Species| G[Classification Result]
    F -->|Novel Variant| H[Flag as Potentially Novel]
    G --> I[Interactive Dashboard]
    H --> I
    I --> J[Global Map]
    I --> K[Analytics Charts]
    I --> L[PDF Report]
```
- Upload: Drag and drop .fasta, .fa, or .fna files with optional sample metadata (GPS, environmental parameters).
- Process: The backend parses FASTA sequences and feeds them through the fine-tuned Nucleotide Transformer.
- Classify: Each sequence receives a taxonomic classification (Phylum → Genus) with a confidence score.
- Detect: Sequences with high novelty scores are flagged as potentially novel species.
- Visualize: Results are presented through interactive charts, maps, and downloadable PDF reports.
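The five steps above can be tied together in a small orchestration loop. In this sketch the model calls are injected as plain functions. Therefore `classify`, `novelty_score`, the 0.3 threshold and the result fields are all hypothetical stand-ins rather than TaxaFormer's actual interfaces. The `summarize` helper produces the kind of counts a dashboard or report header might show.

```python
from dataclasses import dataclass


@dataclass
class SequenceResult:
    seq_id: str
    lineage: dict       # rank -> taxon name
    confidence: float   # confidence of the deepest assigned rank
    novelty: float      # e.g. distance to the nearest reference embedding
    status: str         # "KNOWN" or "POTENTIALLY NOVEL"


def run_pipeline(records, classify, novelty_score, threshold=0.3):
    """Run (seq_id, sequence) records through classification and novelty checks.

    classify(sequence) -> (lineage_dict, confidence)
    novelty_score(sequence) -> float
    """
    results = []
    for seq_id, seq in records:
        lineage, confidence = classify(seq)
        score = novelty_score(seq)
        status = "POTENTIALLY NOVEL" if score > threshold else "KNOWN"
        results.append(SequenceResult(seq_id, lineage, confidence, score, status))
    return results


def summarize(results):
    """Counts of known versus flagged sequences."""
    novel = sum(r.status == "POTENTIALLY NOVEL" for r in results)
    return {"total": len(results), "novel": novel, "known": len(results) - novel}
```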
The frontend is deployed on Vercel and automatically builds from the main branch.
```bash
# Production build
npm run build
# Preview locally
npm run start
```

The backend can be deployed on Kaggle (for free GPU access) or any cloud provider:

```bash
# Kaggle deployment (with ngrok tunneling)
python scripts/kaggle_backend_complete.py
# OR run locally
cd backend && python main.py
```

Contributions are welcome! Here's how to get started:
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Built by Team OceanEYE for the Smart India Hackathon


