feat(website): python seeder with pango lineages and test suite #1203

Merged: fhennig merged 54 commits into main from feat/pango-lineage-seeder on May 29, 2026
@fhennig fhennig commented May 7, 2026

Summary

  • Replaces the JS seed.mjs with a unified Python seeder in collection-seeding/ (renamed from example-data/)
  • Modular source architecture — new data sources are subclasses of the Source ABC; registering a new source is a one-liner in sources/registry.py
  • Three sources implemented:
    • covid-resistance-mutations — port of the original JS resistance mutation data (3CLpro, RdRp, Spike mAb)
    • covid-pango-lineages — fetches ~4,976 pango lineage definitions from corneliusroemer/pango-sequences, one collection per lineage with nucleotide substitutions as variants
    • covid-pango-lineages-sample — same as above but limited to 10 lineages, for quick testing (excluded from default run)
  • CLI uses --source NAME to run a single source; no flag runs all. --list prints available sources
  • Upserts collections (create or update by name) via POST /collections and PUT /collections/{id} — no diffing: every existing collection is always updated on each run
  • Deletes orphaned collections — each source declares an owned_tag (e.g. #pango-lineage) which is appended to every collection description. On each run, any existing collection with that tag that is no longer in the source output is deleted via DELETE /collections/{id}
  • Authenticates via --api-key / $API_KEY; the system user API key is hardcoded as a dummy value in Docker Compose (both backend and seeder)
  • Uses pixi for dependency management with a multi-stage Docker build (pixi builder → python:3.13-slim)
  • Typed throughout with TypedDict (Collection, Variant, FilterObject, ExistingCollection)
  • Supports repeat scheduling via --repeat-interval-hours CLI flag or $REPEAT_INTERVAL_HOURS env var — if set, the seeder loops with a sleep between runs rather than exiting; Docker Compose runs it every 8 hours with restart: always
  • 32 tests covering HTTP interactions, mutation name math, collection building, upsert orchestration, and orphan deletion
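The upsert and orphan-deletion behaviour described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Source` method name `collections()`, the `InMemoryClient` stand-in for the HTTP client, the `sync` function, the `DemoSource` class, and the dict-based `SOURCES` registry are all assumptions made for the example. Only the `owned_tag` attribute, the `Source` ABC, and the create/update/delete semantics come from the summary.

```python
from abc import ABC, abstractmethod
from typing import Optional, TypedDict


class Collection(TypedDict):
    name: str
    description: str


class ExistingCollection(TypedDict):
    id: int
    name: str
    description: str


class Source(ABC):
    """Assumed shape of the Source ABC; the real interface may differ."""

    owned_tag: Optional[str] = None  # e.g. "#pango-lineage"

    @abstractmethod
    def collections(self) -> list[Collection]:
        """Return the collections this source wants to exist."""


class InMemoryClient:
    """Hypothetical stand-in for the client that would call
    POST /collections, PUT /collections/{id} and DELETE /collections/{id}."""

    def __init__(self) -> None:
        self.store: dict[int, ExistingCollection] = {}
        self._next_id = 1

    def list_collections(self) -> list[ExistingCollection]:
        return list(self.store.values())

    def create_collection(self, c: Collection) -> None:
        self.store[self._next_id] = {"id": self._next_id, **c}
        self._next_id += 1

    def update_collection(self, cid: int, c: Collection) -> None:
        self.store[cid] = {"id": cid, **c}

    def delete_collection(self, cid: int) -> None:
        del self.store[cid]


def sync(source: Source, client: InMemoryClient) -> None:
    existing = {c["name"]: c for c in client.list_collections()}
    wanted: set[str] = set()
    for c in source.collections():
        if source.owned_tag:
            # Append the tag so later runs can recognise owned collections.
            c = {**c, "description": f'{c["description"]} {source.owned_tag}'}
        wanted.add(c["name"])
        if c["name"] in existing:
            # No diffing: an existing collection is always updated.
            client.update_collection(existing[c["name"]]["id"], c)
        else:
            client.create_collection(c)
    if source.owned_tag is None:
        return  # without a tag, nothing is ever deleted
    for name, old in existing.items():
        # Token match, so "#a" does not also match "#a-sample".
        if source.owned_tag in old["description"].split() and name not in wanted:
            client.delete_collection(old["id"])


class DemoSource(Source):
    owned_tag = "#demo"

    def __init__(self, names: list[str]) -> None:
        self._names = names

    def collections(self) -> list[Collection]:
        return [{"name": n, "description": f"Lineage {n}"} for n in self._names]


# Assumed registry shape: one entry per source.
SOURCES = {"demo": DemoSource}
```

Matching the tag as a whitespace-separated token is a choice made for this sketch; how the seeder actually detects the tag in a description is not stated in the summary.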

Test plan

  • pixi run -e test test — all 32 tests pass
  • pixi run seed — seeds all sources against a local backend
  • pixi run seed again — all collections updated (upsert)
  • pixi run seed-lineages — seeds all pango lineages
  • pixi run seed-lineages-sample — seeds 10 lineages quickly
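The CLI behaviour behind these commands (the `--source`, `--list`, `--api-key` and `--repeat-interval-hours` flags from the summary) might be wired up roughly like this. It is a hedged sketch, not the contents of `seed.py`: the function names, the `float` type for the interval, the `max_runs` parameter (added only so the loop can be tested), and the injected `seed_once` and `sleep` callables are assumptions.

```python
import argparse
import os
import time
from typing import Callable, Mapping, Optional, Sequence

# Source names as listed in the summary.
SOURCES = [
    "covid-resistance-mutations",
    "covid-pango-lineages",
    "covid-pango-lineages-sample",
]
# The sample source is excluded from the default run.
DEFAULT_EXCLUDED = {"covid-pango-lineages-sample"}


def parse_args(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> argparse.Namespace:
    env = os.environ if env is None else env
    raw_interval = env.get("REPEAT_INTERVAL_HOURS")
    parser = argparse.ArgumentParser(description="Seed collections")
    parser.add_argument("--source", choices=SOURCES, help="run a single source")
    parser.add_argument("--list", action="store_true", help="print available sources")
    parser.add_argument("--api-key", default=env.get("API_KEY"))
    parser.add_argument(
        "--repeat-interval-hours",
        type=float,
        default=float(raw_interval) if raw_interval else None,
    )
    return parser.parse_args(argv)


def selected_sources(args: argparse.Namespace) -> list[str]:
    if args.source:
        return [args.source]
    return [s for s in SOURCES if s not in DEFAULT_EXCLUDED]


def run(
    args: argparse.Namespace,
    seed_once: Callable[[list[str]], None],
    sleep: Callable[[float], None] = time.sleep,
    max_runs: Optional[int] = None,  # hypothetical, for testing only
) -> int:
    runs = 0
    while True:
        seed_once(selected_sources(args))
        runs += 1
        done = max_runs is not None and runs >= max_runs
        if args.repeat_interval_hours is None or done:
            return runs  # no interval set: exit after a single run
        sleep(args.repeat_interval_hours * 3600)
```

Passing `sleep` and `seed_once` as parameters keeps the loop testable without real waiting or network access.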

vercel Bot commented May 7, 2026

Deployment status: project dashboards, Ready (updated May 29, 2026 1:23pm UTC)

@fhennig fhennig self-assigned this May 7, 2026
@fhennig fhennig changed the title from "feat(collection-seeding): Python seeder with pango lineages and test suite" to "feat(website): Python seeder with pango lineages and test suite" May 7, 2026
@fhennig fhennig changed the title from "feat(website): Python seeder with pango lineages and test suite" to "feat(website): python seeder with pango lineages and test suite" May 7, 2026
@fhennig fhennig force-pushed the feat/pango-lineage-seeder branch from 11984af to 2699a08 Compare May 20, 2026 08:49
@fhennig fhennig force-pushed the feat/pango-lineage-seeder branch from 2699a08 to 9bda964 Compare May 20, 2026 08:55
@fhennig fhennig changed the base branch from main to feat/api-key-auth May 20, 2026 08:56
fhennig and others added 13 commits May 28, 2026 14:24
…argparse, remove seed.mjs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eder to collection-seeding
…) constructor in pango_lineages
…o enforce subclass definition
… and docs from build context
…in api.py
…end and seeder
…llections
Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 25 changed files in this pull request and generated 3 comments.

client.delete_collection.assert_not_called()


def test_no_deletion_when_owned_tag_is_none():
Comment thread docs/arc42/09-architecture-decisions.md
Comment thread collection-seeding/seed.py
@fengelniederhammer fengelniederhammer left a comment

👍

Comment thread docs/arc42/09-architecture-decisions.md Outdated
fhennig and others added 2 commits May 29, 2026 14:22
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Fabian Engelniederhammer <92720311+fengelniederhammer@users.noreply.github.com>
3 participants