Merged
54 commits
45cd3ef
feat(example-data): add Python pango lineage collection seeder
fhennig May 7, 2026
ee8ff3a
feat(example-data): add requirements.txt and Dockerfile for lineages …
fhennig May 7, 2026
b7f327e
refactor(example-data): unified Python seeder with source modules
fhennig May 7, 2026
2ffef39
feat(example-data): replace requirements.txt with pixi
fhennig May 7, 2026
4121ad8
feat(example-data): upsert seed user via POST /users/sync before seeding
fhennig May 7, 2026
965bad3
chore: add Python and pixi entries to .gitignore
fhennig May 7, 2026
78cd794
docs(example-data): update README for Python/pixi seeder
fhennig May 7, 2026
42ea432
feat(example-data): upsert collections instead of skipping existing ones
fhennig May 7, 2026
4ed9c5b
test(example-data): add test suite with mock source and mocked HTTP b…
fhennig May 7, 2026
f0dd9cc
chore: rename example-data to collection-seeding
fhennig May 7, 2026
c168c02
fix(ci): update example-data path to collection-seeding, remove unuse…
fhennig May 7, 2026
2af00d3
fix(collection-seeding): use genspectrum-bot GitHub ID (218605180) fo…
fhennig May 7, 2026
bd97def
refactor(collection-seeding): add TypedDict types (Collection, Varian…
fhennig May 7, 2026
5cf49fb
refactor(collection-seeding): authenticate via API key through websit…
fhennig May 20, 2026
0faea38
refactor(collection-seeding): rename backend.py to api.py, BackendCli…
fhennig May 20, 2026
f09b360
refactor(collection-seeding): rename test_backend.py to test_api.py, …
fhennig May 20, 2026
65f6b84
chore(collection-seeding): pin pixi dependencies, fix README default URL
fhennig May 20, 2026
74c63a6
fix(collection-seeding): fix Dockerfile COPY after backend.py→api.py …
fhennig May 20, 2026
7646933
feat(collection-seeding): add repeat loop via REPEAT_INTERVAL_HOURS e…
fhennig May 20, 2026
8b3e342
docs(collection-seeding): document REPEAT_INTERVAL_HOURS in README
fhennig May 20, 2026
578fc24
refactor(collection-seeding): convert sources to classes with Source ABC
fhennig May 20, 2026
a35bb0e
README
fhennig May 20, 2026
8932f9a
refactor(collection-seeding): move _build_collection into class, defa…
fhennig May 20, 2026
de989d6
docs(collection-seeding): add docstring to Source ABC
fhennig May 20, 2026
33233a5
refactor(collection-seeding): use None for no limit, move docstrings …
fhennig May 20, 2026
95a7100
refactor(collection-seeding): move test collections out of MockSource…
fhennig May 20, 2026
4be0043
refactor(collection-seeding): --source flag, ALL_SOURCES registry, Pa…
fhennig May 20, 2026
0cce98f
refactor(collection-seeding): move ALL_SOURCES to dedicated sources/r…
fhennig May 20, 2026
09c27b5
refactor(collection-seeding): reorder seed.py with main first
fhennig May 20, 2026
474d95b
fix(collection-seeding): don't require --api-key for --list
fhennig May 20, 2026
a665b93
refactor(collection-seeding): move repeat loop into main(), add --rep…
fhennig May 20, 2026
069af1c
docs(collection-seeding): update README with sample source, SEEDER_AP…
fhennig May 20, 2026
ce59d14
fix(collection-seeding): crash on error in repeat mode instead of swa…
fhennig May 20, 2026
c4c3f88
feat(collection-seeding): restructure pango lineage variants into 4 f…
fhennig May 27, 2026
3ded99e
fix(collection-seeding): strip organism from PUT body to avoid 400
fhennig May 27, 2026
431a9a2
refactor(collection-seeding): defer REPEAT_INTERVAL_HOURS parsing to …
fhennig May 27, 2026
031bdda
chore(collection-seeding): rename pixi workspace from example-data-se…
fhennig May 27, 2026
11e2881
refactor(collection-seeding): use dict literal instead of Collection(…
fhennig May 27, 2026
5fef10c
refactor(collection-seeding): make Source.name an abstract property t…
fhennig May 27, 2026
fefb936
chore(collection-seeding): add .dockerignore to exclude .pixi, tests,…
fhennig May 27, 2026
be538a5
chore(collection-seeding): add ruff for linting and formatting
fhennig May 27, 2026
c387cb7
chore(collection-seeding): use whitelist dockerignore with COPY . .
fhennig May 28, 2026
627f4f8
refactor(collection-seeding): raise RuntimeError instead of sys.exit …
fhennig May 28, 2026
d280a52
docs(collection-seeding): fix idempotency description in seed.py
fhennig May 28, 2026
d45ee08
fix(collection-seeding): convert REPEAT_INTERVAL_HOURS env var to float
fhennig May 28, 2026
74056c6
docs: update collection-seeding ADR with Kotlin rationale
fhennig May 28, 2026
aecff72
feat(collection-seeding): exclude sample source from default run
fhennig May 28, 2026
a8191b6
ci(collection-seeding): run Python tests before building Docker image
fhennig May 28, 2026
c19425a
fix(docker-compose): hardcode dummy system user API key for both back…
fhennig May 28, 2026
6ac8098
fix(ci): set manifest-path for setup-pixi in seeder workflow
fhennig May 28, 2026
4787342
feat(collection-seeding): add searchable tags to collection descriptions
fhennig May 28, 2026
f9671a4
feat(collection-seeding): detect and delete orphaned pango lineage co…
fhennig May 28, 2026
35aab82
Potential fix for pull request finding
fhennig May 29, 2026
f50037d
Update docs/arc42/09-architecture-decisions.md
fhennig May 29, 2026
18 changes: 17 additions & 1 deletion .github/workflows/example-data-seeder.yml
Comment thread
fhennig marked this conversation as resolved.
@@ -6,8 +6,24 @@ env:
   DOCKER_IMAGE_NAME: ghcr.io/genspectrum/dashboards/example-data-seeder
 
 jobs:
+  test:
+    name: Test Example Data Seeder
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+
+      - uses: prefix-dev/setup-pixi@v0.9.6
+        with:
+          manifest-path: collection-seeding/pixi.toml
+          environments: test
+
+      - name: Run tests
+        working-directory: ./collection-seeding
+        run: pixi run -e test test
+
   dockerImage:
     name: Build Example Data Seeder Docker Image
+    needs: test
     runs-on: ubuntu-latest
     permissions:
       contents: read
@@ -37,7 +53,7 @@ jobs:
       - name: Build and push image
         uses: docker/build-push-action@v7
         with:
-          context: ./example-data
+          context: ./collection-seeding
           tags: ${{ steps.dockerMetadata.outputs.tags }}
           cache-from: type=gha,scope=example-data-seeder-${{ github.ref }}
           cache-to: type=gha,mode=max,scope=example-data-seeder-${{ github.ref }}
7 changes: 7 additions & 0 deletions .gitignore
@@ -13,3 +13,10 @@ logs
 node_modules/
 
 .env
+
+# Python
+__pycache__/
+*.pyc
+
+# pixi
+.pixi/
7 changes: 7 additions & 0 deletions collection-seeding/.dockerignore
Comment thread
fhennig marked this conversation as resolved.
@@ -0,0 +1,7 @@
*
!pixi.toml
!pixi.lock
!seed.py
!api.py
!models.py
!sources/
13 changes: 13 additions & 0 deletions collection-seeding/Dockerfile
@@ -0,0 +1,13 @@
# Stage 1: use pixi to resolve and install dependencies
FROM ghcr.io/prefix-dev/pixi:0.58.0 AS builder
WORKDIR /app
COPY pixi.toml pixi.lock .
RUN pixi install --frozen

# Stage 2: slim runtime image — copy only the installed site-packages
FROM python:3.13-slim AS final
WORKDIR /app
COPY --from=builder /app/.pixi/envs/default/lib/python3.13/site-packages \
/usr/local/lib/python3.13/site-packages
COPY . .
CMD ["python", "seed.py"]
59 changes: 59 additions & 0 deletions collection-seeding/README.md
Comment thread
fhennig marked this conversation as resolved.
@@ -0,0 +1,59 @@
# collection-seeding

Seeds the backend with example collections:

- **covid-resistance-mutations** — resistance mutation data for 3CLpro, RdRp, and Spike mAb
- **covid-pango-lineages** — one collection per pango lineage, with nucleotide substitutions as variants
- **covid-pango-lineages-sample** — same as above but limited to 10 lineages, for quick testing

The script is idempotent — re-running it will create new collections or update existing ones (matched by name). If a collection's name changes in the source, the old entry is orphaned and a new one is created.

Use `--repeat-interval-hours N` (or `$REPEAT_INTERVAL_HOURS`) to run on a loop — re-seeds every N hours. Without it, the script runs once and exits.
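
The run-once versus repeat behaviour can be sketched roughly as follows. This is a minimal illustration, not the code in `seed.py`: the function names `parse_interval_hours` and `run`, the injectable `sleep` argument, and the `max_runs` parameter are all assumptions introduced here so the loop can be exercised without waiting. Only the `REPEAT_INTERVAL_HOURS` variable and the "run once and exit" default come from the text above.

```python
import time
from typing import Callable, Optional


def parse_interval_hours(raw: Optional[str]) -> Optional[float]:
    """Return the interval as a float, or None when repetition is disabled."""
    if raw is None or raw.strip() == "":
        return None
    hours = float(raw)  # raises ValueError on non-numeric input
    if hours <= 0:
        raise ValueError("repeat interval must be positive")
    return hours


def run(
    seed_once: Callable[[], None],
    interval_hours: Optional[float],
    sleep: Callable[[float], None] = time.sleep,
    max_runs: Optional[int] = None,  # hypothetical knob, only here for testing
) -> int:
    """Seed once, or repeatedly every `interval_hours`. Returns the run count."""
    runs = 0
    while True:
        seed_once()  # exceptions propagate, so a failed run ends the loop
        runs += 1
        if interval_hours is None:
            return runs  # no interval configured: run once and exit
        if max_runs is not None and runs >= max_runs:
            return runs
        sleep(interval_hours * 3600)  # hours to seconds
```

A caller would pass `parse_interval_hours(os.environ.get("REPEAT_INTERVAL_HOURS"))` as `interval_hours`. Letting exceptions escape `seed_once` is a design choice in this sketch: a failing run stops the process instead of being retried silently.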

## Via Docker Compose

The seeder runs automatically as part of Docker Compose:

```bash
BACKEND_TAG=latest WEBSITE_TAG=latest SEEDER_TAG=latest docker compose up
```

## Running locally

Requires [pixi](https://pixi.sh). Install dependencies once:

```bash
pixi install
```

Then use the provided tasks:

```bash
pixi run seed # all sources
pixi run seed-resistance # resistance mutations only
pixi run seed-lineages # pango lineages only
pixi run seed-lineages-sample # first 10 pango lineages (quick test)
```

To target a different backend:

```bash
pixi run seed --url http://localhost:4321
```

Run `pixi run seed --help` for all options, including `--source`, `--list`, `--repeat-interval-hours`, and `--url`.

## Adding a new source

1. Create `sources/your_source.py` and implement the `Source` ABC:
   ```python
   from sources import Source
   from models import Collection

   class YourSource(Source):
       name = "your-source-name"  # used with --source flag

       def get_collections(self) -> list[Collection]:
           ...
   ```
2. Register it in `sources/registry.py` by adding it to `ALL_SOURCES`.
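
The two steps above can be tried out in isolation with a self-contained sketch. The `Source` and `Collection` definitions below are stand-ins written for this example; the real ones live in `sources` and `models` and may differ. The registry shape (a list of instances), the `select_sources` helper, and all collection values are assumptions, not taken from the PR.

```python
from abc import ABC, abstractmethod
from typing import Optional, TypedDict


class Collection(TypedDict):  # stand-in for models.Collection
    name: str
    organism: str
    description: str
    variants: list


class Source(ABC):  # stand-in for sources.Source
    @property
    @abstractmethod
    def name(self) -> str:
        """Identifier used to select this source."""

    @abstractmethod
    def get_collections(self) -> list[Collection]:
        """Return the collections this source wants to seed."""


class YourSource(Source):
    # A plain class attribute is enough to satisfy the abstract property.
    name = "your-source-name"

    def get_collections(self) -> list[Collection]:
        return [
            {
                "name": "Example collection",  # hypothetical placeholder data
                "organism": "covid",  # hypothetical organism key
                "description": "Placeholder description",
                "variants": [],
            }
        ]


# Assumed registry shape: a list of source instances.
ALL_SOURCES: list[Source] = [YourSource()]


def select_sources(names: Optional[list[str]]) -> list[Source]:
    """Return every registered source, or only those whose name was requested."""
    if names is None:
        return list(ALL_SOURCES)
    by_name = {source.name: source for source in ALL_SOURCES}
    unknown = [n for n in names if n not in by_name]
    if unknown:
        raise ValueError(f"unknown source(s): {', '.join(unknown)}")
    return [by_name[n] for n in names]
```

Because both members of the ABC are abstract, forgetting either `name` or `get_collections` in a subclass raises `TypeError` at instantiation time, before any request is sent.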
83 changes: 83 additions & 0 deletions collection-seeding/api.py
@@ -0,0 +1,83 @@
"""Shared backend API client for collection seeders."""

import time

import requests

from models import Collection, ExistingCollection

RETRY_ATTEMPTS = 30
RETRY_DELAY_S = 2


class ApiClient:
    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url.rstrip("/")
        self._collections_url = f"{self.base_url}/api/collections"
        self._auth_headers = {"Authorization": f"Bearer {api_key}"}

    def wait_for_api(
        self, attempts: int = RETRY_ATTEMPTS, delay: float = RETRY_DELAY_S
    ):
        """Poll until the API is ready by checking the collections endpoint."""
        for attempt in range(1, attempts + 1):
            try:
                r = requests.get(self._collections_url, timeout=10)
                if r.ok:
                    return
            except requests.RequestException:
                pass
            print(f"Waiting for API... (attempt {attempt}/{attempts})")
            time.sleep(delay)
        raise RuntimeError(
            f"API at {self.base_url} did not become ready after {attempts} attempts."
        )

    def fetch_existing_collections(self, organism: str) -> list[ExistingCollection]:
        r = requests.get(
            self._collections_url,
            params={"organism": organism},
            headers=self._auth_headers,
            timeout=10,
        )
        if not r.ok:
            raise RuntimeError(f"GET /api/collections failed: {r.status_code} {r.text}")
        return r.json()

    def create_collection(self, collection: Collection) -> int:
        r = requests.post(
            self._collections_url,
            headers=self._auth_headers,
            json=collection,
            timeout=10,
        )
        if r.status_code != 201:
            raise RuntimeError(
                f"POST /api/collections failed: {r.status_code} {r.text}"
            )
        return r.json()["id"]

    def delete_collection(self, collection_id: int) -> None:
        r = requests.delete(
            f"{self._collections_url}/{collection_id}",
            headers=self._auth_headers,
            timeout=10,
        )
        if not r.ok:
            raise RuntimeError(
                f"DELETE /api/collections/{collection_id} failed: {r.status_code} {r.text}"
            )

    def update_collection(self, collection_id: int, collection: Collection) -> None:
        # CollectionUpdate has no organism field; sending it causes a 400 (fail-on-unknown-properties=true)
        body = {k: v for k, v in collection.items() if k != "organism"}
        r = requests.put(
            f"{self._collections_url}/{collection_id}",
            headers=self._auth_headers,
            json=body,
            timeout=10,
        )
        if not r.ok:
            raise RuntimeError(
                f"PUT /api/collections/{collection_id} failed: {r.status_code} {r.text}"
            )
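
For orientation, here is one way the client methods above could be combined into the create-or-update flow that the README describes (collections matched by name). It is a sketch under assumptions: `upsert_collections` and `InMemoryClient` are hypothetical names invented for this example and do not appear in the PR, and the in-memory client only imitates the method names of `ApiClient` so the flow can run without a backend.

```python
from typing import Any


class InMemoryClient:
    """Hypothetical stand-in exposing the same method names as ApiClient."""

    def __init__(self) -> None:
        self._store: dict[int, dict[str, Any]] = {}
        self._next_id = 1

    def fetch_existing_collections(self, organism: str) -> list[dict[str, Any]]:
        return [
            {"id": cid, "name": c["name"], "description": c.get("description")}
            for cid, c in self._store.items()
            if c["organism"] == organism
        ]

    def create_collection(self, collection: dict[str, Any]) -> int:
        cid = self._next_id
        self._next_id += 1
        self._store[cid] = dict(collection)  # shallow copy, like a request body
        return cid

    def update_collection(self, collection_id: int, collection: dict[str, Any]) -> None:
        # Mirror the PUT handling above: organism is left out of the update body.
        body = {k: v for k, v in collection.items() if k != "organism"}
        self._store[collection_id].update(body)


def upsert_collections(
    client: Any, organism: str, collections: list[dict[str, Any]]
) -> tuple[int, int]:
    """Create collections with a new name, update the rest.

    Returns a (created, updated) pair.
    """
    existing_by_name = {
        c["name"]: c["id"] for c in client.fetch_existing_collections(organism)
    }
    created = updated = 0
    for collection in collections:
        existing_id = existing_by_name.get(collection["name"])
        if existing_id is None:
            # Remember the new id so a duplicate name later in the list updates it.
            existing_by_name[collection["name"]] = client.create_collection(collection)
            created += 1
        else:
            client.update_collection(existing_id, collection)
            updated += 1
    return created, updated
```

Running the function twice with the same input creates on the first pass and updates on the second, which is the sense in which the seeding is idempotent. A renamed collection would not match any existing name and would be created as new.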
29 changes: 29 additions & 0 deletions collection-seeding/models.py
@@ -0,0 +1,29 @@
"""Shared type definitions for collection seeding."""

from typing import TypedDict


class FilterObject(TypedDict, total=False):
    aminoAcidMutations: list[str]
    nucleotideMutations: list[str]


class Variant(TypedDict):
    type: str
    name: str
    filterObject: FilterObject


class Collection(TypedDict):
    name: str
    organism: str
    description: str
    variants: list[Variant]


class ExistingCollection(TypedDict):
    """A collection as returned by the backend (includes the assigned id)."""

    id: int
    name: str
    description: str | None
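
A short sketch of how these types behave when building values. The two classes are repeated from above so the example is self-contained; `make_nucleotide_variant` is a hypothetical helper, and the mutation string and the `variant_type` value in the usage note are placeholders, since the set of `type` values the backend accepts is not shown in this diff.

```python
from typing import TypedDict


class FilterObject(TypedDict, total=False):
    aminoAcidMutations: list[str]
    nucleotideMutations: list[str]


class Variant(TypedDict):
    type: str
    name: str
    filterObject: FilterObject


def make_nucleotide_variant(
    name: str, mutations: list[str], variant_type: str
) -> Variant:
    """Build a Variant that filters on nucleotide mutations only."""
    return {
        "type": variant_type,
        "name": name,
        # total=False makes both keys optional, so aminoAcidMutations is omitted.
        "filterObject": {"nucleotideMutations": list(mutations)},
    }


# TypedDict keeps track of which keys are required and which are optional.
optional_filter_keys = FilterObject.__optional_keys__
required_variant_keys = Variant.__required_keys__
```

For example, `make_nucleotide_variant("Example", ["C241T"], "placeholder-type")` returns a plain `dict`. `TypedDict` adds no runtime validation, so mistakes such as a missing key are only caught by a static type checker.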