feat(website): python seeder with pango lineages and test suite #1203
Merged
54 commits
45cd3ef feat(example-data): add Python pango lineage collection seeder (fhennig)
ee8ff3a feat(example-data): add requirements.txt and Dockerfile for lineages … (fhennig)
b7f327e refactor(example-data): unified Python seeder with source modules (fhennig)
2ffef39 feat(example-data): replace requirements.txt with pixi (fhennig)
4121ad8 feat(example-data): upsert seed user via POST /users/sync before seeding (fhennig)
965bad3 chore: add Python and pixi entries to .gitignore (fhennig)
78cd794 docs(example-data): update README for Python/pixi seeder (fhennig)
42ea432 feat(example-data): upsert collections instead of skipping existing ones (fhennig)
4ed9c5b test(example-data): add test suite with mock source and mocked HTTP b… (fhennig)
f0dd9cc chore: rename example-data to collection-seeding (fhennig)
c168c02 fix(ci): update example-data path to collection-seeding, remove unuse… (fhennig)
2af00d3 fix(collection-seeding): use genspectrum-bot GitHub ID (218605180) fo… (fhennig)
bd97def refactor(collection-seeding): add TypedDict types (Collection, Varian… (fhennig)
5cf49fb refactor(collection-seeding): authenticate via API key through websit… (fhennig)
0faea38 refactor(collection-seeding): rename backend.py to api.py, BackendCli… (fhennig)
f09b360 refactor(collection-seeding): rename test_backend.py to test_api.py, … (fhennig)
65f6b84 chore(collection-seeding): pin pixi dependencies, fix README default URL (fhennig)
74c63a6 fix(collection-seeding): fix Dockerfile COPY after backend.py→api.py … (fhennig)
7646933 feat(collection-seeding): add repeat loop via REPEAT_INTERVAL_HOURS e… (fhennig)
8b3e342 docs(collection-seeding): document REPEAT_INTERVAL_HOURS in README (fhennig)
578fc24 refactor(collection-seeding): convert sources to classes with Source ABC (fhennig)
a35bb0e README (fhennig)
8932f9a refactor(collection-seeding): move _build_collection into class, defa… (fhennig)
de989d6 docs(collection-seeding): add docstring to Source ABC (fhennig)
33233a5 refactor(collection-seeding): use None for no limit, move docstrings … (fhennig)
95a7100 refactor(collection-seeding): move test collections out of MockSource… (fhennig)
4be0043 refactor(collection-seeding): --source flag, ALL_SOURCES registry, Pa… (fhennig)
0cce98f refactor(collection-seeding): move ALL_SOURCES to dedicated sources/r… (fhennig)
09c27b5 refactor(collection-seeding): reorder seed.py with main first (fhennig)
474d95b fix(collection-seeding): don't require --api-key for --list (fhennig)
a665b93 refactor(collection-seeding): move repeat loop into main(), add --rep… (fhennig)
069af1c docs(collection-seeding): update README with sample source, SEEDER_AP… (fhennig)
ce59d14 fix(collection-seeding): crash on error in repeat mode instead of swa… (fhennig)
c4c3f88 feat(collection-seeding): restructure pango lineage variants into 4 f… (fhennig)
3ded99e fix(collection-seeding): strip organism from PUT body to avoid 400 (fhennig)
431a9a2 refactor(collection-seeding): defer REPEAT_INTERVAL_HOURS parsing to … (fhennig)
031bdda chore(collection-seeding): rename pixi workspace from example-data-se… (fhennig)
11e2881 refactor(collection-seeding): use dict literal instead of Collection(… (fhennig)
5fef10c refactor(collection-seeding): make Source.name an abstract property t… (fhennig)
fefb936 chore(collection-seeding): add .dockerignore to exclude .pixi, tests,… (fhennig)
be538a5 chore(collection-seeding): add ruff for linting and formatting (fhennig)
c387cb7 chore(collection-seeding): use whitelist dockerignore with COPY . . (fhennig)
627f4f8 refactor(collection-seeding): raise RuntimeError instead of sys.exit … (fhennig)
d280a52 docs(collection-seeding): fix idempotency description in seed.py (fhennig)
d45ee08 fix(collection-seeding): convert REPEAT_INTERVAL_HOURS env var to float (fhennig)
74056c6 docs: update collection-seeding ADR with Kotlin rationale (fhennig)
aecff72 feat(collection-seeding): exclude sample source from default run (fhennig)
a8191b6 ci(collection-seeding): run Python tests before building Docker image (fhennig)
c19425a fix(docker-compose): hardcode dummy system user API key for both back… (fhennig)
6ac8098 fix(ci): set manifest-path for setup-pixi in seeder workflow (fhennig)
4787342 feat(collection-seeding): add searchable tags to collection descriptions (fhennig)
f9671a4 feat(collection-seeding): detect and delete orphaned pango lineage co… (fhennig)
35aab82 Potential fix for pull request finding (fhennig)
f50037d Update docs/arc42/09-architecture-decisions.md (fhennig)
@@ -13,3 +13,10 @@ logs
 node_modules/
 
 .env
+
+# Python
+__pycache__/
+*.pyc
+
+# pixi
+.pixi/
fhennig marked this conversation as resolved.
@@ -0,0 +1,7 @@
+*
+!pixi.toml
+!pixi.lock
+!seed.py
+!api.py
+!models.py
+!sources/
@@ -0,0 +1,13 @@
+# Stage 1: use pixi to resolve and install dependencies
+FROM ghcr.io/prefix-dev/pixi:0.58.0 AS builder
+WORKDIR /app
+COPY pixi.toml pixi.lock .
+RUN pixi install --frozen
+
+# Stage 2: slim runtime image — copy only the installed site-packages
+FROM python:3.13-slim AS final
+WORKDIR /app
+COPY --from=builder /app/.pixi/envs/default/lib/python3.13/site-packages \
+    /usr/local/lib/python3.13/site-packages
+COPY . .
+CMD ["python", "seed.py"]
fhennig marked this conversation as resolved.
@@ -0,0 +1,59 @@
+# collection-seeding
+
+Seeds the backend with example collections:
+
+- **covid-resistance-mutations** — resistance mutation data for 3CLpro, RdRp, and Spike mAb
+- **covid-pango-lineages** — one collection per pango lineage, with nucleotide substitutions as variants
+- **covid-pango-lineages-sample** — same as above but limited to 10 lineages, for quick testing
+
+The script is idempotent — re-running it will create new collections or update existing ones (matched by name). If a collection's name changes in the source, the old entry is orphaned and a new one is created.
+
+Use `--repeat-interval-hours N` (or `$REPEAT_INTERVAL_HOURS`) to run on a loop — re-seeds every N hours. Without it, the script runs once and exits.
+
+## Via Docker Compose
+
+The seeder runs automatically as part of Docker Compose:
+
+```bash
+BACKEND_TAG=latest WEBSITE_TAG=latest SEEDER_TAG=latest docker compose up
+```
+
+## Running locally
+
+Requires [pixi](https://pixi.sh). Install dependencies once:
+
+```bash
+pixi install
+```
+
+Then use the provided tasks:
+
+```bash
+pixi run seed                  # all sources
+pixi run seed-resistance       # resistance mutations only
+pixi run seed-lineages         # pango lineages only
+pixi run seed-lineages-sample  # first 10 pango lineages (quick test)
+```
+
+To target a different backend:
+
+```bash
+pixi run seed --url http://localhost:4321
+```
+
+Run `pixi run seed --help` for all options, including `--source`, `--list`, `--repeat-interval-hours`, and `--url`.
+
+## Adding a new source
+
+1. Create `sources/your_source.py` and implement the `Source` ABC:
+   ```python
+   from sources import Source
+   from models import Collection
+
+   class YourSource(Source):
+       name = "your-source-name"  # used with --source flag
+
+       def get_collections(self) -> list[Collection]:
+           ...
+   ```
+2. Register it in `sources/registry.py` by adding it to `ALL_SOURCES`.
@@ -0,0 +1,83 @@
+"""Shared backend API client for collection seeders."""
+
+import time
+
+import requests
+
+from models import Collection, ExistingCollection
+
+RETRY_ATTEMPTS = 30
+RETRY_DELAY_S = 2
+
+
+class ApiClient:
+    def __init__(self, base_url: str, api_key: str):
+        self.base_url = base_url.rstrip("/")
+        self._collections_url = f"{self.base_url}/api/collections"
+        self._auth_headers = {"Authorization": f"Bearer {api_key}"}
+
+    def wait_for_api(
+        self, attempts: int = RETRY_ATTEMPTS, delay: float = RETRY_DELAY_S
+    ):
+        """Poll until the API is ready by checking the collections endpoint."""
+        for attempt in range(1, attempts + 1):
+            try:
+                r = requests.get(self._collections_url, timeout=10)
+                if r.ok:
+                    return
+            except requests.RequestException:
+                pass
+            print(f"Waiting for API... (attempt {attempt}/{attempts})")
+            time.sleep(delay)
+        raise RuntimeError(
+            f"API at {self.base_url} did not become ready after {attempts} attempts."
+        )
+
+    def fetch_existing_collections(self, organism: str) -> list[ExistingCollection]:
+        r = requests.get(
+            self._collections_url,
+            params={"organism": organism},
+            headers=self._auth_headers,
+            timeout=10,
+        )
+        if not r.ok:
+            raise RuntimeError(f"GET /api/collections failed: {r.status_code} {r.text}")
+        return r.json()
+
+    def create_collection(self, collection: Collection) -> int:
+        r = requests.post(
+            self._collections_url,
+            headers=self._auth_headers,
+            json=collection,
+            timeout=10,
+        )
+        if r.status_code != 201:
+            raise RuntimeError(
+                f"POST /api/collections failed: {r.status_code} {r.text}"
+            )
+        return r.json()["id"]
+
+    def delete_collection(self, collection_id: int) -> None:
+        r = requests.delete(
+            f"{self._collections_url}/{collection_id}",
+            headers=self._auth_headers,
+            timeout=10,
+        )
+        if not r.ok:
+            raise RuntimeError(
+                f"DELETE /api/collections/{collection_id} failed: {r.status_code} {r.text}"
+            )
+
+    def update_collection(self, collection_id: int, collection: Collection) -> None:
+        # CollectionUpdate has no organism field; sending it causes a 400 (fail-on-unknown-properties=true)
+        body = {k: v for k, v in collection.items() if k != "organism"}
+        r = requests.put(
+            f"{self._collections_url}/{collection_id}",
+            headers=self._auth_headers,
+            json=body,
+            timeout=10,
+        )
+        if not r.ok:
+            raise RuntimeError(
+                f"PUT /api/collections/{collection_id} failed: {r.status_code} {r.text}"
+            )
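The README in this PR describes seeding as an upsert matched by name, and the client above exposes the three calls such a loop would need. The seeding script itself is not visible on this page, so the following is only a rough sketch of that idea. `upsert_collections`, the `CollectionApi` protocol, and the `InMemoryApi` fake (used so the example runs without a backend) are all hypothetical names.

```python
from typing import Protocol


class CollectionApi(Protocol):
    """The subset of ApiClient methods the upsert loop relies on."""

    def fetch_existing_collections(self, organism: str) -> list[dict]: ...
    def create_collection(self, collection: dict) -> int: ...
    def update_collection(self, collection_id: int, collection: dict) -> None: ...


def upsert_collections(
    api: CollectionApi, organism: str, collections: list[dict]
) -> dict[str, int]:
    """Create collections whose name is new; update those whose name exists."""
    existing_ids = {
        c["name"]: c["id"] for c in api.fetch_existing_collections(organism)
    }
    counts = {"created": 0, "updated": 0}
    for collection in collections:
        collection_id = existing_ids.get(collection["name"])
        if collection_id is None:
            api.create_collection(collection)
            counts["created"] += 1
        else:
            api.update_collection(collection_id, collection)
            counts["updated"] += 1
    return counts


class InMemoryApi:
    """Hypothetical fake that stores collections in a dict instead of calling HTTP."""

    def __init__(self) -> None:
        self.store: dict[int, dict] = {}
        self._next_id = 1

    def fetch_existing_collections(self, organism: str) -> list[dict]:
        return [
            {"id": i, **c} for i, c in self.store.items() if c["organism"] == organism
        ]

    def create_collection(self, collection: dict) -> int:
        collection_id = self._next_id
        self._next_id += 1
        self.store[collection_id] = dict(collection)
        return collection_id

    def update_collection(self, collection_id: int, collection: dict) -> None:
        # Keep the stored organism, mirroring a PUT body that omits it
        organism = self.store[collection_id]["organism"]
        self.store[collection_id] = {**collection, "organism": organism}


api = InMemoryApi()
batch = [{"name": "A", "organism": "covid", "description": "first", "variants": []}]
first_run = upsert_collections(api, "covid", batch)
second_run = upsert_collections(api, "covid", batch)
```

Running the same batch twice creates on the first pass and updates on the second, which is the idempotency property the README claims. A renamed collection would show up as a new name and be created again, leaving the old entry behind.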
@@ -0,0 +1,29 @@
+"""Shared type definitions for collection seeding."""
+
+from typing import TypedDict
+
+
+class FilterObject(TypedDict, total=False):
+    aminoAcidMutations: list[str]
+    nucleotideMutations: list[str]
+
+
+class Variant(TypedDict):
+    type: str
+    name: str
+    filterObject: FilterObject
+
+
+class Collection(TypedDict):
+    name: str
+    organism: str
+    description: str
+    variants: list[Variant]
+
+
+class ExistingCollection(TypedDict):
+    """A collection as returned by the backend (includes the assigned id)."""
+
+    id: int
+    name: str
+    description: str | None
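As a small illustration of how these types might be used, the sketch below builds a `Collection` as a dict literal and derives the organism-free body that `update_collection` sends. The type definitions are repeated so the example is self-contained. Every literal value (the lineage name, the mutation string, the variant `type`, the organism key) is a placeholder chosen for illustration, not data taken from the PR.

```python
from typing import TypedDict


class FilterObject(TypedDict, total=False):
    # total=False: either key may be omitted
    aminoAcidMutations: list[str]
    nucleotideMutations: list[str]


class Variant(TypedDict):
    type: str
    name: str
    filterObject: FilterObject


class Collection(TypedDict):
    name: str
    organism: str
    description: str
    variants: list[Variant]


# All values below are hypothetical placeholders
collection: Collection = {
    "name": "B.1.1.7",
    "organism": "covid",
    "description": "Illustrative collection",
    "variants": [
        {
            "type": "example-type",
            "name": "Nucleotide substitutions",
            # Only one of the two optional FilterObject keys is given
            "filterObject": {"nucleotideMutations": ["C241T"]},
        }
    ],
}

# Same filtering that update_collection applies before the PUT request
update_body = {k: v for k, v in collection.items() if k != "organism"}
```

A `TypedDict` is an ordinary `dict` at runtime, so the literal can be passed straight to `requests` as a JSON body. The annotations only help static type checkers catch missing or misspelled keys.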