feat: clinical layer on GeoData + load NHANES/FHS through the library by marcbal77 · Pull Request #203 · bio-learn/biolearn

marcbal77 · 2026-03-29T05:41:14Z

Summary

Simple foundation for clinical data in biolearn:

- GeoData.clinical layer (samples-as-rows, biomarkers-as-columns; same orientation as metadata, matches industry convention for tabular clinical data)
- load_nhanes_as_geodata(year) and load_fhs_as_geodata() so both data sources flow through the library and return a GeoData
- A biomarker registry with unit conversions, validated end-to-end against real FHS Period 1 data (glucose mg/dL to mmol/L)
- An example (examples/01_composite_biomarkers/plot_load_nhanes_through_library.py) showing the through-the-library pattern
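To make the orientation concrete, here is a minimal, library-free sketch of turning a biomarkers-as-rows table into the samples-as-rows layout described above. The function name and the sample values are invented for illustration; this is not biolearn's implementation, which this page does not show.

```python
# Hypothetical illustration only: plain dicts stand in for the real tables.

def to_samples_as_rows(biomarkers_as_rows):
    """Pivot {biomarker: {sample_id: value}} into {sample_id: {biomarker: value}}."""
    samples = {}
    for biomarker, per_sample in biomarkers_as_rows.items():
        for sample_id, value in per_sample.items():
            # One row per sample; each biomarker becomes a column of that row.
            samples.setdefault(sample_id, {})[biomarker] = value
    return samples


# Made-up values in canonical units (glucose mmol/L, albumin g/L).
biomarkers_as_rows = {
    "glucose": {"s1": 5.1, "s2": 6.3},
    "albumin": {"s1": 42.0, "s2": 39.5},
}

clinical = to_samples_as_rows(biomarkers_as_rows)
```

With this layout, a per-sample record such as `clinical["s1"]` can sit next to that sample's metadata row, since both are keyed by sample.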

What changed since the first round

Based on your review:

- Orientation: Clinical data is now samples-as-rows (industry standard). Each row is an entity (patient) and each column is an attribute of that entity (biomarker). This matches metadata.
- required_features(): Dropped entirely from this PR. The schema-check / missing-data consumer path is a separate PR.
- UK Biobank preset: Removed. We have not validated the library against real UK Biobank data, so claiming support would be misleading. Swapped in an fhs preset that is validated end-to-end against the real Framingham Heart Study Period 1 data we already load.
- Canonical units: Fixed albumin (g/L) and creatinine (umol/L) to match what load_nhanes actually returns from the SI columns.
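The unit handling above can be sketched as a small lookup table of multipliers. This is an assumption-laden illustration: the table layout, the `convert` function, and the source units for creatinine and albumin are hypothetical, and the factors are standard clinical-chemistry values derived from molar mass rather than values copied from biolearn's registry.

```python
# Hypothetical conversion table: (biomarker, from_unit, to_unit) -> multiplier.
# These factors are general chemistry facts, not taken from the PR's code.
CONVERSIONS = {
    ("glucose", "mg/dL", "mmol/L"): 1 / 18.016,  # glucose is about 180.16 g/mol
    ("creatinine", "mg/dL", "umol/L"): 88.4,     # creatinine is about 113.12 g/mol
    ("albumin", "g/dL", "g/L"): 10.0,            # pure volume rescaling
}


def convert(biomarker, value, from_unit, to_unit):
    """Convert one measurement to the target unit, failing loudly if unknown."""
    if from_unit == to_unit:
        return value
    key = (biomarker, from_unit, to_unit)
    if key not in CONVERSIONS:
        raise ValueError(f"no conversion registered for {key}")
    return value * CONVERSIONS[key]


# A fasting glucose of 90 mg/dL is roughly 5.0 mmol/L.
glucose_mmol = convert("glucose", 90.0, "mg/dL", "mmol/L")
```

Raising on an unregistered conversion, rather than passing the value through, keeps a mislabeled source column from silently ending up in the wrong unit.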

Test plan

- make test: 198 passed, 5 skipped
- make format: clean
- FHS source preset validated end-to-end against the real frmgham2.csv (test_load_fhs_as_geodata_applies_fhs_unit_conversion)
- load_fhs and load_fhs_as_geodata produce matching glucose values
- Save/load roundtrip with the clinical layer
- All 69 existing clocks still pass via the unchanged test_model.py
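For readers unfamiliar with roundtrip checks, the idea behind the save/load item can be sketched with the standard library alone. The helper names and the CSV format here are assumptions for illustration; the PR's actual test and GeoData's on-disk format are not shown on this page.

```python
import csv
import os
import tempfile

# Hypothetical helpers: a samples-as-rows table is a list of dicts, each with
# a "sample_id" key plus numeric biomarker columns. Assumes no missing values.

def save_clinical(rows, path):
    columns = sorted({key for row in rows for key in row if key != "sample_id"})
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["sample_id"] + columns)
        writer.writeheader()
        writer.writerows(rows)


def load_clinical(path):
    with open(path, newline="") as fh:
        return [
            # CSV stores text, so biomarker columns are parsed back to float.
            {k: (v if k == "sample_id" else float(v)) for k, v in row.items()}
            for row in csv.DictReader(fh)
        ]


def roundtrip(rows):
    """Write the table to a temporary file and read it straight back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "clinical.csv")
        save_clinical(rows, path)
        return load_clinical(path)


original = [
    {"sample_id": "s1", "glucose": 5.1, "albumin": 42.0},
    {"sample_id": "s2", "glucose": 6.3, "albumin": 39.5},
]
restored = roundtrip(original)
```

A roundtrip test then asserts that `restored` equals `original`, which catches dropped columns, reordered samples, and type changes in one comparison.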

sarudak · 2026-04-07T17:45:25Z

    return df
+
+
+def load_nhanes_as_geodata(year):


Should add a data library entry for this.

marcbal77 · 2026-05-26

Done. NHANES and FHS are now registered as DataLibrary entries, loaded via the same pattern as the GEO sources:

    from biolearn.data_library import DataLibrary
    data = DataLibrary().get("NHANES_2010").load()

Added NhanesParser and FhsParser classes in data_library.py, three YAML entries (NHANES_2010, NHANES_2012, FHS), and removed the top-level load_nhanes_as_geodata and load_fhs_as_geodata convenience functions so there's a single path. The example uses DataLibrary too.
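The single-path design described above (one registry, one parser per source, one `get(...).load()` call) can be sketched roughly as follows. Every class here is a toy stand-in with an invented name and return value; none of it is the code in data_library.py, and the real entries are defined in YAML rather than in Python.

```python
from dataclasses import dataclass

# Toy stand-ins only: they mimic the shape of the pattern, not biolearn's API.

class ToyNhanesParser:
    def __init__(self, year):
        self.year = year

    def parse(self):
        # A real parser would fetch and normalize the survey data.
        return {"source": "NHANES", "year": self.year}


class ToyFhsParser:
    def parse(self):
        return {"source": "FHS"}


@dataclass
class ToyEntry:
    id: str
    parser: object

    def load(self):
        # Every entry loads the same way, whatever its source.
        return self.parser.parse()


class ToyLibrary:
    def __init__(self):
        entries = [
            ToyEntry("NHANES_2010", ToyNhanesParser(2010)),
            ToyEntry("NHANES_2012", ToyNhanesParser(2012)),
            ToyEntry("FHS", ToyFhsParser()),
        ]
        self._entries = {entry.id: entry for entry in entries}

    def get(self, entry_id):
        if entry_id not in self._entries:
            raise KeyError(f"unknown library entry: {entry_id}")
        return self._entries[entry_id]


data = ToyLibrary().get("NHANES_2010").load()
```

Keeping the source-specific logic inside the parser means callers never need a per-source convenience function, which is the motivation given for removing the top-level loaders.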

…ibrary

- Add a clinical layer on GeoData (samples-as-rows, biomarkers-as-columns, same orientation as metadata)
- Add load_nhanes_as_geodata and load_fhs_as_geodata so both data sources flow through the library and return a GeoData
- Add a biomarker registry with unit conversions, validated end-to-end against real FHS Period 1 data (glucose mg/dL -> mmol/L)
- Drop the unverified UK Biobank source preset; we have not validated it against real UK Biobank data
- Add an example showing the NHANES through-the-library pattern

The required_features() interface and the consumer-facing missing-data error path will be a separate PR.

Addresses bio-learn#194

sarudak reviewed Apr 7, 2026

marcbal77 force-pushed the feature/clinical-infrastructure branch from a8247fc to e5b1674 Compare April 23, 2026 05:58

marcbal77 force-pushed the feature/clinical-infrastructure branch from e5b1674 to 2338b69 Compare May 26, 2026 07:56

marcbal77 changed the title ~~feat: clinical infrastructure for blood biomarker clocks~~ feat: clinical layer on GeoData + load NHANES/FHS through the library May 26, 2026

marcbal77 force-pushed the feature/clinical-infrastructure branch from 2338b69 to 24f4644 Compare May 26, 2026 08:16

marcbal77 wants to merge 1 commit into bio-learn:master from marcbal77:feature/clinical-infrastructure
