feat: clinical layer on GeoData + load NHANES/FHS through the library #203
Open
marcbal77 wants to merge 1 commit into
sarudak reviewed Apr 7, 2026
```python
    return df


def load_nhanes_as_geodata(year):
```
sarudak (Member):
Should add a data library entry for this.
marcbal77 (Member, Author):
Done. NHANES and FHS are now registered as DataLibrary entries, loaded via the same pattern as the GEO sources:
```python
from biolearn.data_library import DataLibrary

data = DataLibrary().get("NHANES_2010").load()
```

Added `NhanesParser` and `FhsParser` classes in `data_library.py`, three YAML entries (`NHANES_2010`, `NHANES_2012`, `FHS`), and removed the top-level `load_nhanes_as_geodata` and `load_fhs_as_geodata` convenience functions so there is a single loading path. The example uses `DataLibrary` too.
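The reply describes one loading path: a YAML entry names a source, a parser class knows how to read it, and `DataLibrary().get(id).load()` returns the result. The following is a minimal, self-contained sketch of that registry-and-dispatch pattern, not biolearn's actual implementation. The `parser` key, the `Entry` and `MiniLibrary` classes, and the two stub parsers are hypothetical, and a plain dict stands in for the parsed YAML so the example needs no YAML library.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    """One library entry as it might look after parsing a YAML block (hypothetical fields)."""
    id: str
    parser: str
    options: dict

    def load(self):
        # Look up the parser class by name, construct it with the entry's options, and parse.
        return PARSERS[self.parser](**self.options).parse()


class NhanesStubParser:
    """Hypothetical stand-in for a parser such as NhanesParser."""

    def __init__(self, year: int):
        self.year = year

    def parse(self):
        return {"source": "NHANES", "year": self.year}


class FhsStubParser:
    """Hypothetical stand-in for a parser such as FhsParser."""

    def parse(self):
        return {"source": "FHS"}


# Maps the YAML `parser` key to the class that handles it.
PARSERS: Dict[str, Callable] = {
    "nhanes": NhanesStubParser,
    "fhs": FhsStubParser,
}

# What three YAML entries might deserialize to (hypothetical keys and options).
ENTRIES = {
    "NHANES_2010": Entry("NHANES_2010", "nhanes", {"year": 2010}),
    "NHANES_2012": Entry("NHANES_2012", "nhanes", {"year": 2012}),
    "FHS": Entry("FHS", "fhs", {}),
}


class MiniLibrary:
    def get(self, entry_id: str) -> Entry:
        try:
            return ENTRIES[entry_id]
        except KeyError:
            raise KeyError(f"No library entry named {entry_id!r}") from None


data = MiniLibrary().get("NHANES_2010").load()
```

The design point is that adding a new source means adding one parser class and one entry, while every caller keeps using the same `get(...).load()` call.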
a8247fc to e5b1674
e5b1674 to 2338b69
…ibrary

- Add a clinical layer on GeoData (samples-as-rows, biomarkers-as-columns, same orientation as metadata)
- Add load_nhanes_as_geodata and load_fhs_as_geodata so both data sources flow through the library and return a GeoData
- Add a biomarker registry with unit conversions, validated end-to-end against real FHS Period 1 data (glucose mg/dL -> mmol/L)
- Drop the unverified UK Biobank source preset; we have not validated it against real UK Biobank data
- Add an example showing the NHANES through-the-library pattern

The required_features() interface and the consumer-facing missing-data error path will be a separate PR.

Addresses bio-learn#194
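The commit message says the clinical layer is samples-as-rows, biomarkers-as-columns, in the same orientation as `metadata`. The practical payoff is that both frames share one sample index, so they can be aligned or joined without reshaping. The sketch below illustrates that with pandas; `MiniGeoData` and `check_clinical_alignment` are hypothetical names rather than biolearn API, and the sample IDs and values are toy inputs for illustration only.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class MiniGeoData:
    """Hypothetical container: metadata and clinical share the same row index (sample IDs)."""
    metadata: pd.DataFrame  # samples as rows, descriptive fields as columns
    clinical: pd.DataFrame  # samples as rows, biomarkers as columns

    def __post_init__(self):
        check_clinical_alignment(self.metadata, self.clinical)


def check_clinical_alignment(metadata: pd.DataFrame, clinical: pd.DataFrame) -> None:
    """Raise if the clinical layer has samples that metadata does not know about."""
    unknown = clinical.index.difference(metadata.index)
    if len(unknown) > 0:
        raise ValueError(f"clinical has samples missing from metadata: {list(unknown)}")


# Toy values for illustration only.
metadata = pd.DataFrame(
    {"age": [52, 61, 47], "sex": [1, 2, 2]},
    index=pd.Index(["s1", "s2", "s3"], name="id"),
)
clinical = pd.DataFrame(
    {"glucose": [5.1, 6.3], "albumin": [42.0, 39.5]},
    index=pd.Index(["s1", "s3"], name="id"),
)

data = MiniGeoData(metadata=metadata, clinical=clinical)

# Because both frames are indexed by sample, a plain index join lines up
# each sample's metadata with its biomarkers; s2 has no clinical row, so it gets NaN.
combined = data.metadata.join(data.clinical, how="left")
```

Allowing the clinical layer to cover only a subset of samples (rather than requiring identical indexes) is one possible choice here; a stricter check would compare the indexes for equality.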
2338b69 to 24f4644
Summary
Simple foundation for clinical data in biolearn:
- `GeoData.clinical` layer (samples-as-rows, biomarkers-as-columns; the same orientation as `metadata`, which matches industry convention for tabular clinical data)
- `load_nhanes_as_geodata(year)` and `load_fhs_as_geodata()` so both data sources flow through the library and return a `GeoData`
- An example (`examples/01_composite_biomarkers/plot_load_nhanes_through_library.py`) showing the through-the-library pattern

What changed since the first round
Based on your review:
- `metadata.required_features()`: Dropped entirely from this PR. The schema-check / missing-data consumer path will be a separate PR.
- … `fhs` preset that is validated end-to-end against the real Framingham Heart Study Period 1 data we already load.
- … `albumin` (g/L) and `creatinine` (umol/L) to match what `load_nhanes` actually returns from the SI columns.

Test plan
- `make test`: 198 passed, 5 skipped
- `make format`: clean
- … (`test_load_fhs_as_geodata_applies_fhs_unit_conversion`)
- … `load_fhs` and `load_fhs_as_geodata` produce matching glucose values
- `test_model.py`
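The biomarker registry in the commit message stores a canonical unit per biomarker and converts source values into it: the FHS validation covers glucose from mg/dL to mmol/L, and the NHANES fix pins albumin to g/L and creatinine to umol/L. Here is a minimal sketch of such a registry with pytest-style checks in the spirit of the unit-conversion test listed above. The `REGISTRY` layout and `to_canonical` function are hypothetical, and the factors are standard textbook conversions (glucose 180.16 g/mol, creatinine 113.12 g/mol, ×10 for g/dL to g/L), not necessarily the exact constants this PR uses.

```python
import math

# Molar masses in g/mol, numerically equal to mg/mmol.
GLUCOSE_MW = 180.16
CREATININE_MW = 113.12

# Hypothetical registry: canonical unit per biomarker plus multiplicative
# factors that convert other units into that canonical unit.
REGISTRY = {
    # mg/dL -> mg/L (x10) -> mmol/L (/ mg per mmol)
    "glucose": {"unit": "mmol/L", "from": {"mg/dL": 10 / GLUCOSE_MW}},
    # g/dL -> g/L (x10)
    "albumin": {"unit": "g/L", "from": {"g/dL": 10.0}},
    # mg/dL -> mg/L (x10) -> mmol/L (/ mg per mmol) -> umol/L (x1000)
    "creatinine": {"unit": "umol/L", "from": {"mg/dL": 10_000 / CREATININE_MW}},
}


def to_canonical(biomarker: str, value: float, unit: str) -> float:
    """Convert a value into the registry's canonical unit for that biomarker."""
    entry = REGISTRY[biomarker]
    if unit == entry["unit"]:
        return value  # already canonical, nothing to convert
    try:
        factor = entry["from"][unit]
    except KeyError:
        raise ValueError(
            f"no conversion for {biomarker} from {unit} to {entry['unit']}"
        ) from None
    return value * factor


def test_glucose_mg_dl_to_mmol_l():
    # With this molar mass, 180.16 mg/dL of glucose is 10 mmol/L.
    assert math.isclose(to_canonical("glucose", 180.16, "mg/dL"), 10.0)


def test_canonical_units_pass_through():
    # Values already in the canonical unit are returned unchanged.
    assert to_canonical("albumin", 42.0, "g/L") == 42.0
    assert to_canonical("creatinine", 88.4, "umol/L") == 88.4
```

Keeping the canonical unit next to the conversion factors makes a mismatch like the albumin/creatinine one visible in a single place: if a loader returns SI columns, the registry's canonical units must be the SI units.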