feat: add NGWMN getters as an ogc sibling; extract a shared OGC engine#324
Draft
thodson-usgs wants to merge 2 commits into
Draft
feat: add NGWMN getters as an ogc sibling; extract a shared OGC engine#324thodson-usgs wants to merge 2 commits into
ogc sibling; extract a shared OGC engine#324thodson-usgs wants to merge 2 commits into
Conversation
35500b3 to
9e4c0ca
Compare
ogc sibling; extract a shared OGC engine
1c2ab7a to
3ba001e
Compare
Port the NGWMN functions from the R `dataRetrieval` package (DOI-USGS/dataRetrieval#904) and refactor the Water Data OGC machinery into a generic, API-agnostic engine, so NGWMN and Water Data are sibling layers on top of it -- NGWMN does not depend on Water Data. dataretrieval/ogc/ generic OGC engine (no service-specific config) engine.py request build, pagination, parse/finalize, get_ogc_data chunking.py URL-byte multi-value chunker filters.py CQL `filter` splitting progress.py self-updating status line The engine is parameterized by an `OgcDialect` and a base-url context variable rather than branching on service names: Water Data POSTs CQL2 for `monitoring-locations` and renders `daily` time args date-only; NGWMN needs neither. Adding a sibling API is a new dialect + base URL, not engine edits. dataretrieval/ngwmn.py sibling getters that import only dataretrieval.ogc: get_sites, get_water_level, get_lithology, get_well_construction, get_providers dataretrieval/waterdata/ thin Water Data layer on the engine; the Statistics API lives in its own waterdata/stats.py module. Unified `state` parameter across the modern getters, accepting a full name, a two-letter postal code, or a two-digit ANSI/FIPS code; normalized by codes.states.to_state (50 states + DC, fails fast on a typo) and resolved at the getter layer. The native state_code/state_name parameters remain as an escape hatch. Also: export ChunkInterrupted at the package top level; key the empty-result short-circuit off `features` (NGWMN omits `numberReturned`) and tolerate geometry-less features; always return PEP-8 snake_case columns; and add a pre-commit mypy hook mirroring the CI type-check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
3ba001e to
803a15d
Compare
Collaborator
Author
|
@ehinman, |
Every getter that routes through the chunked fan-out can raise a resumable `ChunkInterrupted` when a transient failure outlasts the built-in retries, but nothing said so at the point of use -- a `help(get_daily)` reader would only discover the catch-and-resume affordance via the separate userguide. Add a one-line `Raises: ChunkInterrupted` entry (pointing to :doc:`/userguide/errors` for the full resume example -- the single source of truth) to the 12 chunker-backed getters. `get_cql` (own `_run_sync` path), the stats getters, and the Samples getters don't go through the chunker, so they're intentionally left out. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
thodson-usgs
added a commit
to thodson-usgs/dataretrieval-python
that referenced
this pull request
Jun 15, 2026
`ogc/engine.py` had absorbed ~1940 LOC spanning six unrelated concerns -- constants/config, HTTP error handling, arg validation, request building, response shaping, and the async pagination driver. (DOI-USGS#324 extracted this generic engine out of waterdata/utils.py, so the god-module relocated here rather than disappearing -- this is DOI-USGS#318's original intent, retargeted onto the post-DOI-USGS#324 layout.) Split it into cohesive private siblings under `ogc/`, moving every definition AST-identically (no signature/logic change): _constants.py URLs, OgcDialect, regexes, param sets, context vars, gpd probe _http.py headers, error-body, _raise_for_non_200, retry-after _validate.py arg normalization/validation, id switching _requests.py request building (GET/CQL2, date formatting, pagination URLs) _responses.py wire response -> DataFrame (parse + shape + finalize) engine.py async pagination driver + get_ogc_data, re-exporting the above engine.py (1937 -> 570 LOC) stays the public facade: it re-exports every name so existing `from dataretrieval.ogc.engine import ...` sites in waterdata, ngwmn, and the tests keep working unchanged. The geopandas parse chain lives in _responses.py (its boundary is the domain, not a test patch target); the single gpd monkeypatch seam was relocated to `_responses` -- the only test change. Behavior-preserving: all 42 top-level symbols moved with byte-identical AST bodies; 296 offline tests pass; ruff + mypy --strict clean; no import cycles. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Ports the NGWMN functions from the R
dataRetrievalPR (DOI-USGS/dataRetrieval#904) and, per review, refactors the Water Data OGC machinery into a shared engine so NGWMN and Water Data are sibling layers on top of it — NGWMN does not depend on Water Data.Architecture
Layering / import rules (the organization a reviewer can check):
ogc/*is fully API-agnostic — it imports nodataretrieval.waterdataordataretrieval.ngwmn.waterdata/*andngwmn.pyare siblings on the engine;ngwmn.pyimports onlydataretrieval.ogc(+codes.states), neverdataretrieval.waterdata.api.py/stats.py/ratings.py/nearest.pyare thin getter layers; the shared Water-Data state (service→id map, dialect, theget_ogc_datawrapper) lives once inutils.py.The engine is parameterized, not branched:
get_ogc_data(args, service, output_id, *, base_url, extra_id_cols, dialect). AnOgcDialect(cql2_services, date_only_services, …)(threaded via a context variable, like the base-url context) carries per-API quirks — Water Data POSTs CQL2 formonitoring-locationsand rendersdailytime args date-only; NGWMN needs neither. Adding a sibling API is a new dialect + base URL, not engine edits.The multi-value chunker (recently fixed in #322) is generic and applies to NGWMN unchanged — verified that a forced-small-budget multi-site NGWMN query chunks and unions correctly.
Unified
stateparameterA single, canonical
stateparameter spans the modern getters and accepts a full name ("Wisconsin"), a two-letter postal code ("WI"), or a two-digit ANSI/FIPS code ("55"). It is normalized once bycodes.states.to_state()(50 states + DC; fails fast on a typo) and resolved at the getter layer into whatever native queryable each endpoint wants —state_namefor the OGC getters,US:XXstate_codefor stats, the per-collection queryable for NGWMN. The nativestate_code/state_nameparameters remain as an escape hatch (e.g. non-US FIPS codes); passingstatetogether with either raises.Engine fixes (NGWMN's API differs from the main one)
featuresrather than thenumberReturnedthat NGWMN omits (otherwise pages with data were silently dropped).geometrykey (GeoDataFrame.from_featurescan't index a missing key).PEP naming
The engine snake_cases any non-snake column in
finalize, so the package always returns PEP-8 column names regardless of the upstream API — a no-op today (both APIs are already snake_case) but enforced going forward.Tests & checks
Live NGWMN tests for all five getters (
tests/ngwmn_test.py); unit tests forto_state, the_with_state/ getter-layer state resolution, and_to_snake_case;mock.patchsites repointed toogc.engine; a module fixture activatesWATERDATA_DIALECTfor the direct_construct_api_requestsunit tests. The unit suite passes andmypy --strict+ruffare clean. A pre-commitmypyhook (pinned to the samemypy<2major as CI, withhttpx/anyioin the hook env) now mirrors the CI type-check locally.Note
CI will show 3 pre-existing failures (
test_get_daily_properties/_id,test_get_continuous) — the live-API drift fixed by #323, not introduced here (branch is offmain). They go green once #323 merges.🤖 Generated with Claude Code