Skip to content

Geospatial SQL benchmark suite: prove core geo/climate ops are relational#177

Open
alxmrs wants to merge 26 commits into
mainfrom
geospatial-sql-benchmarks
Open

Geospatial SQL benchmark suite: prove core geo/climate ops are relational#177
alxmrs wants to merge 26 commits into
mainfrom
geospatial-sql-benchmarks

Conversation

@alxmrs

@alxmrs alxmrs commented Jun 23, 2026

Copy link
Copy Markdown
Owner

What & why

A suite of worked examples in benchmarks/geospatial/ arguing a deliberately radical thesis: the core operations of geospatial and climate analysis — the ones we reach for an array library to perform — are, underneath, relational operations. Climatologies, anomalies, zonal means, spectral indices, forecast skill, even reprojection and regridding map onto ordinary SQL (GROUP BY, JOIN, window functions, CASE, and the occasional scalar UDF).

Every case poses the operation in SQL against xarray-sql, round-trips the result back to an xarray.Dataset/DataArray (via to_dataset), and asserts it matches an xarray reference (coordinate-aligned xr.testing.assert_allclose). SQL in, an array out, verified — and each run prints the result's standard xarray repr. The headline is correctness + clarity of the SQL, not raw speed.

The README gains a "Does it work?" section summarizing the findings (addresses #46), and a full write-up lives in docs/geospatial.md (wired into the docs nav).

The cases (all validated end-to-end on real data)

# Operation Relational form Data + reference
01 NDVI column arithmetic (nodata auto-masked on decode) real Sentinel-2 L2A Zarr (ESA EOPF, via pystac-client + xr.open_datatree)
02 Diurnal climatology GROUP BY lat, lon, hour ARCO-ERA5
03 Zonal mean GROUP BY latitude full ARCO-ERA5, WHERE-pruned
04 Anomaly climatology CTE self-JOIN ARCO-ERA5
05 Forecast skill forecast↔truth JOIN on valid_time = init + lead WeatherBench 2 Pangu & GraphCast vs ERA5
06 Zonal stats over regions raster × vector range JOIN full ARCO-ERA5 + continental boxes
07 Reprojection PROJ scalar UDF (ST_Transform-style) Earth Engine via Xee, validated against EE's own pixelLonLat
08 Regridding sparse weight-table JOIN + weighted GROUP BY real SRTM elevation via Xee, vs xarray .interp()

Highlights

  • It reads naturally to the people who would use it. The array side is idiomatic Xarray; the query side is idiomatic, lazy SQL. You can point a query at the entire ERA5 archive — decades of global hourly data — and it reads only the slice the query asks for.
  • Real models, checked against the literature. Case 05 scores the Pangu-Weather and GraphCast forecast models against ERA5 and reproduces the published WeatherBench 2 result that GraphCast leads at every forecast horizon — written as a single JOIN that lines each forecast up with its valid time.
  • Validated against independent implementations. Case 07 checks its SQL reprojection against Earth Engine's own coordinate geometry (through Xee) — a genuinely different implementation, not PROJ compared with itself. Case 08 regrids real SRTM terrain and matches Xarray's own interpolation.
  • Honest about the limits. Reprojection and regridding are the operations most tied to arrays; the write-up shows where SQL expresses them cleanly and where arrays still earn their keep — computing the interpolation weights (the geometry), which SQL then applies as a join.

Running

Run a single case with uv run benchmarks/geospatial/<case>.py, or the whole suite with benchmarks/geospatial/run_all.sh — from any directory. Cases that need data or credentials you do not have skip cleanly with a clear message: the cloud datasets (ERA5, WeatherBench 2) are read anonymously, and the Earth Engine cases reuse your existing gcloud login. (Until the next xarray-sql release, the scripts run against the version in this branch.)

Performance

Profiled timings — 8 cases on an e2-standard-8 VM in us-central1

Measured cold vs cold with benchmarks/geospatial/run_perf.sh: each case runs
once per fresh process, with no warmup, repeated 5 times. Fairness took some care
— the trap is caching. A reference that calls .load() caches its data in place
on the very object the SQL table also reads from, so a later read (even just running
the reference after the SQL query in the same process) can be served warm. Case 05
loaded shared objects and now uses .compute() instead (fresh array, inputs stay
lazy); the other references either reopen their data or recompute eagerly with
chunks=None (NumPy, no graph to keep warm). Verified directly: reading a window
repeatedly in one process stays flat, and neither side warms the other. Reported are
the median/spread of wall-clock and peak Python-allocator memory across the cold
runs. Hardware: Google Compute Engine e2-standard-8 (8 vCPU, 32 GB), us-central1
— in-region with the ARCO-ERA5 and WeatherBench 2 buckets.

Case Step reps median (s) stdev (s) min (s) max (s) peak (MB)
01 · NDVI (per-pixel arithmetic) SQL 5 3.093 0.050 2.998 3.128 98.5
xarray reference 5 0.402 0.060 0.372 0.507 42.0
02 · Climatology (GROUP BY lat, lon, hour) SQL 5 5.987 1.328 5.901 8.942 513.8
xarray reference 5 2.449 0.401 2.332 3.312 43.7
03 · Zonal mean (GROUP BY latitude) SQL 5 3.224 0.048 3.175 3.288 245.3
xarray reference 5 0.552 0.020 0.529 0.576 249.5
04 · Anomaly (climatology self-JOIN) SQL 5 9.479 0.413 9.382 10.280 524.4
xarray reference 5 3.013 0.584 2.875 4.199 80.5
05 · Forecast skill (forecast↔truth JOIN) SQL 5 23.241 0.101 23.114 23.348 34.2
xarray reference 5 0.336 0.014 0.314 0.352 2.2
06 · Zonal stats (raster × vector JOIN) SQL 5 6.285 0.052 6.191 6.317 509.9
xarray reference 5 2.235 0.216 2.195 2.703 359.9
07 · Reprojection (PROJ scalar UDF) SQL 1 0.039 0.000 0.039 0.039 0.3
08 · Regridding (weight-table JOIN) SQL 5 0.061 0.001 0.060 0.063 0.8
xarray reference 5 0.018 0.001 0.018 0.020 0.2
  • SQL is ctx.sql(...).to_dataset(...) — the cold lazy read from cloud storage plus the relational compute; xarray reference is the array-paradigm computation each case is validated against. SQL is slower because the paradigm has real cost: the grid is exploded to rows, joined with a hash join, and shuttled through Arrow — work the array path does in-place over contiguous buffers.
  • Case 05's SQL time is CPU-bound (the join + GIL-held row production) and the most hardware-sensitive number here — a second e2-standard-8 run measured ≈12 s rather than ≈23 s. The reference is read-bound and stable, so read the 05 ratio as "the relational form costs real CPU," not a fixed multiplier.
  • Case 01 reads Sentinel-2 from EODC (Europe), the only non-US source, so its time includes a cross-region read.
  • Cases 07–08 load their Earth Engine inputs into memory once and then compute, so they are methodology-agnostic (cold ≡ warm) and are reported from a single in-memory run. Case 07's PROJ UDF is not re-entrant across repeated in-process executions, so it is measured once (reps=1).

Notes for review

  • ruff and mypy pass via pre-commit. The suite lives outside the package (no __init__.py), so it is not collected by pytest or shipped in the wheel.
  • Each script declares its own dependencies inline (PEP 723); nothing is added to pyproject.toml.
  • Adds a "Does it work?" section to the README (addresses Add a "Does it work?" section to the README #46) and a longer write-up at docs/geospatial.md.

Follow-ups

  • Promote the reprojection UDF (case 07) into the package, alongside the existing cftime UDF.
  • Once a release includes the SQL→Xarray round-trip, the cases run straight from PyPI.

🤖 Generated with Claude Code

A suite of worked examples arguing that core geospatial/climate operations
we associate with the array paradigm are, underneath, relational operations.
Each case poses the operation in SQL against xarray-sql and asserts the result
matches an xarray/array reference to floating-point tolerance.

Cases (all validated):
- 01 NDVI: per-pixel column algebra + CASE mask, on a real Sentinel-2 L2A
  scene in Zarr (ESA EOPF sample service).
- 02 climatology: diurnal cycle as GROUP BY lat, lon, hour (ARCO-ERA5).
- 03 zonal mean: GROUP BY latitude over the full ARCO-ERA5 archive,
  WHERE-pruned to one day.
- 04 anomaly: climatology CTE self-JOIN (ARCO-ERA5).
- 05 forecast skill: Pangu-Weather and GraphCast vs ERA5 (WeatherBench2),
  a forecast<->truth JOIN on valid_time = init + lead; reproduces GraphCast
  beating Pangu at every lead.
- 06 zonal stats: raster x vector range JOIN of full ERA5 against continental
  bounding boxes.
- 07 reprojection: PROJ-backed scalar UDF (ST_Transform-style), returning a
  {lon, lat} struct to keep PROJ single-threaded.
- 08 regridding: bilinear regrid as a sparse weight-table JOIN, validated
  against xarray .interp().

Includes a shared harness (_harness.py: timing, peak memory, assert_allclose,
SQL echo), ERA5 helpers (_era5.py), a suite README, and a narrative write-up
(docs/geospatial.md) wired into the docs nav. Scripts carry PEP 723 inline
metadata so they are runnable standalone via `uv run`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9

@alxmrs alxmrs left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of first example

Comment thread benchmarks/geospatial/01_ndvi.py
Comment thread benchmarks/geospatial/01_ndvi.py Outdated
Comment on lines +73 to +78
b04 = zarr.open_array(_R10M + "/b04", mode="r")
b08 = zarr.open_array(_R10M + "/b08", mode="r")
rows = slice(_ROW0, _ROW0 + _N)
cols = slice(_COL0, _COL0 + _N)
red = np.asarray(b04[rows, cols])
nir = np.asarray(b08[rows, cols])

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the canonical way to open the dataset in Xarray? Please review the docs and sentinel example to see if we can make this more typical. Is this fixed if we use pystack?

Comment thread benchmarks/geospatial/01_ndvi.py Outdated
Comment on lines +82 to +88
# Pixel-center coordinates from the affine transform.
x = _X0 + (np.arange(_COL0, _COL0 + _N) + 0.5) * _RES
y = _Y0 - (np.arange(_ROW0, _ROW0 + _N) + 0.5) * _RES
return xr.Dataset(
{"red": (["y", "x"], red), "nir": (["y", "x"], nir)},
coords={"y": y.astype("float64"), "x": x.astype("float64")},
).chunk({"y": 256, "x": 256})

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to calculate NDVI in pure xarray after opening the dataset.

Comment thread benchmarks/geospatial/01_ndvi.py Outdated
Comment on lines +126 to +127
got = got.sortby(["y", "x"])
ref = ref.sortby(["y", "x"])

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I belive we can omit this if we use ORDER BY in the SQL statement.

…nual scaling/CASE/sort

Addresses PR review of case 01:

- Discover the Sentinel-2 L2A product with pystac-client instead of a
  hard-coded object-store href.
- Open it the canonical xarray way: xr.open_datatree on the EOPF Zarr, then
  the measurements/reflectance/r10m node. This yields B04/B08 already scaled
  to reflectance with x/y coordinates, so the manual zarr.open_array reads and
  proj:transform coordinate reconstruction are gone.
- NDVI is now pure column arithmetic on both sides: xarray decodes the band
  _FillValue to NaN on open, so nodata masking is free and the CASE WHEN is
  dropped. The reference is a clean one-line xarray expression.
- Use ORDER BY y, x in SQL and compare via coordinate-aligned
  xr.testing.assert_allclose (reindex_like) instead of manual .sortby() on
  both sides.

Docs (suite README, docs/geospatial.md) updated to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
@alxmrs

alxmrs commented Jun 23, 2026

Copy link
Copy Markdown
Owner Author

Thanks for the review of case 01 — all four points addressed in 6bc41aa:

  1. Use pystac for discovery — now uses pystac-client to search the EOPF STAC catalog and resolve the product, instead of a hard-coded object-store href. Added to the script's PEP 723 deps.
  2. Idiomatic xarray open — replaced zarr.open_array + manual proj:transform coordinate reconstruction with xr.open_datatree(item.assets["product"].href, engine="zarr") and the measurements/reflectance/r10m node. xarray decodes the bands to reflectance and brings x/y coordinates for free.
  3. NDVI in pure xarray — the reference is now a one-liner: (scene.nir - scene.red) / (scene.nir + scene.red). Because xarray decodes the band _FillValue to NaN on open, nodata masking is automatic and the CASE WHEN is gone on both sides.
  4. ORDER BY instead of sortby — the SQL uses ORDER BY y, x, and the comparison aligns by coordinate label via xr.testing.assert_allclose(got, ref.reindex_like(got)), so the manual .sortby() on both sides is removed.

Verified end-to-end: 1,048,576 pixels match the xarray reference; mean NDVI 0.235 over the window. Docs (suite README + docs/geospatial.md) updated to match.

One note: xarray-eopf's eopf-zarr engine would let you open the group directly, but it pins zarr<3, which conflicts with this project's zarr 3 — so I kept the plain engine="zarr" + open_datatree path, which needs no extra engine.

Comment thread benchmarks/geospatial/02_climatology.py
Comment thread benchmarks/geospatial/02_climatology.py Outdated
Comment on lines +50 to +52
ds = load_window(
full, "2m_temperature", time=_TIME, latitude=_LAT, longitude=_LON
)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think repeating the sel command here is instructive.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend a fluent interface

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, the open_zarr call and the slicing could be one fluent chain.

Comment thread benchmarks/geospatial/02_climatology.py Outdated
show_sql(sql)

with timed("SQL diurnal climatology"):
got = ctx.sql(sql).to_pandas()

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be loaded to a dataset. That's how climatologies are typically used.

Comment thread benchmarks/geospatial/02_climatology.py Outdated
# Array reference: the textbook groupby-over-the-cycle reduction.
with timed("xarray reference"):
clim = ds["2m_temperature"].groupby("time.hour").mean("time") - 273.15
ref = clim.to_dataframe(name="clim_c").reset_index()

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omit transforming this to a dataframe.

Comment thread benchmarks/geospatial/03_zonal_mean.py Outdated

ctx = xql.XarrayContext()
with timed("register full ERA5"):
register_era5(ctx, ds)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this helper either since repetition here serves as useful documentation.

Comment thread benchmarks/geospatial/03_zonal_mean.py Outdated
show_sql(sql)

with timed("SQL zonal mean (WHERE-pruned to one day)"):
got = ctx.sql(sql).to_pandas()

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe to_pandas() invokes a copy. Is it needed?

… coordinate-aligned comparison

Addresses round-2 PR review (cases 02, 03) and applies the same patterns
across the suite:

- Remove the _era5.py helper module; each case now shows the
  xr.open_zarr(...).sel(...).load() fluent chain inline (and registration
  inline), since the repetition documents the canonical xarray usage.
- Round-trip gridded SQL results back to xarray Datasets via to_dataset(...)
  instead of to_pandas() — a climatology/zonal-mean/anomaly/regrid is gridded
  data and is used as such.
- Keep references in pure xarray (no to_dataframe); compare with a new
  assert_grid_close helper that aligns by coordinate label
  (xr.testing.assert_allclose), so no manual .sortby() is needed.
- Add ORDER BY to the climatology and anomaly queries.
- Apply the same Dataset-return + xarray-comparison treatment to the forecast
  skill (RMSE-by-lead as a DataArray over lead), reprojection (lon/lat grid),
  and regrid (reshaped to the target grid) cases.
- Drop the now-unused check_close harness helper.

docs/geospatial.md notes that every query round-trips its answer back to an
xarray.Dataset.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
@alxmrs

alxmrs commented Jun 23, 2026

Copy link
Copy Markdown
Owner Author

Round-2 feedback addressed in e1cbc8b (cases 02/03 explicitly, plus the same treatment swept across the suite).

Your comments on 02 / 03:

  • Inline the open/slice; the helper hides it — deleted _era5.py. Each case now shows the fluent chain inline, e.g.
    ds = (
        xr.open_zarr(_URL, chunks=None, storage_options={"token": "anon"})[["2m_temperature"]]
        .sel(time=_TIME, latitude=_LAT, longitude=_LON)
        .load()
    )
    and registration is inline too (03/06 show the full from_dataset(..., table_names=...)).
  • ORDER BY latitude DESC, longitude, … — added to the climatology (… , hour) and anomaly queries.
  • Time is already hourly — right; the docstring now states ERA5 is hourly, so grouping by hour-of-day gives a clean 24-bin diurnal cycle (one sample per day in the window).
  • Load the climatology to a Dataset — the result now round-trips via to_dataset(dims=["latitude","longitude","hour"]) and is returned/used as a gridded xr.Dataset.
  • Keep the reference in pure xarray (no to_dataframe) — done: ds["2m_temperature"].groupby("time.hour").mean("time"), compared by coordinate label (no merge, no sortby).
  • to_pandas() copy in zonal mean — gone; it returns to_dataset(dims=["latitude"]).

Swept across the suite (anticipating the same notes):

  • All gridded results (NDVI, climatology, zonal mean, anomaly, forecast RMSE-by-lead, reprojection, regrid) now round-trip to xr.Dataset/DataArray and are compared with a new assert_grid_close helper that aligns by coordinate label via xr.testing.assert_allclose — so no case sorts manually.
  • Forecast skill returns each model's skill curve as a DataArray over lead (a real timedelta coordinate), with an xarray reference.
  • Dropped the now-unused check_close harness helper.

Full suite re-run is green (NDVI 1.05M px; climatology 535,704 cells; anomaly 1.6M cells; GraphCast still beats Pangu at every lead). Background note: I evaluated xarray-eopf for case 01's open but it pins zarr<3 (conflicts with this project), so case 01 stays on pystac-client + xr.open_datatree.

Thanks for the careful review — the cases read much more like idiomatic xarray now.

Comment thread benchmarks/geospatial/02_climatology.py Outdated
_URL, chunks=None, storage_options={"token": "anon"}
)[["2m_temperature"]]
.sel(time=_TIME, latitude=_LAT, longitude=_LON)
.load()

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should omit the load calls to test out the lazy property of xql. What do you think?

Comment thread benchmarks/geospatial/04_anomaly.py Outdated
ds = (
xr.open_zarr(
_URL, chunks=None, storage_options={"token": "anon"}
)[["2m_temperature"]]

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this. Xarray-SQL with have a pushdown opt to only select columns we care about.

# ///
"""Forecast skill — scoring ML weather models against ERA5 is a JOIN + aggregate.

This is the real thing: scoring the **Pangu-Weather** and **GraphCast** machine-

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like "this is that real thing"

Snaps lat/lon onto ERA5's exact coordinate arrays (identical 64×32 grid) so
the equality JOIN on coordinates is bit-safe across the two Zarr stores.
"""
da = _open(url)[_VAR].sel(time=_INIT)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at can omit the var selection here too, and elsewhere.

return da.to_dataset().load()


def _rmse_sql(table: str) -> str:

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is vulnerable to a SQL injection IIUC. Can we demo this in a way that puts forth best SQL practices?

era5_full = _open(_ERA5)

# Load the forecasts first; snap their grid to ERA5's exact coordinates.
with timed("read Pangu + GraphCast forecasts"):

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't load up front, then I don't think these timed blocks for measuring loading are necessary. This is good, it demos laziness.

…-3 review

Addresses round-3 PR review (cases 04, 05) and extends the patterns PR-wide:

- Demonstrate laziness + pushdown: ERA5 cases (02,03,04,06) now register the
  whole archive lazily (no .load(), no variable pre-selection) and let
  projection pushdown read only 2m_temperature and partition pruning read only
  the queried window. References use lazy .sel(); the up-front "read into
  memory" steps are gone.
- Parameterized queries (no SQL-string interpolation of values): time/region
  bounds are passed via param_values rather than f-string-formatted into the
  SQL. Verified pruning still fires with bound parameters.
- Forecast skill (05): stack the two models along a `model` dimension into one
  forecast table and GROUP BY the model column, eliminating the `FROM {table}`
  identifier interpolation (the injection smell) entirely. The 2m_temperature
  selection stays because the models store different pressure-level sets
  (Pangu 13, GraphCast 37) and must be aligned on the common surface field.
  Reword the "this is the real thing" docstring opening.

docs/geospatial.md and the suite README updated to describe lazy registration,
parameterized filters, and the model-column forecast query.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
@alxmrs

alxmrs commented Jun 24, 2026

Copy link
Copy Markdown
Owner Author

Round-3 feedback addressed in a971f2b (cases 04/05 explicitly, extended across the suite).

Your comments:

  • 04:54 / 05:90 — don't pre-select the variable; pushdown handles columns. The ERA5 cases (02, 03, 04, 06) now register the whole archive lazily with no variable selection; SELECT "2m_temperature" FROM era5.surface lets projection pushdown read just that column. (05 is the one exception, see below.)
  • 05:136 — don't load up front; demo laziness. Nothing is .load()-ed up front anymore — datasets are registered lazily and the query pulls only the partitions/columns it needs at read time. The "read into memory" timed blocks are gone; the references use lazy .sel().
  • 05:97 — FROM {table} is a SQL-injection smell. Eliminated the dynamic identifier entirely: the two models are stacked along a model dimension into one forecast table, and the query does GROUP BY f.model, f.prediction_timedelta — no table name formatted into SQL. More broadly, all value filters (dates, region bounds) are now passed as bound query parameters (param_values=...) instead of f-string interpolation. I verified partition pruning still fires with bound parameters (a parameterized one-day WHERE reads one day, not the 1.3M-step archive).
  • 05:14 — "this is the real thing". Reworded.

One deliberate exception: 05 still selects 2m_temperature in xarray. Pangu stores 13 pressure levels and GraphCast stores 37, so the full datasets can't be concatenated over model; selecting the common surface field is semantically required to align them, not just an optimization. Noted in a code comment.

Full suite re-run is green (NDVI 1.05M px; climatology 535,704 cells; anomaly 1.6M cells; Sahara 33 °C vs Greenland −8 °C; GraphCast still beats Pangu at every lead). The lazy cases are slower now (each ERA5 case reads its window for the SQL and again for the reference), which is the expected cost of demonstrating laziness rather than pre-loading.

Background note (carried over): xarray-eopf's eopf-zarr engine for case 01 pins zarr<3, conflicting with this project, so case 01 stays on pystac-client + xr.open_datatree.

@alxmrs alxmrs left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few last thoughts. The geospatial docs are excellent.

Comment thread README.md Outdated
Comment on lines +111 to +116
For worked examples of common geospatial operations expressed in SQL — NDVI,
climatology, anomalies, zonal statistics, forecast skill (Pangu-Weather and
GraphCast vs ERA5), reprojection, and regridding, each checked against an xarray
reference — see [`benchmarks/geospatial/`](benchmarks/geospatial/) and the
[Geospatial operations are relational operations](docs/geospatial.md) write-up.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this paragraph should live in a new section below called "does it work?" and be framed to answer that question. Then add can close this issue: #46

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "does it work" should be between "how does this work" and "why does this work".

interpolation — which is case 08, where regridding turns out to be a JOIN
against a weight table rather than a scalar UDF.

**An honest caveat on threading:** PROJ's transformation context is not

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omit the word "honest".

alxmrs and others added 2 commits June 24, 2026 09:14
…real SRTM regrid)

- 07 reprojection: open a UTM grid through Xee carrying ee.Image.pixelLonLat(),
  so Earth Engine's own geodesy reports each pixel's true lon/lat. The SQL PROJ
  UDF is now validated against an *independent* implementation (EE), not against
  PROJ via pyproj. Grid built with xee.helpers.fit_geometry; compared at ~1e-5°.
- 08 regridding: regrid real SRTM elevation (USGS/SRTMGL1_003, Sierra Nevada)
  opened via Xee, validated against xarray's bilinear .interp() on the same
  source field — a new (DEM) data type for the suite.
- Both require Earth Engine access (earthengine authenticate + EARTHENGINE_PROJECT)
  and skip cleanly when EE is not initialized, like the other network cases.
  Deps add xee, earthengine-api, shapely.

Verified offline: imports, the graceful EE-not-initialized skip, ruff/mypy, and
the xee open_dataset / fit_geometry API surface. The live EE round-trip was not
run here (no credentials in this environment).

docs/geospatial.md and the suite README updated for the EE/Xee sources.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
- Move the geospatial-suite pointer out of "What is this?" into a new
  "Does it work?" section, placed between "How does it work?" and "Why does
  this work?", reframed to answer that question with a findings summary
  (validated operations, real datasets, the GraphCast-beats-Pangu result).
  Addresses #46.
- Reword the two remaining uses of "honest" in docs/geospatial.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
@alxmrs

alxmrs commented Jun 24, 2026

Copy link
Copy Markdown
Owner Author

Round-4 feedback addressed in df67d98:

  • README:116 — "Does it work?" section. Moved the geospatial-suite pointer out of "What is this?" into a new ## Does it work? section, placed between "How does it work?" and "Why does this work?", and reframed it to answer that question — a findings summary (which operations were validated, on which real datasets, and the headline GraphCast-beats-Pangu result). This addresses Add a "Does it work?" section to the README #46; I'll leave the issue for you to close on merge.
  • 07:34 — drop "honest". The 07 docstring already lost that wording in the Xee rewrite; I also reworded the two remaining uses ("Two honest caveats" → "Two caveats", "The honest boundary" → "The boundary") in docs/geospatial.md. No "honest" left in the suite or docs.

README section order is now: How does it work? → Does it work? → Why does this work?

alxmrs and others added 6 commits June 24, 2026 10:19
… both pass live

- Add _harness.initialize_earth_engine(): initialize EE from Application
  Default Credentials (gcloud auth application-default login) with the Earth
  Engine scope, using the ADC/EARTHENGINE_PROJECT project. This avoids the
  separate `earthengine authenticate` OAuth flow (and the "this app is blocked"
  org-policy error), and needs no service account.
- 07/08 use the helper instead of bare ee.Initialize.
- 07: Xee names the projected grid coords x/y (not X/Y), so drop the erroneous
  rename.

Verified end-to-end against Earth Engine:
- 07 reprojection: PROJ UDF matches EE's own ee.Image.pixelLonLat() to ~1e-5°
  over 2,565 UTM pixels (independent geodesy engines agree; pixel centers align).
- 08 regrid: real SRTM elevation (Sierra Nevada, 1206–3439 m) regridded
  50×50 → 60×72 matches xarray bilinear .interp() exactly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
…xarray-sql

- Resolve the script's own directory (BASH_SOURCE) so it works from any cwd —
  the old `uv run ./01_nvdi.py` only worked from benchmarks/geospatial/ (and had
  a typo: 01_nvdi → 01_ndvi).
- Glob all NN_*.py cases instead of an incomplete hand-written list; report a
  per-case PASS/FAIL summary and exit non-zero if any case fails.
- Hand `uv` the local checkout via `--with-editable $REPO_ROOT`: these cases use
  xarray-sql features (XarrayDataFrame.to_dataset) newer than the latest PyPI
  release, so plain `uv run` would resolve an xarray-sql without `to_dataset`.
- 01_ndvi.py: declare aiohttp + requests (needed by pystac-client / zarr-over-
  HTTPS under uv's isolated environment).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
The latest PyPI xarray-sql predates XarrayDataFrame.to_dataset, so plain
`uv run <script>` resolved a build without it. Each script now declares

    [tool.uv.sources]
    xarray-sql = { path = "../../", editable = true }

in its PEP 723 metadata. uv resolves this path relative to the *script*, so
`uv run <script>` uses the in-repo build from any working directory. run_all.sh
drops the `--with-editable` workaround and is back to plain `uv run`.

Verified `uv run 08_regrid_weights.py` from /tmp uses the local xarray_sql (with
to_dataset) and passes. Once a release ships these features, the [tool.uv.sources]
block can be removed for standalone PyPI use.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
Add _harness.show_result(), and call it in every case right after
assert_grid_close passes, so a run shows *what* it computed — the gridded answer
round-tripped back out of SQL as an xarray Dataset/DataArray, via the standard
repr. Drop the now-redundant "<case> Dataset: {sizes}" one-liners in 02/04
(the repr already shows sizes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
Revise the prose across the suite for the future reader/user — someone from the
SQL or Xarray world — rather than for the PR conversation:

- Frame features by what that audience values (lazy, idiomatic) instead of the
  mechanics that justified review decisions ("nothing loaded or column-selected
  up front", "bounds passed as query parameters, not formatted into the SQL",
  "no table name formatted into the SQL").
- Fix stale claims: the suite compares with xarray (not `numpy.assert_allclose`),
  Earth Engine uses your `gcloud` login (not `earthengine authenticate`), and the
  run instructions cover `run_all.sh` and the in-repo build.
- Trim jargon from runtime labels and tighten a couple of code comments.

Touches README ("Does it work?"), docs/geospatial.md, the suite README, and the
02–06 docstrings/labels. No code behavior changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
Add a "Where this list comes from" section linking the Large Scale Geospatial
Benchmarks discussion (coiled/benchmarks #1545, opened by James Bourbeau, 2024)
and explaining what it asked: the end-to-end geospatial/climate workflows the
Xarray/Dask ecosystem needs at the 100-TB scale. A table maps the discussion's
seven workflow categories to the cases here — the suite covers six of the seven
(all but vector polygon-to-polygon spatial joins) — underscoring that the
operations are an independent community survey, not a SQL-friendly cherry-pick.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
@ahuang11

Copy link
Copy Markdown
Collaborator

Skimmed the PR; wondering if the benchmark timings can be documented (e.g. how long it took on the last run)

alxmrs and others added 2 commits June 24, 2026 19:29
With GEOBENCH_PROFILE set, run_case runs a case's main() a warmup plus
GEOBENCH_REPS times, suppressing the repeated output, and writes one row of
summary stats (min/median/mean/stdev/max seconds + peak MB) per timed step to
GEOBENCH_CSV — a single shared table across all cases. Because each main() is
self-contained (builds its own context, reads its own data), re-running it
samples the operations with no change to the case scripts: all eight remain
byte-identical, and runs without the flag are unchanged.

Documented in the suite README (Profiling section). Verified end-to-end on case
08 (normal run unchanged; profiled run emits a clean per-step summary and CSV).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
…ain()

Cleaner than the previous approach (which re-ran each case's main() and
suppressed its output): a repeatable step is now wrapped in
`for _ in measured(label): ...` — one line different from `with timed(...)`,
body unchanged. Without profiling it runs once; under GEOBENCH_PROFILE it runs a
warmup plus GEOBENCH_REPS measured passes and writes one summary row per step to
GEOBENCH_CSV. The harness no longer re-runs main() or redirects stdout, and the
perf table now carries both the SQL query and the xarray reference per case (a
built-in side-by-side). Registration/opens stay in `with timed(...)`.

The case bodies remain idiomatic xarray/SQL; only the SQL-query and reference
blocks gained the measured() wrapper. Verified normal runs are unchanged and
profiled runs of 08 (EE) and 05 (WB2 JOIN) emit clean summaries + CSV.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9

@ghostiee-11 ghostiee-11 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read through the suite end to end, really like the relational framing and that every case round-trips back to xarray and asserts against a reference. Two small things on the harness and one idea below.

Comment thread benchmarks/geospatial/_harness.py
Comment thread benchmarks/geospatial/_harness.py Outdated
Comment on lines +187 to +191
t0 = time.perf_counter()
yield
elapsed = time.perf_counter() - t0
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if the body raises, we skip tracemalloc.stop() and leave it running. small try/finally like timed():

Suggested change
t0 = time.perf_counter()
yield
elapsed = time.perf_counter() - t0
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
t0 = time.perf_counter()
try:
yield
finally:
elapsed = time.perf_counter() - t0
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

Comment thread docs/geospatial.md

@alxmrs alxmrs left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest round of feedback

Comment thread benchmarks/geospatial/01_ndvi.py Outdated
r10m[["b04", "b08"]]
.rename(b04="red", b08="nir")
.isel(y=slice(_Y0, _Y0 + _N), x=slice(_X0, _X0 + _N))
.load()

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this load op here still?

# decode_timedelta=True: forecasts store prediction_timedelta as a
# real duration (and it silences xarray's decode-timedelta warning).
return xr.open_zarr(
url, chunks=None, storage_options=_SO, decode_timedelta=True

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this storage options value have to be extracted into a constant if we are already extracting the open call to a function?

Comment on lines +199 to +203
got = xr.DataArray(
flat.values.reshape(len(tlat), len(tlon)),
dims=["lat", "lon"],
coords={"lat": tlat, "lon": tlon},
)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we need to reorganize the script as a data array? Why do we do this copy? Isn't the output + da access enough?

@ghostiee-11

Copy link
Copy Markdown
Contributor

built the lazy round-trip case I mentioned and opened it against this branch as #178 (also carries the two harness tweaks). same slab is 1,325 rows / ~0.7 MB lazy vs 3.86M / ~161 MB eager, asserted equal.

@ghostiee-11

ghostiee-11 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@alxmrs a few cases I'd like to add next, all pulled from the #1545 workflow list so they're community-asked rather than SQL-friendly cherry-picks:

  • TEM diagnostic: zonal mean + eddy covariances (U'V', V'T', U'W'), which is just GROUP BY (time, level, lat) plus AVG(x*y) - AVG(x)*AVG(y). dcherian posted the exact xarray for this in #1545; validates against aostools/pyzome on ERA5.
  • forecast skill, ACC + MAE/bias: extends case 05 with anomaly correlation, which needs a JOIN to a climatology table to form the anomalies. reuses the gs://weatherbench2 data already in the suite.
  • conservative region aggregation: raster onto polygons as a sparse weight-table JOIN + weighted GROUP BY (the carbonplan/extreme-heat workflow). same shape as the regrid case, different weights.
  • building-footprint spatial join: bbox range-join + ST_Intersects refinement on Overture. the one #1545 category the suite doesn't cover yet, though it needs spatial UDFs since DataFusion has none natively.
  • spectral-index time series (NDWI/NDSI): scales the NDVI case from one scene to a GROUP BY over time on Sentinel-2.

alxmrs and others added 2 commits June 25, 2026 00:15
- assert_grid_close: fail when the SQL result misses grid cells. reindex_like
  aligned a short result onto the reference and silently passed; now a per-dim
  size mismatch raises, since the suite's whole point is "same numbers".
- measured(): wrap the profiled yield in try/finally so tracemalloc.stop()
  always runs even if the body raises.
- 01 NDVI: register the scene lazily (drop the eager .load()); the xarray
  reference now .compute()s in its block so its perf row measures the same
  read-and-compute the SQL side does, not just graph construction.
- 05: inline storage_options={"token": "anon"} into the single _open() that
  uses it; drop the _SO module constant.
- 08 regrid: name each target cell by its (dst_lat, dst_lon) in the weight
  table and GROUP BY those, so the result round-trips straight to a (lat, lon)
  grid via to_dataset — no manual reshape/copy into a new DataArray.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
The in-process warm loop was unfair: xr.open_zarr(chunks=None) caches each
variable in memory after the first read, so the xarray reference served its
later repetitions from RAM while the SQL side re-read the store every query —
inflating the gap (case 05 read as 23.5s SQL vs 0.29s reference; cold-vs-cold
it's ~0.8s vs ~0.6s).

run_perf.sh runs each case once per fresh process with no warmup, repeated
GEOBENCH_REPS times, so both the SQL operation and the xarray reference pay a
cold read on every measurement. perf_summary.py aggregates the independent cold
runs into a per-step table. The cases and the measured() harness are unchanged;
fairness comes from process isolation, not in-process repetition.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
alxmrs and others added 2 commits June 25, 2026 13:38
Add Results, Analysis, and Conclusion sections to docs/geospatial.md.

Results publishes the cold-vs-cold timings (run_perf.sh on an e2-standard-8
in-region), with the methodology note on why a fresh process per repetition is
the only fair comparison.

Analysis decomposes case 05 into read vs compute vs round-trip (cProfile): the
read is comparable on both sides, the ~30x gap is compute (explode to rows, hash
JOIN, aggregate vs a vectorized NumPy reduction), and the SQL->xarray round-trip
is sub-millisecond. It explains the table's shape — JOINs widest, GROUP BYs
narrowest — and why the ratio shifts with hardware.

Conclusion states the tradeoff plainly: arrays for dense, grid-aligned numerics;
SQL for relationally shaped work and SQL-fluent audiences; and the distributed
query-engine payoff the single-node benchmark can only point toward.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
@@ -0,0 +1,113 @@
#!/usr/bin/env python3
"""Aggregate the cold-run perf CSV into a per-step summary and a markdown table.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not check in this script.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — removed perf_summary.py and folded its aggregation into run_perf.sh as an inline step, so the perf tooling is one self-contained driver (0941e0b).

Comment thread README.md Outdated

* **Spectral indices** (NDVI) — column arithmetic over a real Sentinel-2 scene.
* **Climatology, anomalies, zonal means** — `GROUP BY` and self-`JOIN` over the
full 0.25° **ARCO-ERA5** archive (≈1.3M hourly steps), read lazily so each

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really this many hourly steps?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we really do this broad of a query?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — I dropped the “≈1.3M hourly steps” figure; it overstated what the queries touch. Reworded to say the archive is multi-decade and registered as a lazy table, while each query reads only its small window (0941e0b).

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right — the queries are bounded to a small window (a few days over a region), not a scan of the archive. Reworded so the README says exactly that (0941e0b).

Comment thread benchmarks/geospatial/02_climatology.py Outdated
``da.groupby("time.hour").mean()``. ERA5 is hourly, so grouping by hour of day
gives a clean 24-bin **diurnal cycle**, one sample per day in the window.

The table is the *whole* ARCO-ERA5 archive, opened lazily: the query reads only

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to be the whole ERA5 archive, because we have a time range. What is the truth?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Reworded: we register the whole archive as a lazy table, but this climatology is computed over a bounded window — the WHERE prunes the read, so the query never scans the full record (0941e0b).

Comment thread benchmarks/geospatial/02_climatology.py
Comment thread benchmarks/geospatial/04_anomaly.py Outdated
FROM era5 a JOIN clim c
ON (a.latitude, a.longitude, hour(a.time)) = (c.latitude, c.longitude, c.hour)

The table is the *whole* ARCO-ERA5 archive, opened lazily: both the climatology

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the whole ERA5 archive.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded the same way — the table spans the archive lazily; the anomaly is computed over a bounded window the WHERE prunes to (0941e0b).

Comment thread docs/geospatial.md Outdated
horizon" curve (≈0.3 K at 6 h rising to ≈2.5 K at 9 days):

```
lead (days) Pangu GraphCast

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the dataframe code above, then let's print that repr here.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — the doc now shows the got.to_pandas() repr (RMSE by lead × model), matching the script's new headline (1cd2c93).

Comment thread docs/geospatial.md
Comment thread docs/geospatial.md Outdated
Comment thread docs/geospatial.md
Comment thread docs/geospatial.md Outdated
relational, and that for a surprising share of geoscience, the operation is
relational.

## Running the suite

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put this section above results.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — “Running the suite” now sits above Results (1cd2c93).

alxmrs and others added 3 commits June 25, 2026 18:41
… fixes

Fairness: case 05's reference used `.load()`, which caches data in place on the
forecasts/truth objects the SQL table also reads from — so running the reference
after the SQL query (as the harness does) could serve it a warm read. Switch to
`.compute()`, which returns a fresh array and leaves the inputs lazy. Verified:
reading a window repeatedly in one process stays flat, and neither side warms the
other. The other cases either reopen their data (06) or recompute eagerly with
`chunks=None` (02-04), so they were already cold; only 05 leaked.

Case 05: render the headline RMSE-by-lead table with `to_pandas()` instead of a
hand-rolled print loop; clarify that the `chunks=` arg is the Arrow batch size,
not a filter (no data dropped).

Case 06: the docstring claimed "full ARCO-ERA5" as the dataset; the query
aggregates one day's window (the WHERE prunes the read). Make that precise.

Case 07: drop the "graduate into the package" paragraph; clarify that there is
one dataset (UTM x/y as SQL input, EE's pixelLonLat as the independent
reference), not the same image opened twice in two CRS.

Docs: correct the profiling methodology note (the caching trap is in-place
`.load()`, not `open_zarr` auto-caching); point the "earns its keep" section
forward to Results/Analysis/Conclusion; move "Running the suite" above Results;
link James Bourbeau's profile; show the to_pandas headline repr.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
The ARCO-ERA5 cases (02, 04, README) said the query runs over the *whole* archive;
each query is in fact bounded by its WHERE window. Reword to say the table spans
the archive (lazy) while the query reads only its slice — and drop the unverified
"~1.3M hourly steps" framing from the README.

Fold perf_summary.py into run_perf.sh (an inline aggregation step) and remove the
separate script, so the perf tooling is one self-contained driver.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
Fresh in-region e2-standard-8 run of the full suite. The key change is case 05's
reference: now genuinely cold (~0.34s, stable across reps) instead of the ~0.22s
warmed number — confirming the .load()->.compute() fix closed the leak. Update the
Results table (cases 01-06), note case 05's CPU/hardware sensitivity (~12s vs ~23s
across two VMs), and correct the NDVI ratio (~8x).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
Comment thread benchmarks/geospatial/05_forecast_skill.py
Comment on lines +144 to +145
ctx.from_dataset("forecasts", forecasts, chunks={"time": 6})
ctx.from_dataset("era5", truth, chunks={"time": 100})

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. This ERA5 dataset is a 6 hourly step version at a small grid designed to be comparable to the forecasts. By throw away data, I mean we load 100 timesteps in the chunk and then discard the remainder that doesn't match the forecast chunk (as they divide together).

Can you write a small prototype (that's discarded) that tries 6 and 6 chunks ve 6 and 100 chunks to see what the empirical answer is? I'm still not convinced.

"2m_temperature"
]
.sel(time=_DAY)
.load()

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're not using dask anyway, let's get rid of this load. (Is this essentially for any reason?)

Comment thread benchmarks/geospatial/07_reproject_udf.py
rename[d] = "lat"
elif dl in ("x", "lon", "longitude"):
rename[d] = "lon"
return da.rename(rename).sortby("lat").sortby("lon").load()

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want this to be a valid perf comparison test of laziness for the xr ref too, so let's get rid of the load here.

Comment thread benchmarks/geospatial/run_perf.sh Outdated
# Aggregate the per-process cold runs (one row each) into a per-step summary and
# a markdown table. Each raw row is one cold sample (reps=1), so we collect the
# samples per (case, title, step) and report median/spread/peak across them.
python3 - "$RAW" "$SUMMARY" <<'PY'

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, fine, let's make this a second script.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — moved the aggregation back into perf_summary.py as a second script (the inline heredoc was worse to read) (7bfd56f).

alxmrs and others added 5 commits June 25, 2026 23:03
Case 06: drop the reference's `.load()`. The day's field is now read inside the
timed block (like the SQL side) via a single masked reduction over stacked region
masks — one lazy read for all regions, instead of pre-materializing. Verified
against the SQL result.

perf: move the aggregation back out of run_perf.sh into perf_summary.py (a second
script reads cleaner than the inline heredoc).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
Previously the SQL side read SRTM into an in-memory (cell_id, value) table while
the reference read lazily — asymmetric. Now both read the source lazily: register
the Xee field itself as the SQL `src` table and key the weight join on the source
coordinates (src_lat, src_lon) instead of a pre-raveled cell_id, and drop the
`.load()` in `_open_srtm`. Coords are forced to float64 so the join matches the
source grid exactly.

The regrid math (the new weight table + coordinate join) is validated against
xarray `.interp` on a synthetic source to ~1e-12; the Earth Engine end-to-end run
still needs confirmation in an EE-authenticated environment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
…ding

A 2×2 sweep of (forecasts, era5) time-chunks ∈ {6,100} shows the truth/era5 chunk
is what matters — small splits it into many partitions and costs ~3× — while the
forecasts chunk is in the noise and a mismatch costs nothing. So set both to one
partition (time:100) for these small windows, and document why.

Also: cases 07 and 08 now validated end-to-end against real Earth Engine (08's
symmetric-lazy regrid matches xarray .interp over 4,320 target cells; 07's PROJ
UDF matches EE pixelLonLat).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
Case 09 composes 07 (the reproject UDF) and 08 (the regrid weight-table
JOIN) into a full warp — GDAL/rasterio `reproject`: change the CRS *and*
resample onto the new grid. The 07 UDF carries the target lon/lat grid
back into source UTM space, arrays turn those reprojected points into
bilinear weights, and the 08 JOIN applies them. Validated against
xarray `.interp()` at the reprojected points (exact), with Earth
Engine's own lon/lat SRTM as a cross-CRS sanity check.

Wires 09 into the README case table + dataset note, the docs mapping
table and the "hard cases" narrative, and the run_perf.sh glob.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
Re-ran the whole suite (01-09) on a single e2-standard-8 in us-central1,
with Earth Engine reached from the same VM, so every case shares one
machine and one release build. Updates the Results table and the
analysis to match:

- Case 05 came in at 10.7 s (not 23 s) — the hardware-sensitivity caveat
  borne out; 23 s was the high outlier across three runs.
- SQL is no longer slower in *every* case: 08 is at parity (0.875 vs
  0.850 s) and 09 is faster (0.281 vs 0.817 s), because the geometry (the
  weights) is precomputed and the relational half just applies it. The
  analysis and conclusion now reflect this instead of a blanket "slower."
- Case 08 is no longer the stale eager-load number; both sides read SRTM
  lazily, so the comparison is symmetric. Adds the new case 09 row.
- Marks the cProfile decomposition explicitly as a separate laptop
  profile (the intended hardware contrast), resolving the apparent
  total-vs-table mismatch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWvrZYAT2NbuETBqNAN3o9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants