Add lazy round-trip benchmark (case 09) #178
Add benchmarks/geospatial/09_lazy_roundtrip.py, showing that the SQL-to-xarray round-trip (to_dataset) is lazy: a chunked to_dataset plus .sel(time=t0) pushes a single WHERE into SQL (1,325 vs 3,869,000 rows; ~0.7 vs ~161 MB), and the result is asserted equal to the xarray reference. Also harden _harness.py: assert_grid_close now fails on a partial grid, and measured() stops tracemalloc in a finally block. Document case 09 in the suite README.
| # | ||
| # [tool.uv.sources] | ||
| # xarray-sql = { path = "../../", editable = true } | ||
| # /// |
I see this as a good unit test or property that cross cuts all the other geo benchmarks, but I don't think it alone makes for a good benchmark example.
Okay, makes sense
alxmrs
left a comment
A few other notes. If you removed case 09 and we got to the bottom of the fixes, I'd be happy to merge this.
| t0 = time.perf_counter() | ||
| yield | ||
| elapsed = time.perf_counter() - t0 | ||
| _, peak = tracemalloc.get_traced_memory() | ||
| tracemalloc.stop() | ||
| try: | ||
| yield | ||
| finally: | ||
| elapsed = time.perf_counter() - t0 | ||
| _, peak = tracemalloc.get_traced_memory() | ||
| tracemalloc.stop() |
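The diff above moves the timing and tracemalloc teardown into a finally block. A minimal standalone sketch of that pattern follows. The `stats` dict parameter and its key names are assumptions for illustration; the real signature of `measured()` in _harness.py is not shown in this PR.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measured(stats):
    """Record wall time and peak traced memory of the with-body into `stats`.

    Hypothetical signature: the harness version may report results differently.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        yield stats
    finally:
        # Runs even if the body raises, so tracing never leaks into later cases.
        stats["elapsed_s"] = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        stats["peak_bytes"] = peak
        tracemalloc.stop()


# Usage: the body fails, yet the measurement is recorded and tracing is stopped.
stats = {}
try:
    with measured(stats):
        buf = [0] * 100_000
        raise RuntimeError("benchmark body failed")
except RuntimeError:
    pass
```

Without the try/finally, an exception in the body would skip `tracemalloc.stop()`, and every later case in the same process would run with tracing still active.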
| short = { | ||
| d: (got.sizes[d], ref.sizes[d]) | ||
| for d in ref.dims | ||
| if d in got.sizes and got.sizes[d] != ref.sizes[d] | ||
| } | ||
| if short: | ||
| raise AssertionError( | ||
| f"{name}: SQL result does not cover the reference grid " | ||
| f"(dim: got vs ref = {short}); the comparison would be partial" | ||
| ) |
I'm surprised that Xarray's all close doesn't cover this case. Are you sure this is necessary?
good catch, you're right allclose would catch it on its own. it's the reindex_like(got) one line up that hides it: it shrinks ref down to got's coords first, so a result missing cells still passes on the subset.
got = ref.isel(lat=[0, 1, 2]) # 2 cells dropped
xr.testing.assert_allclose(got, ref.reindex_like(got)) # passes, silently
xr.testing.assert_allclose(got, ref) # raises
so the guard just restores the check reindex_like removes. could also drop reindex_like for xr.align(..., join="exact"), but that line handles label ordering so the guard felt smaller. either works.
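The point in this reply can be reproduced on a small synthetic grid. The sketch below is only an illustration: `check_grid_coverage` is a hypothetical name that mirrors the guard in the diff, and the 5x4 array stands in for the real reference data.

```python
import numpy as np
import xarray as xr


def check_grid_coverage(got, ref, name="case"):
    """Fail if `got` is shorter or longer than `ref` along any shared dimension."""
    short = {
        d: (got.sizes[d], ref.sizes[d])
        for d in ref.dims
        if d in got.sizes and got.sizes[d] != ref.sizes[d]
    }
    if short:
        raise AssertionError(
            f"{name}: SQL result does not cover the reference grid "
            f"(dim: got vs ref = {short}); the comparison would be partial"
        )


def raises(exc_type, fn):
    """Return True if calling fn() raises exc_type, else False."""
    try:
        fn()
    except exc_type:
        return True
    return False


# Synthetic reference grid (assumed shape, not the benchmark data).
ref = xr.DataArray(
    np.arange(20.0).reshape(5, 4),
    dims=("lat", "lon"),
    coords={"lat": np.linspace(10.0, 50.0, 5), "lon": np.linspace(0.0, 30.0, 4)},
    name="air",
)
got = ref.isel(lat=[0, 1, 2])  # result missing part of the lat axis

# reindex_like shrinks ref to got's labels, so this comparison passes.
xr.testing.assert_allclose(got, ref.reindex_like(got))

direct_fails = raises(AssertionError, lambda: xr.testing.assert_allclose(got, ref))
guard_fails = raises(AssertionError, lambda: check_grid_coverage(got, ref))
guard_ok_on_full = not raises(AssertionError, lambda: check_grid_coverage(ref, ref))
exact_align_fails = raises(ValueError, lambda: xr.align(got, ref, join="exact"))
```

Both alternatives discussed here reject the partial result: the size guard raises before the comparison, and `xr.align(..., join="exact")` raises because the lat indexes differ. The guard keeps `reindex_like` available for label ordering, which is the trade-off described in the reply.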
For #177. Adds 09_lazy_roundtrip.py: the same air[t0] slab is 1,325 rows / ~0.7 MB via lazy to_dataset(chunks={"time": 1}) + .sel(time=t0) (one WHERE pushed down) vs 3.86M rows / ~161 MB eager, all asserted equal to the xarray reference.
Also hardens _harness.py (grid-coverage guard in assert_grid_close, tracemalloc finally in measured); runs green locally via uv.