|
| 1 | +# Debugging Slow Data Access |
| 2 | + |
| 3 | +This guide shows how to inspect the network requests your code makes when data access is slower than expected. |
| 4 | + |
| 5 | +## Tracing Xarray Operations |
| 6 | + |
| 7 | +Wrap your store with [`TracingReadableStore`][obspec_utils.wrappers.TracingReadableStore] to see what requests are made when opening a dataset: |
| 8 | + |
| 9 | +```python exec="on" source="above" session="trace" result="code" |
| 10 | +import xarray as xr |
| 11 | +from obstore.store import HTTPStore |
| 12 | +from obspec_utils.wrappers import TracingReadableStore, RequestTrace |
| 13 | +from obspec_utils.readers import EagerStoreReader |
| 14 | + |
| 15 | +# Access sample NetCDF files over HTTP |
| 16 | +store = HTTPStore.from_url("https://raw.githubusercontent.com/pydata/xarray-data/refs/heads/master/") |
| 17 | + |
| 18 | +trace = RequestTrace() |
| 19 | +traced_store = TracingReadableStore(store, trace) |
| 20 | + |
| 21 | +path = "air_temperature.nc" |
| 22 | + |
| 23 | +with EagerStoreReader(traced_store, path) as reader: |
| 24 | + ds = xr.open_dataset(reader, engine="scipy") |
| 25 | + var_names = list(ds.data_vars) |
| 26 | + |
| 27 | +summary = trace.summary() |
| 28 | +print("Opening dataset required:") |
| 29 | +print(f" {summary['total_requests']} request(s)") |
| 30 | +print(f" {summary['total_bytes'] / 1e6:.2f} MB transferred") |
| 31 | +print(f"Variables found: {var_names}") |
| 32 | +``` |
| 33 | + |
| 34 | +The [`RequestTrace`][obspec_utils.wrappers.RequestTrace] collects information about each request, including byte ranges, timing, and request method. Use [`summary()`][obspec_utils.wrappers.RequestTrace.summary] for quick statistics or access individual [`RequestRecord`][obspec_utils.wrappers.RequestRecord] objects via `trace.requests`. |
| 35 | + |
| 36 | +## Common Patterns to Look For |
| 37 | + |
| 38 | +When analyzing traces, watch for: |
| 39 | + |
| 40 | +| Pattern | Symptom | Solution | |
| 41 | +|---------|---------|----------| |
| 42 | +| Many small requests | High request count, low bytes per request | Use [`EagerStoreReader`][obspec_utils.readers.EagerStoreReader] to fetch the full file, or [`BlockStoreReader`][obspec_utils.readers.BlockStoreReader] to fetch and cache larger blocks | |
| 43 | +| Duplicate requests | Same file/range requested multiple times | Add [`CachingReadableStore`][obspec_utils.wrappers.CachingReadableStore] | |
| 44 | +| Sequential tiny reads | Many requests with incrementing offsets | Increase buffer size or use eager loading | |
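
The three patterns in the table can also be checked programmatically. The sketch below is illustrative only: `Request` is a hypothetical stand-in with assumed fields (`path`, `start`, `length`), not the `RequestRecord` class from `obspec_utils`, and the thresholds `small_bytes` and `min_requests` are arbitrary defaults rather than recommended values. Adapting it to a real trace would mean mapping whatever fields the recorded requests expose onto these three.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for one traced request (not obspec_utils' RequestRecord)."""

    path: str
    start: int   # byte offset of the first byte requested
    length: int  # number of bytes requested


def diagnose(requests, small_bytes=64 * 1024, min_requests=10):
    """Return the names of the table's patterns that appear in `requests`."""
    findings = []
    if not requests:
        return findings

    # Many small requests: high request count, low average bytes per request.
    avg = sum(r.length for r in requests) / len(requests)
    if len(requests) >= min_requests and avg < small_bytes:
        findings.append("many small requests")

    # Duplicate requests: the same (path, start, length) seen more than once.
    counts = Counter((r.path, r.start, r.length) for r in requests)
    if any(n > 1 for n in counts.values()):
        findings.append("duplicate requests")

    # Sequential tiny reads: a small request that starts exactly where the
    # previous request on the same path ended.
    contiguous = sum(
        1
        for prev, cur in zip(requests, requests[1:])
        if cur.path == prev.path
        and cur.start == prev.start + prev.length
        and cur.length < small_bytes
    )
    if contiguous >= min_requests - 1:
        findings.append("sequential tiny reads")

    return findings


# Synthetic trace: twenty back-to-back 4 KiB reads, then one repeated read.
reads = [Request("air_temperature.nc", i * 4096, 4096) for i in range(20)]
reads.append(Request("air_temperature.nc", 0, 4096))
print(diagnose(reads))
```

A trace can match more than one pattern at once, as the synthetic one here does, so the function returns a list rather than a single label.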