
Commit d6cf4bf

Add user guide section on debugging slow access (#53)
* Add user guide section on debugging slow access
* Update TOC
1 parent 04bb5af commit d6cf4bf

2 files changed

Lines changed: 46 additions & 1 deletion


Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
# Debugging Slow Data Access

This guide shows how to inspect the network requests your code makes when data access is slower than expected.

## Tracing Xarray Operations
Wrap your store with [`TracingReadableStore`][obspec_utils.wrappers.TracingReadableStore] to record every request made while opening a dataset:

```python exec="on" source="above" session="trace" result="code"
import xarray as xr
from obstore.store import HTTPStore
from obspec_utils.wrappers import TracingReadableStore, RequestTrace
from obspec_utils.readers import EagerStoreReader

# Access sample NetCDF files over HTTP
store = HTTPStore.from_url("https://raw.githubusercontent.com/pydata/xarray-data/refs/heads/master/")

# Every request made through traced_store is recorded in trace
trace = RequestTrace()
traced_store = TracingReadableStore(store, trace)

path = "air_temperature.nc"

# The dataset must be opened and inspected while the reader is open
with EagerStoreReader(traced_store, path) as reader:
    ds = xr.open_dataset(reader, engine="scipy")
    var_names = list(ds.data_vars)

summary = trace.summary()
print("Opening dataset required:")
print(f" {summary['total_requests']} request(s)")
print(f" {summary['total_bytes'] / 1e6:.2f} MB transferred")
print(f"Variables found: {var_names}")
```
The [`RequestTrace`][obspec_utils.wrappers.RequestTrace] collects information about each request, including byte ranges, timing, and request method. Use [`summary()`][obspec_utils.wrappers.RequestTrace.summary] for quick statistics or access individual [`RequestRecord`][obspec_utils.wrappers.RequestRecord] objects via `trace.requests`.
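To make the mechanism concrete, here is a minimal, self-contained sketch of the wrapper pattern behind tracing: the wrapper forwards each range read to the store it wraps and appends a record (path, byte range, duration) to a shared trace. Everything in it is hypothetical: `InMemoryStore`, `Record`, `Trace`, `TracingStore`, and the `get_range(path, start, end)` signature are stand-ins chosen for illustration, not the `obspec_utils` implementation or the obspec protocol signatures.

```python
import time
from dataclasses import dataclass, field


class InMemoryStore:
    """Hypothetical store serving byte ranges from in-memory blobs."""

    def __init__(self, files: dict[str, bytes]):
        self._files = files

    def get_range(self, path: str, start: int, end: int) -> bytes:
        # Return the bytes in the half-open interval [start, end).
        return self._files[path][start:end]


@dataclass
class Record:
    """One traced request (hypothetical fields, not RequestRecord's)."""

    path: str
    start: int
    end: int
    duration_s: float


@dataclass
class Trace:
    records: list[Record] = field(default_factory=list)

    def summary(self) -> dict:
        # Aggregate statistics over all recorded requests.
        return {
            "total_requests": len(self.records),
            "total_bytes": sum(r.end - r.start for r in self.records),
        }


class TracingStore:
    """Forward get_range() to a wrapped store and record each call."""

    def __init__(self, inner, trace: Trace):
        self._inner = inner
        self._trace = trace

    def get_range(self, path: str, start: int, end: int) -> bytes:
        t0 = time.perf_counter()
        data = self._inner.get_range(path, start, end)
        elapsed = time.perf_counter() - t0
        # Record the range actually returned, which may be shorter near EOF.
        self._trace.records.append(Record(path, start, start + len(data), elapsed))
        return data


store = InMemoryStore({"example.bin": bytes(1_000)})
trace = Trace()
traced = TracingStore(store, trace)
traced.get_range("example.bin", 0, 100)
traced.get_range("example.bin", 100, 300)
print(trace.summary())  # {'total_requests': 2, 'total_bytes': 300}
```

Because the wrapper exposes the same method as the store it wraps, code that reads through it does not need to change, which is what lets the traced store stand in for the plain store in the example above.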
## Common Patterns to Look For

When analyzing traces, watch for:

| Pattern | Symptom | Solution |
|---------|---------|----------|
| Many small requests | High request count, few bytes per request | Use [`EagerStoreReader`][obspec_utils.readers.EagerStoreReader] to fetch the full file, or [`BlockStoreReader`][obspec_utils.readers.BlockStoreReader] to fetch and cache larger blocks |
| Duplicate requests | Same file or byte range requested more than once | Wrap the store with [`CachingReadableStore`][obspec_utils.wrappers.CachingReadableStore] |
| Sequential tiny reads | Many small requests at steadily increasing offsets | Increase the buffer or block size, or load the whole file eagerly with `EagerStoreReader` |
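The checks in the table can be approximated in plain Python once each request is reduced to a `(path, start, end)` tuple, listed in the order it was issued. This is a sketch under stated assumptions: the tuple format, the `diagnose` function, and its thresholds (`small_bytes`, `min_run`) are hypothetical, and how you build the tuples from `trace.requests` depends on the fields of [`RequestRecord`][obspec_utils.wrappers.RequestRecord].

```python
from collections import Counter


def diagnose(requests, small_bytes=64 * 1024, min_run=3):
    """Flag the access patterns from the table (hypothetical helper).

    requests: (path, start, end) tuples in the order they were issued,
              with `end` exclusive (an assumed format, not RequestRecord's).
    small_bytes: requests below this size count as "small" (arbitrary default).
    min_run: back-to-back small reads needed to flag a sequential run.
    """
    sizes = [end - start for _, start, end in requests]
    small = [s < small_bytes for s in sizes]

    # Many small requests: more than half of all requests are small.
    many_small = bool(requests) and sum(small) / len(requests) > 0.5

    # Duplicate requests: the same (path, start, end) fetched more than once.
    counts = Counter(requests)
    duplicates = sorted(r for r, n in counts.items() if n > 1)

    # Sequential tiny reads: a small read that starts exactly where the
    # previous read of the same path ended.
    last_end = {}  # path -> end offset of the most recent read of that path
    run = longest = 0
    for (path, start, end), is_small in zip(requests, small):
        if is_small and last_end.get(path) == start:
            run += 1
        else:
            run = 1 if is_small else 0
        longest = max(longest, run)
        last_end[path] = end

    return {
        "many_small_requests": many_small,
        "duplicate_ranges": duplicates,
        "longest_sequential_run": longest,
        "sequential_tiny_reads": longest >= min_run,
    }


# Example: ten contiguous 100-byte reads, then the first range fetched again.
reqs = [("air_temperature.nc", i * 100, (i + 1) * 100) for i in range(10)]
reqs.append(("air_temperature.nc", 0, 100))
print(diagnose(reqs))
```

The thresholds are arbitrary; tune them to your file format and to the latency of your store.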

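The idea behind fetching and caching larger blocks can be sketched as follows: round each read out to fixed-size, aligned blocks and keep fetched blocks in memory, so many tiny reads collapse into a few larger requests. The `BlockCache` class, its `fetch(start, end)` callable, and its `block_size` parameter are hypothetical; this illustrates the general technique and is not the implementation or signature of `BlockStoreReader`.

```python
class BlockCache:
    """Serve byte ranges from fixed-size, aligned blocks kept in memory.

    fetch(start, end) is any callable returning the bytes in [start, end);
    size is the object's total length, used to clip the final block.
    """

    def __init__(self, fetch, size, block_size=4096):
        self._fetch = fetch
        self._size = size
        self._block_size = block_size
        self._blocks = {}  # block index -> bytes of that block
        self.fetches = 0   # number of underlying requests issued

    def _block(self, index):
        # Fetch a block on first use; later reads hit the in-memory copy.
        if index not in self._blocks:
            start = index * self._block_size
            end = min(start + self._block_size, self._size)
            self._blocks[index] = self._fetch(start, end)
            self.fetches += 1
        return self._blocks[index]

    def read(self, start, end):
        """Return the bytes in [start, end), clipped to the object size."""
        end = min(end, self._size)
        if start >= end:
            return b""
        first = start // self._block_size
        last = (end - 1) // self._block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        offset = first * self._block_size
        return data[start - offset : end - offset]


# 1,000 sequential 16-byte reads of a 16,000-byte object.
blob = bytes(i % 256 for i in range(16_000))
cache = BlockCache(lambda s, e: blob[s:e], size=len(blob), block_size=4096)
chunks = [cache.read(i * 16, (i + 1) * 16) for i in range(1_000)]
print(cache.fetches)  # 4: the reads span four 4,096-byte blocks
```

The trade-off is that each underlying request may transfer bytes that are never read: larger blocks mean fewer requests but more over-fetching.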
mkdocs.yml

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,8 @@ nav:
  - "index.md"
  - "User Guide":
  - "Opening Data with Xarray": "user-guide/opening-data-with-xarray.md"
- - "Finding files on cloud object storage": "user-guide/finding-files.md"
+ - "Finding Files on the Cloud": "user-guide/finding-files.md"
+ - "Debugging Slow Data Access": "user-guide/debugging-data-access.md"
  - "API":
  - Glob: "api/glob.md"
  - Protocols: "api/protocols.md"
