Commit eadc322

refactor: make ObjectStoreRegistry typing generic (#38)
* refactor: make ObjectStoreRegistry generic for flexible protocol composition

  Following obspec's philosophy of flat protocol composition, the registry is now generic with Get as the minimal bound. Callers can specify their exact protocol requirements via the type parameter.

  - ObjectStoreRegistry[T] with T bounded by Get
  - Readers define nested Store protocols for their specific needs
  - Wrappers share internal ReadableStore (not exported)

  Runtime: No breaking changes
  Typing: Code with explicit ObjectStoreRegistry type annotations may need to add a type parameter or use type: ignore

* Use obspec object inventory
* Improve design doc
* Rename
* Improve organization
* Fix errors
* Typing cross-references
1 parent c992769 commit eadc322

10 files changed

Lines changed: 447 additions & 133 deletions

File renamed without changes.

docs/design/protocols.md

Lines changed: 276 additions & 0 deletions
@@ -0,0 +1,276 @@
# Protocols and duck-typing when using obspec-utils

This guide describes obspec-utils' philosophy around protocols and duck-typing and provides recommendations
for downstream users, such as [VirtualiZarr](https://virtualizarr.readthedocs.io/en/stable/index.html) parsers.

## obspec's Recommended Approach

[obspec uses independent protocols](https://developmentseed.org/obspec/latest/blog/2025/06/25/introducing-obspec-a-python-protocol-for-interfacing-with-object-storage/#protocols-duck-typing-not-subclassing) rather than a monolithic interface. obspec-utils' adoption of that philosophy can be summarized as:

- **Compose flat, independent protocols** for each use case
- **Don't force unnecessary capabilities**: requiring fewer operations means more backend compatibility
- **Avoid hierarchical tiers**: they create artificial coupling between unrelated capabilities

In short, we recommend that each VirtualiZarr parser define exactly the protocols it needs:

```python
from typing import Protocol
from obspec import Get, GetAsync, GetRange, GetRangeAsync, GetRanges, GetRangesAsync, Head, HeadAsync, List, ListAsync

# Kerchunk - truly minimal
class KerchunkProtocol(Get, GetAsync, Protocol):
    """Fetch whole objects only."""

# HDF5 - range requests + file size
class HDF5Protocol(GetRange, GetRangeAsync, Head, HeadAsync, Protocol):
    """Random access with metadata."""

# Zarr - enumeration + file size
class ZarrProtocol(List, ListAsync, Head, HeadAsync, Protocol):
    """Chunk discovery and size detection."""

# COG - parallel ranges + file size
class COGProtocol(GetRange, GetRangeAsync, GetRanges, GetRangesAsync, Head, HeadAsync, Protocol):
    """Parallel tile fetching."""
```

## obspec-utils Internal Design

obspec-utils uses two patterns for protocol requirements:

### Readers: Nested `Store` Protocols

Each reader defines its own nested `Store` protocol with exactly what it needs:

```python
class BufferedStoreReader:
    class Store(Get, GetRange, Protocol):
        """Requires Get + GetRange."""
        pass

class EagerStoreReader:
    class Store(Get, GetRanges, Protocol):
        """Requires Get + GetRanges (+ optional Head)."""
        pass
```

### Wrappers: Internal `ReadableStore`

Transparent proxy wrappers (`CachingReadableStore`, `TracingReadableStore`, `SplittingReadableStore`) share an internal `ReadableStore` protocol since they all need the same full read interface:

```python
# Internal to obspec-utils (not exported)
class ReadableStore(Get, GetAsync, GetRange, GetRangeAsync, GetRanges, GetRangesAsync, Protocol):
    """Full read interface for transparent store wrappers."""
```

External consumers should compose their own protocols from obspec.

### Generic Registry Design

The registry is generic with [Get][obspec.Get] as the bound, allowing callers to specify their exact protocol requirements:

```python
from typing import TypeVar, Generic
from obspec import Get

T = TypeVar("T", bound=Get)

class ObjectStoreRegistry(Generic[T]):
    def __init__(self, stores: dict[Url, T] | None = None) -> None: ...
    def register(self, url: Url, store: T) -> None: ...
    def resolve(self, url: Url) -> tuple[T, Path]: ...
```

Usage with parser-specific protocols:

```python
# Zarr workflow
zarr_registry: ObjectStoreRegistry[ZarrProtocol] = ObjectStoreRegistry({
    "s3://bucket": s3_store,
})
store, path = zarr_registry.resolve(url)  # store: ZarrProtocol
store.list(path)  # OK
store.head(path)  # OK

# Kerchunk workflow - less restrictive
# (a separate name avoids re-annotating `zarr_registry` with a different type)
kerchunk_registry: ObjectStoreRegistry[Get] = ObjectStoreRegistry({
    "https://cdn.example.com": http_store,  # Only needs Get
})
```
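To make the generic signature above concrete, here is a minimal, self-contained sketch of a generic registry. It is not the obspec-utils implementation: `GetLike`, `SketchRegistry`, `InMemoryStore`, and the longest-prefix matching rule are all hypothetical stand-ins chosen so the example runs without obspec installed.

```python
from __future__ import annotations

from typing import Generic, Protocol, TypeVar


class GetLike(Protocol):
    """Hypothetical stand-in for obspec.Get (simplified signature)."""

    def get(self, path: str) -> bytes: ...


T = TypeVar("T", bound=GetLike)


class SketchRegistry(Generic[T]):
    """Illustrative registry mapping URL prefixes to stores."""

    def __init__(self, stores: dict[str, T] | None = None) -> None:
        self._stores: dict[str, T] = dict(stores or {})

    def register(self, url: str, store: T) -> None:
        self._stores[url.rstrip("/")] = store

    def resolve(self, url: str) -> tuple[T, str]:
        # Assumed rule: the longest registered prefix that matches wins.
        matches = [p for p in self._stores if url == p or url.startswith(p + "/")]
        if not matches:
            raise ValueError(f"No store registered for {url!r}")
        prefix = max(matches, key=len)
        return self._stores[prefix], url[len(prefix):].lstrip("/")


class InMemoryStore:
    """Toy store that satisfies GetLike structurally, without subclassing."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self._objects = objects

    def get(self, path: str) -> bytes:
        return self._objects[path]


registry: SketchRegistry[InMemoryStore] = SketchRegistry(
    {"s3://bucket": InMemoryStore({"data/a.bin": b"abc"})}
)
store, path = registry.resolve("s3://bucket/data/a.bin")
payload = store.get(path)
```

Because `T` flows from the constructor argument to the return type of `resolve`, a static type checker sees `store` as `InMemoryStore` here rather than as the bare `GetLike` bound.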

### Why Not Protocol Tiers?

A tiered approach (`MinimalStore` → `ReadableStore` → `ListableStore`) creates artificial coupling:

| Tier approach | Problem |
|---------------|---------|
| `ReadableStore` bundles `GetRange` + `GetRanges` + `Head` | Some range readers don't need `Head` (size passed explicitly) |
| `ReadableStore` requires `GetRanges` | Some backends only support single `GetRange` |
| `ListableStore` requires all of `ReadableStore` | ZarrParser needs `List` + `Head`, not `GetRanges` |

Flat composition avoids these issues: each protocol includes only what's actually needed.
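The coupling problem can be illustrated with a small sketch. The protocols below are simplified, hypothetical stand-ins for the obspec ones (the real signatures differ), and `isinstance()` is used only to make the structural mismatch visible when the script runs; a static type checker would report the same mismatch.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class GetRangeLike(Protocol):
    def get_range(self, path: str, *, start: int, end: int) -> bytes: ...


@runtime_checkable
class GetRangesLike(Protocol):
    def get_ranges(self, path: str, *, starts: list, ends: list) -> list: ...


@runtime_checkable
class HeadLike(Protocol):
    def head(self, path: str) -> dict: ...


# Tiered design: every "readable" store must satisfy the whole bundle.
@runtime_checkable
class TieredReadableStore(GetRangeLike, GetRangesLike, HeadLike, Protocol):
    pass


# Flat design: the consumer names only the operations it actually uses.
@runtime_checkable
class RangeAndHead(GetRangeLike, HeadLike, Protocol):
    pass


class SingleRangeBackend:
    """Toy backend with single-range reads and head(), but no get_ranges()."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def get_range(self, path: str, *, start: int, end: int) -> bytes:
        return self._data[start:end]

    def head(self, path: str) -> dict:
        return {"path": path, "size": len(self._data)}


backend = SingleRangeBackend(b"0123456789")
fits_tier = isinstance(backend, TieredReadableStore)  # False: get_ranges is missing
fits_flat = isinstance(backend, RangeAndHead)         # True: only what is needed
```

The backend is rejected by the tier even though a reader that only calls `get_range` and `head` would work with it.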

## Options for downstream users

### Runtime Validation

`isinstance()` checks against a `Protocol` are unreliable: they raise `TypeError` unless the protocol is decorated with `@runtime_checkable`, and even then they only verify that the method names exist, not their signatures. Parsers should therefore validate at call time:

```python
class ZarrParser:
    def __call__(self, url: str, registry: ObjectStoreRegistry) -> ManifestStore:
        store, _ = registry.resolve(url)
        if not (hasattr(store, "list") and hasattr(store, "head")):
            raise TypeError(
                "ZarrParser requires List + Head protocols. "
                f"{type(store).__name__} is missing required methods."
            )
        # ... proceed
```

We also recommend using static type checkers.
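The limits of `isinstance()` with protocols noted in this section can be seen in a short sketch that uses only the standard library. `HeadLike`, `PlainProtocol`, and `WrongSignature` are hypothetical names for illustration and are not part of obspec.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HeadLike(Protocol):
    """Hypothetical stand-in for a head-style protocol."""

    def head(self, path: str) -> dict: ...


class PlainProtocol(Protocol):
    """Same shape, but not decorated with @runtime_checkable."""

    def head(self, path: str) -> dict: ...


class WrongSignature:
    # Same method name, incompatible signature: no path argument.
    def head(self) -> str:
        return "not metadata"


obj = WrongSignature()

# Passes, because only the presence of a `head` attribute is checked.
passes_name_check = isinstance(obj, HeadLike)

# Protocols without @runtime_checkable cannot be used with isinstance() at all.
try:
    isinstance(obj, PlainProtocol)
    raised_type_error = False
except TypeError:
    raised_type_error = True
```

A static type checker would reject `WrongSignature` as a `HeadLike`, which is why the two kinds of validation complement each other.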
133+
134+
### Escape Hatches
135+
136+
Provide parameters to reduce requirements where desired:
137+
138+
```python
139+
class ZarrParser:
140+
def __init__(self, consolidated_metadata: dict | None = None):
141+
self.consolidated_metadata = consolidated_metadata # Skip List requirement
142+
143+
class HDF5Parser:
144+
def __init__(self, file_size: int | None = None):
145+
self.file_size = file_size # Skip Head requirement
146+
```
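As a rough sketch of how such a parameter might be consumed, the example below falls back to `head()` only when no size was supplied. `SketchHDF5Parser`, `resolve_size`, and the toy stores are hypothetical, and the `"size"` key in the `head()` result is an assumption about the metadata shape, not a statement about VirtualiZarr's actual parser.

```python
from __future__ import annotations

from typing import Any


class SketchHDF5Parser:
    """Illustrative parser; not VirtualiZarr's actual HDF5 parser."""

    def __init__(self, file_size: int | None = None) -> None:
        self.file_size = file_size

    def resolve_size(self, store: Any, path: str) -> int:
        # Escape hatch: a caller-supplied size means head() is never called.
        if self.file_size is not None:
            return self.file_size
        if not hasattr(store, "head"):
            raise TypeError(
                f"{type(store).__name__} has no head(); pass file_size explicitly."
            )
        # Assumption: head() returns a mapping with a "size" entry.
        return store.head(path)["size"]


class RangeOnlyStore:
    """Toy backend with range reads but no head()."""

    def get_range(self, path: str, *, start: int, end: int) -> bytes:
        return b"\x00" * (end - start)


class StoreWithHead(RangeOnlyStore):
    """Toy backend that also reports object metadata."""

    def head(self, path: str) -> dict:
        return {"path": path, "size": 2048}


size_from_hint = SketchHDF5Parser(file_size=1024).resolve_size(RangeOnlyStore(), "f.h5")
size_from_head = SketchHDF5Parser().resolve_size(StoreWithHead(), "f.h5")
```

With the hint supplied, a backend that lacks `head()` remains usable; without it, the parser fails early with a message that names the workaround.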

### Backwards Compatibility

**Can VirtualiZarr depend on obspec-utils without parser changes?**

At runtime, `resolve()` returns the actual store object (e.g., `S3Store`), which has all methods. Type hints only affect static analysis.

| Layer | Behavior | Parser changes needed? |
|-------|----------|------------------------|
| Runtime | Stores have all methods | No |
| Static typing | Type checkers see declared protocol | Depends on approach |

#### Migration Path

1. **Immediate:** Duck typing (no changes, works at runtime, type checkers complain)
2. **Incremental:** Type-ignore pragmas, e.g. `store.list(path)  # type: ignore[attr-defined]`
3. **Full type safety:** Generic registry with parser-specific protocols

### VirtualiZarr Implementation Guide

VirtualiZarr parsers should define their protocol requirements in VirtualiZarr, not in obspec-utils. This keeps obspec-utils minimal and lets VirtualiZarr evolve its requirements independently.

#### Defining Parser Protocols

In `virtualizarr/parsers/protocols.py`:

```python
from typing import Protocol
from obspec import Get, GetAsync, GetRange, GetRangeAsync, Head, HeadAsync, List, ListAsync

class KerchunkStore(Get, GetAsync, Protocol):
    """Store protocol for Kerchunk-based parsers (pre-indexed offsets)."""
    pass

class HDF5Store(GetRange, GetRangeAsync, Head, HeadAsync, Protocol):
    """Store protocol for HDF5 parsing (random access + file size)."""
    pass

class ZarrStore(List, ListAsync, Head, HeadAsync, Protocol):
    """Store protocol for Zarr parsing (chunk discovery + sizes)."""
    pass
```

#### Using Protocols in Parsers

Each parser uses its protocol for type hints and validates at runtime:

```python
# virtualizarr/parsers/zarr.py
from typing import Protocol
from obspec import List, ListAsync, Head, HeadAsync
from obspec_utils import ObjectStoreRegistry

class ZarrStore(List, ListAsync, Head, HeadAsync, Protocol):
    """Store protocol for Zarr parsing."""
    pass

class ZarrParser:
    def __call__(
        self,
        url: str,
        registry: ObjectStoreRegistry[ZarrStore],
    ) -> ManifestStore:
        store, path = registry.resolve(url)

        # Runtime validation with clear error message
        missing = []
        if not hasattr(store, "list"):
            missing.append("List")
        if not hasattr(store, "head"):
            missing.append("Head")
        if missing:
            raise TypeError(
                f"ZarrParser requires {', '.join(missing)} protocols. "
                f"{type(store).__name__} does not support these operations. "
                "Use S3Store, LocalStore, or another store with listing support."
            )

        # Type checker knows store has list() and head()
        chunks = store.list(path)
        # ...
```

#### Creating Typed Registries

Users create registries with the appropriate protocol for their workflow:

```python
# For Zarr workflows
from virtualizarr.parsers.protocols import ZarrStore

registry: ObjectStoreRegistry[ZarrStore] = ObjectStoreRegistry({
    "s3://my-bucket": S3Store(bucket="my-bucket"),
})

# Type checker enforces that only ZarrStore-compatible stores are registered
# and that resolved stores have list() and head() methods
```

#### Nested Store Protocol Pattern

Following obspec-utils' reader pattern, parsers can define their protocol as a nested class:

```python
class ZarrParser:
    class Store(List, ListAsync, Head, HeadAsync, Protocol):
        """Store protocol required by ZarrParser."""
        pass

    def __call__(
        self,
        url: str,
        registry: ObjectStoreRegistry["ZarrParser.Store"],
    ) -> ManifestStore:
        ...  # parser body elided
```

This is self-documenting: the protocol is defined alongside the parser that requires it.

## Summary

1. **Flat composition over tiers**: each consumer defines exactly the protocols it needs
2. **Generic registry** with [Get][obspec.Get] bound
3. **obspec-utils internal patterns:**
    - Readers use nested `Store` protocols (each with specific requirements)
    - Wrappers share internal `ReadableStore`
4. **External consumers** (like VirtualiZarr) should compose protocols from obspec directly
5. **Runtime validation** in parsers with clear error messages
6. **Escape hatches** where feasible (`file_size`, `consolidated_metadata`)
7. **Backwards compatible**: duck typing works immediately; generics for full type safety

mkdocs.yml

Lines changed: 4 additions & 1 deletion
@@ -14,7 +14,9 @@ extra:
 
 nav:
   - "index.md"
-  - "Caching Architecture": "caching-architecture.md"
+  - "Design":
+      - "Protocols": "design/protocols.md"
+      - "Caching": "design/caching.md"
   - "API":
       - Typing: "api/typing.md"
       - Aiohttp Store Adapters: "api/aiohttp.md"
@@ -93,6 +95,7 @@ plugins:
 inventories:
   - https://docs.python.org/3/objects.inv
   - https://developmentseed.org/obstore/latest/objects.inv
+  - https://developmentseed.org/obspec/latest/objects.inv
 
 # https://github.com/developmentseed/titiler/blob/50934c929cca2fa8d3c408d239015f8da429c6a8/docs/mkdocs.yml#L115-L140
 markdown_extensions:

src/obspec_utils/aiohttp.py

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ async def __aiter__(self) -> AsyncIterator[bytes]:
 
 class AiohttpStore(ReadableStore):
     """
-    An aiohttp-based implementation of the ReadableStore protocol.
+    An aiohttp-based object store implementation.
 
     This provides a lightweight alternative to obstore's HTTPStore for generic
     HTTP/HTTPS access. It's particularly useful for:

src/obspec_utils/cache.py

Lines changed: 7 additions & 13 deletions
@@ -1,6 +1,6 @@
 """Caching utilities for obspec-utils.
 
-This module provides a caching wrapper for ReadableStore implementations,
+This module provides a caching wrapper for object stores,
 useful for reducing network requests when files are accessed multiple times.
 """
 
@@ -25,21 +25,12 @@ class CachingReadableStore(ReadableStore):
     """
     A wrapper that caches full objects in a MemoryStore on first access.
 
-    This wrapper implements the ReadableStore protocol and caches entire
-    objects when they are first accessed. Subsequent accesses (including
-    range requests) are served from the cache.
+    This wrapper caches entire objects when they are first accessed.
+    Subsequent accesses (including range requests) are served from the cache.
 
     The cache uses LRU (Least Recently Used) eviction when it exceeds
     the maximum size.
 
-    Parameters
-    ----------
-    store
-        The underlying store to wrap.
-    max_size
-        Maximum cache size in bytes. When exceeded, least recently used
-        entries are evicted. Default: 256 MB (256 * 1024 * 1024).
-
     Notes
     -----
     **Thread Safety**: This class is thread-safe and works correctly with
@@ -94,7 +85,10 @@ def __init__(self, store: ReadableStore, max_size: int = 256 * 1024 * 1024) -> N
         Parameters
         ----------
         store
-            The underlying store to wrap.
+            Any object implementing the full read interface: [Get][obspec.Get],
+            [GetAsync][obspec.GetAsync], [GetRange][obspec.GetRange],
+            [GetRangeAsync][obspec.GetRangeAsync], [GetRanges][obspec.GetRanges],
+            and [GetRangesAsync][obspec.GetRangesAsync].
         max_size
             Maximum cache size in bytes. Default: 256 MB.
         """
