346 | 346 | ], |
347 | 347 | "outputs": [], |
348 | 348 | "execution_count": 13 |
| 349 | + }, |
| 350 | + { |
| 351 | + "metadata": { |
| 352 | + "id": "EQMXRq18_G-Y" |
| 353 | + }, |
| 354 | + "cell_type": "markdown", |
| 355 | + "source": [ |
| 356 | + "## Missing features\n", |
| 357 | + "\n", |
| 358 | + "`xbeam.Dataset` is not yet complete, and we would welcome contributions! Below are a few features that would be particularly valuable. If you're interested in working on any of these, the easiest way to get in touch is to [raise an issue](https://github.com/google/xarray-beam/issues) on GitHub.\n",
| 359 | + "\n", |
| 360 | + "### Operations that combine multiple datasets\n", |
| 361 | + "\n", |
| 362 | + "Support for operations that merge different datasets would be very welcome, e.g., to evaluate model outputs against ground truth. Currently, tools like [WeatherBenchX](https://github.com/google-research/weatherbenchX/) achieve this by writing custom Beam pipelines.\n",
| 363 | + "\n", |
| 364 | + "There are two ways these might be implemented for `Dataset`:\n", |
| 365 | + "\n", |
| 366 | + "1. By allowing `xbeam.map_blocks()` to accept multiple `Dataset` arguments ([tracking issue](https://github.com/google/xarray-beam/issues/173)).\n",
| 367 | + "2. By supporting `xarray.DataTree` objects ([tracking issue](https://github.com/google/xarray-beam/issues/124)).\n",
| 368 | + "\n", |
| 369 | + "In the long term, `DataTree` support is the better option, because it allows multiple datasets to be loaded simultaneously, whereas merging the outputs of separate Beam `PTransform`s (as would be required for `map_blocks`) requires an expensive shuffle step via `beam.CoGroupByKey`. However, this will require the upstream Xarray project to support more `DataTree` functionality, most notably `concat` and `combine_nested`.\n",
| 370 | + "\n", |
| 371 | + "### Aggregations other than `mean`\n", |
| 372 | + "\n", |
| 373 | + "Currently, Xarray-Beam only provides an efficient aggregation implementation for {py:meth}`~xarray_beam.Dataset.mean`, but it should be relatively straightforward to extend this to many other common Xarray aggregations, e.g., `sum`, `min`, `max`, `all`, `any`, `std` and `var`.\n",
| 374 | + "\n", |
| 375 | + "### IO connectors\n", |
| 376 | + "\n", |
| 377 | + "Connectors for reading and writing Xarray-Beam datasets with other distributed storage systems, such as Google Earth Engine (see [Xee](https://github.com/google/Xee)) and [Icechunk](https://icechunk.io/), would be very welcome.\n",
| 378 | + "\n", |
| 379 | + "### Other `Dataset` operations\n", |
| 380 | + "\n", |
| 381 | + "`xbeam.Dataset` has an intentionally small API surface, so features that can be implemented via a trivial call to `map_blocks()` are probably not a good fit for Xarray-Beam itself.\n",
| 382 | + "\n", |
| 383 | + "That said, there are plenty of other Xarray methods that _do_ require updates to the underlying `chunks` and `xbeam.Key` objects beyond what `map_blocks` can handle (e.g., `rename`, `thin`, `assign_coords`), or for which more efficient distributed algorithms exist (e.g., for [groupby and resampling](https://github.com/xarray-contrib/flox)).\n",
| 384 | + "\n", |
| 385 | + "We are also considering starting a separate open source project to collect generally useful utilities that are a little too specific to the weather/climate domain to make sense in Xarray-Beam, e.g., for regridding."
| 386 | + ] |
| 387 | + }, |
| 388 | + { |
| 389 | + "metadata": { |
| 390 | + "id": "8pdnX_kJ_gYV" |
| 391 | + }, |
| 392 | + "cell_type": "markdown", |
| 393 | + "source": [] |
349 | 394 | } |
350 | 395 | ], |
351 | 396 | "metadata": { |