39 changes: 18 additions & 21 deletions README.md
Expand Up @@ -4,45 +4,42 @@

## What is this?

`mxalign` is an `xarray`-based package designed for the alignment and verification of meteorological datasets. It standardizes operations across datasets by attaching properties along three main axes:
- **Space:** Grid or point-based data
- **Time:** Forecasts, observations, or climatology
- **Uncertainty:** Deterministic, ensemble, or quantile forecasts

Currently, `mxalign` also acts as a full execution engine. It can load datasets (e.g., Anemoi inference outputs, observation datasets), apply transformations, align datasets in both space and time to match a reference, safely broadcast NaNs, and execute verification metrics on scaled Dask clusters (Local or Slurm).
`mxalign` is an `xarray`-based package for aligning meteorological datasets. It operates on datasets that carry **traits** — metadata attributes that describe the nature of a dataset along three axes:
- **Space:** `grid` or `point`
- **Time:** `forecast`, `observation`, or `climatology`
- **Uncertainty:** `deterministic`, `ensemble`, or `quantile`

> ⚠️ **Roadmap & Future Architecture Changes (planned for v0.2.0):**
> Currently, `mxalign` handles both alignment and the execution of the verification tooling pipeline, including loading and validation. In the upcoming `v0.2.0` release, this architecture will be refactored:
> - **Loading** will be split out into [`mlwp-data-loaders`](https://github.com/mlwp-tools/mlwp-data-loaders).
> - **Validation** of loaded `xr.Dataset`s will be moved to [`mlwp-data-specs`](https://github.com/mlwp-tools/mlwp-data-specs) (which will contain the requirements for each of the dataset traits and the validation logic).
> - **Execution** of the full verification pipeline (loading, transformations, alignment, and verification) from configuration files may be moved to a separate package in future releases.
> - **Tests** will be added to `mxalign` (building on test datasets already integrated into `mlwp-data-loaders`) that ensure that all alignment operations work correctly (Testing notebook execution inside `mxalign` is explicitly excluded from the current roadmap).
These traits are defined and validated by [`mlwp-data-specs`](https://github.com/mlwp-tools/mlwp-data-specs) and attached to datasets by [`mlwp-data-loaders`](https://github.com/mlwp-tools/mlwp-data-loaders). `mxalign` reads them to infer how datasets should be aligned, without needing to know how they were loaded.

`mxalign` currently supports alignment in **space** and **time**. Alignment along the **uncertainty** axis (e.g. ensemble to deterministic) is planned for a future release.
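
As a rough illustration of what "reading traits" can mean, the sketch below compares the traits carried in two attribute dictionaries and reports the axes on which they differ. It is a toy model under stated assumptions, not `mxalign`'s actual logic: the `Space` and `Time` enums are local stand-ins for the ones in `mlwp-data-specs`, the attribute keys are hypothetical, `differing_traits` is an invented helper, and plain dicts stand in for `ds.attrs`.

```python
from enum import Enum


# Local stand-ins for the trait enums defined in mlwp-data-specs.
class Space(Enum):
    GRID = "grid"
    POINT = "point"


class Time(Enum):
    FORECAST = "forecast"
    OBSERVATION = "observation"
    CLIMATOLOGY = "climatology"


# Hypothetical attribute keys, for illustration only. The real keys are the
# SPACE_TRAIT_ATTR / TIME_TRAIT_ATTR constants exported by mlwp_data_specs.api.
SPACE_TRAIT_ATTR = "space_trait"
TIME_TRAIT_ATTR = "time_trait"


def differing_traits(attrs, reference_attrs):
    """Return the axis names on which two attribute dicts carry different traits."""
    axes = {"space": SPACE_TRAIT_ATTR, "time": TIME_TRAIT_ATTR}
    return [name for name, key in axes.items() if attrs[key] != reference_attrs[key]]


# Plain dicts stand in for `ds.attrs` of a gridded forecast and point observations.
fcst_attrs = {SPACE_TRAIT_ATTR: Space.GRID, TIME_TRAIT_ATTR: Time.FORECAST}
obs_attrs = {SPACE_TRAIT_ATTR: Space.POINT, TIME_TRAIT_ATTR: Time.OBSERVATION}

print(differing_traits(fcst_attrs, obs_attrs))
```

Matching traits do not by themselves mean two datasets are already aligned (two grids can still differ), so a comparison like this only indicates which kind of alignment is in play.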

## Python API

`mxalign` provides building blocks for manual alignment, transformations, and interpolations of `xarray` datasets. This is ideal for interactive use in Jupyter notebooks or custom Python scripts.
`mxalign` provides building blocks for spatial and temporal alignment of `xarray` datasets. This is ideal for interactive use in Jupyter notebooks or custom Python scripts.

```python
import xarray as xr
from mxalign import load, align_space, align_time, transform
import mlwp_data_loaders as dl
import mxalign as mx

# Load datasets (using registered loaders)
ds_obs = load(name="observations_loader", files=["obs.nc"])
ds_fcst = load(name="anemoi_inference", files=["forecast.nc"])
# Load datasets — traits are attached by the loader
ds_obs = dl.load("observations_loader", files=["obs.nc"])
ds_fcst = dl.load("anemoi_inference", files=["forecast.nc"])

# Align the forecast spatially to match the observation reference
ds_fcst_aligned_space = align_space(ds_fcst, reference=ds_obs, method="interpolation")
ds_fcst_aligned = mx.align_space(ds_fcst, reference=ds_obs, method="interpolation")

# Align datasets temporally
datasets = {"obs": ds_obs, "fcst": ds_fcst_aligned_space}
aligned_datasets = align_time(datasets, method="intersection")
datasets = {"obs": ds_obs, "fcst": ds_fcst_aligned}
aligned_datasets = mx.align_time(datasets, method="intersection")
```

For a more comprehensive interactive example, check out the [introductory notebook](./examples/introduction.ipynb).

## Executing via a Configuration

For full verification pipeline execution, `mxalign` uses a YAML configuration file. This allows you to declaratively define how datasets are loaded, transformed, aligned, and verified.
`mxalign` can drive a full verification pipeline from a YAML configuration file, orchestrating dataset loading (via `mlwp-data-loaders`), transformations, alignment, and verification.

### Configuration Contents

Expand Down
6 changes: 6 additions & 0 deletions pyproject.toml
Expand Up @@ -20,6 +20,8 @@ dependencies = [
"zarr<3.0",
"bokeh>=3.8.2",
"distributed>=2026.1.2",
"mlwp-data-specs",
"mlwp-data-loaders",
]

[project.scripts]
Expand Down Expand Up @@ -50,3 +52,7 @@ dev = [
"ipykernel>=7.2.0",
"pytest>=8.0.0",
]

[tool.uv.sources]
mlwp-data-specs = { git = "https://github.com/mlwp-tools/mlwp-data-specs", rev = "059f382" }
mlwp-data-loaders = { git = "https://github.com/mlwp-tools/mlwp-data-loaders" }
11 changes: 0 additions & 11 deletions src/mxalign/__init__.py
@@ -1,6 +1,3 @@
from .properties.properties import Properties, Time, Space, Uncertainty
from .loaders.loader import load
from .loaders.registry import available_loaders, register_loader
from .transformations.transform import transform
from .transformations.registry import available_transformations, register_transformation
from .interpolations.interpolate import interpolate
Expand All @@ -9,18 +6,10 @@
from .align.space import align_space

from . import accessors
from . import loaders
from . import transformations
from . import interpolations

__all__ = [
"Properties",
"Time",
"Space",
"Uncertainty",
"load",
"available_loaders",
"register_loader",
"transform",
"available_transformations",
"register_transformation",
Expand Down
7 changes: 3 additions & 4 deletions src/mxalign/accessors/space.py
Expand Up @@ -2,9 +2,8 @@
import cartopy.crs as ccrs
import numpy as np

from ..properties.properties import Space
from ..properties.utils import properties_from_attrs

from mlwp_data_specs.api import SPACE_TRAIT_ATTR
from mlwp_data_specs.specs.traits.spatial_coordinate import Space
from ..utils.projections import create_cartopy_crs, BUILTIN

# Tolerance in degrees that the coordinates of two grids can differ while still being interpreted as the same grid.
Expand All @@ -15,7 +14,7 @@
@xr.register_dataset_accessor("space")
class SpaceAccessor:
def __init__(self, ds):
self._space = properties_from_attrs(ds).space
self._space = ds.attrs[SPACE_TRAIT_ATTR]
self._ds = ds

def is_grid(self):
Expand Down
11 changes: 6 additions & 5 deletions src/mxalign/accessors/time.py
@@ -1,14 +1,15 @@
import xarray as xr
import numpy as np

from ..properties.properties import Time
from ..properties.utils import properties_from_attrs, update_time_property
from mlwp_data_specs.api import TIME_TRAIT_ATTR
from mlwp_data_specs.specs.traits.time_coordinate import Time
from ..utils.traits import update_time_trait


@xr.register_dataset_accessor("time")
class TimeAccessor:
def __init__(self, ds):
self._time = properties_from_attrs(ds).time
self._time = ds.attrs[TIME_TRAIT_ATTR]
self._ds = ds

def is_forecast(self):
Expand Down Expand Up @@ -122,7 +123,7 @@ def _align_forecast_observation(
exclude=set(ds_forecast_stacked.coords)
| set(ds_observation.coords) - set(["valid_time"]),
)
ds_forecast_aligned = update_time_property(ds_forecast_aligned, Time.OBSERVATION)
ds_forecast_aligned = update_time_trait(ds_forecast_aligned, Time.OBSERVATION)
return ds_forecast_aligned, ds_observation_aligned


Expand Down Expand Up @@ -165,7 +166,7 @@ def _align_observation_forecast(ds_observation, ds_forecast, only_common=False):
ds_observation_aligned = ds_observation_aligned.transpose(
"reference_time", "lead_time", ...
)
ds_observation_aligned = update_time_property(ds_observation_aligned, Time.FORECAST)
ds_observation_aligned = update_time_trait(ds_observation_aligned, Time.FORECAST)
if only_common:
return ds_observation_aligned, ds_forecast_cut
else:
Expand Down
6 changes: 3 additions & 3 deletions src/mxalign/interpolations/base.py
@@ -1,6 +1,6 @@
import xarray as xr
from ..properties.properties import Space
from ..properties.utils import update_space_property
from mlwp_data_specs.specs.traits.spatial_coordinate import Space
from ..utils.traits import update_space_trait


class BaseInterpolator:
Expand All @@ -21,7 +21,7 @@ def interpolate(
self, source_dataset: xr.Dataset | xr.DataArray
) -> xr.Dataset | xr.DataArray:
ds_out = self._interpolate(source_dataset)
return update_space_property(ds_out, self.target_space)
return update_space_trait(ds_out, self.target_space)

def _interpolate(
self, source_dataset: xr.Dataset | xr.DataArray
Expand Down
18 changes: 15 additions & 3 deletions src/mxalign/interpolations/delaunay.py
Expand Up @@ -9,8 +9,13 @@

from .base import BaseInterpolator
from .registry import register_interpolator
from ..properties.properties import Space
from ..properties.utils import properties_from_attrs, set_properties_attrs

from mlwp_data_specs.specs.traits.spatial_coordinate import Space
from mlwp_data_specs.api import (
TIME_TRAIT_ATTR,
SPACE_TRAIT_ATTR,
UNCERTAINTY_TRAIT_ATTR,
)


@register_interpolator
Expand Down Expand Up @@ -82,7 +87,14 @@ def _interpolate(self, source_dataset):
latitude=self.target_dataset["latitude"],
longitude=self.target_dataset["longitude"],
)
return set_properties_attrs(ds_out, properties_from_attrs(source_dataset))

ds_out.attrs.update(
{
k: source_dataset.attrs[k]
for k in [TIME_TRAIT_ATTR, SPACE_TRAIT_ATTR, UNCERTAINTY_TRAIT_ATTR]
}
)
return ds_out


def _build_weight_matrix(
Expand Down
2 changes: 1 addition & 1 deletion src/mxalign/interpolations/xarray.py
@@ -1,6 +1,6 @@
from .base import BaseInterpolator
from .registry import register_interpolator
from ..properties.properties import Space
from mlwp_data_specs.specs.traits.spatial_coordinate import Space

import xarray as xr

Expand Down
92 changes: 0 additions & 92 deletions src/mxalign/loaders/anemoi_datasets.py

This file was deleted.
