Open
61 commits
22d2240
Switch from old stable version to pending PR for mainline adoption
BrianMichell Dec 5, 2025
eb31d96
Apply minimal updates to get working
BrianMichell Dec 5, 2025
e500bfd
Updates to support both drivers
BrianMichell Dec 8, 2025
473e475
Fix reading stats from mdio-python files
BrianMichell Dec 12, 2025
204421a
Fix fill value metadata issue for boolean types
BrianMichell Dec 12, 2025
a5e5a24
Fix chunk shape in test
BrianMichell Dec 12, 2025
f056887
Add zarr3 driver to internal dependencies
BrianMichell Dec 12, 2025
be55802
Unity v2/3 acceptance test, add support for zarr version specificatio…
BrianMichell Dec 18, 2025
b567425
Ensure all appropriate tests run for v2 and v3 drivers
BrianMichell Dec 18, 2025
db8ffb9
Suppress warning output spam
BrianMichell Dec 19, 2025
faa8393
Fix driver requirements
BrianMichell Dec 19, 2025
b259f60
Update inert bucket example to be more appropriate
BrianMichell Dec 19, 2025
2a09c99
Propogate context through internal objects to allow for credentials t…
BrianMichell Dec 22, 2025
fbf46af
Add optional code coverage and begin expanding coverage
BrianMichell Dec 30, 2025
bd0906f
Expand meaningful test coverage
BrianMichell Jan 2, 2026
649610e
Merge pull request #1 from BrianMichell/expand_coverage
BrianMichell Jan 2, 2026
528cab7
Formatting and linting
BrianMichell Jan 2, 2026
9d1f717
Begin re-unifying the zarr drivers
BrianMichell Jan 2, 2026
d5cdcf8
Fix copyright date
BrianMichell Jan 2, 2026
ff8fa55
Resolve logic error for serializing structarray with v3 driver
BrianMichell Jan 2, 2026
74b2cab
Add support for field selection of v3 datasets
BrianMichell Jan 2, 2026
1609ff8
Fix logic for fill values on v3 driver
BrianMichell Jan 2, 2026
1b7f8f5
Reduce boilerplate
BrianMichell Jan 2, 2026
b78d7bc
Begin fixing metadata serialization errors
BrianMichell Jan 2, 2026
19b2855
Resolve zarr3 outputting improper metadata, fix filesystem access to …
BrianMichell Jan 5, 2026
b1c673f
Refactor metadata handling and path utilities for Zarr drivers
BrianMichell Jan 5, 2026
dc65f2a
Linting
BrianMichell Jan 5, 2026
eba8c41
Merge pull request #2 from BrianMichell/fix_regressions
BrianMichell Jan 5, 2026
2bdeb94
Update tensorstore latest (#3)
BrianMichell Jun 11, 2026
9c32e02
Cleanup debugging changes
BrianMichell Jun 11, 2026
575ee58
Add mdio-python compatibility test
BrianMichell Jun 15, 2026
6f24f73
Lay groundwork for `mdio-python` compatibility
BrianMichell Jun 15, 2026
5d51eb5
Add support for special case segy file header Variable
BrianMichell Jun 16, 2026
3410696
Add tests to run
BrianMichell Jun 16, 2026
700c4de
Update Python, uv, and pyproject toml
BrianMichell Jun 16, 2026
a09a3ec
Update deps to use dev mdio-python 1.2.0 release
BrianMichell Jun 16, 2026
1b35317
Modernize Python deps for devcontainer
BrianMichell Jun 16, 2026
95b5403
Fix venv
BrianMichell Jun 16, 2026
c804701
Linting and formatting
BrianMichell Jun 17, 2026
868f4de
Ensure fill value parity with `mdio-python` API
BrianMichell Jun 17, 2026
2704bf8
Fix missed fill value change
BrianMichell Jun 24, 2026
ba012c0
Update schema
BrianMichell Jun 24, 2026
ff74649
Linting
BrianMichell Jun 24, 2026
fbb71e0
Linting and formatting fix
BrianMichell Jun 24, 2026
5d86ab6
Use v3 driver as default
BrianMichell Jun 25, 2026
f0ecccb
Minor updates to examples
BrianMichell Jun 25, 2026
ed2832f
Remove codecov reporting for now
BrianMichell Jun 25, 2026
478e0fd
Cleanup pyproject toml file
BrianMichell Jun 25, 2026
1e9ab7d
Add missing copyright headers
BrianMichell Jun 25, 2026
52459b8
Fix Python module missing skip logic
BrianMichell Jun 25, 2026
12bd60b
Fix v3 metadata write changes
BrianMichell Jun 25, 2026
cb784e1
Remove utils v2 only shims
BrianMichell Jun 25, 2026
5674b7f
Formatting
BrianMichell Jun 25, 2026
48419e9
Fix improperly removed stats object
BrianMichell Jun 25, 2026
9e1b69d
Use helper function instead of many if/else
BrianMichell Jun 25, 2026
dd6a1da
Simplify dtype checker
BrianMichell Jun 25, 2026
a41e347
Reuse helpers
BrianMichell Jun 25, 2026
1aa9971
Fix missing compressor getter test
BrianMichell Jun 25, 2026
bc51266
Formatting
BrianMichell Jun 25, 2026
4270199
Delete legacy internal dtype fomatter
BrianMichell Jun 25, 2026
009b6da
Cleanup common compressor settings
BrianMichell Jun 25, 2026
25 changes: 10 additions & 15 deletions .devcontainer/Dockerfile
@@ -1,5 +1,6 @@
FROM mcr.microsoft.com/devcontainers/cpp:dev-ubuntu24.04

COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

ENV LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

@@ -12,7 +13,6 @@ ENV PIP_DISABLE_PIP_VERSION_CHECK=1 \

ENV DEBIAN_FRONTEND=noninteractive

ARG POETRY_VERSION=1.2.2
ARG CMAKE_MAJOR=3.24
ARG CMAKE_VERSION=3.24.2
ARG HYPERFINE=1.15.0
@@ -45,27 +45,22 @@ RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | \
apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
RUN apt-get update && apt-get install -y google-cloud-cli

RUN apt update -y \
&& apt install software-properties-common -y \
&& add-apt-repository ppa:deadsnakes/ppa

# Install apt dependencies
# Install apt dependencies (Python 3.12 is native on Ubuntu 24.04)
RUN apt update -y \
&& apt install -y --no-install-recommends \
python3.10-full \
python3-full \
python3-pip \
&& rm -rf /var/lib/apt/lists/* \
&& apt clean

RUN mkdir -p /venv
RUN python3.10 -m venv /venv
# Create virtual environment and sync dependencies
RUN uv venv /venv
ENV UV_PROJECT_ENVIRONMENT=/venv
ENV PATH="/venv/bin:$PATH"

RUN pip install wheel yapf cpplint==${CPPLINT_VERSION} zarr xarray

RUN pip install \
"poetry==$POETRY_VERSION" \
&& poetry config virtualenvs.create false
# if poetry doesn't pick up a deps this for some reason pip install here
RUN mkdir -p /tmp/devcontainer
COPY .devcontainer/pyproject.toml .devcontainer/uv.lock /tmp/devcontainer/
RUN uv sync --project /tmp/devcontainer --frozen --all-groups

# if we want the user to be able to do pip install etc.
RUN chmod -R 777 /venv
54 changes: 54 additions & 0 deletions .devcontainer/pyproject.toml
@@ -0,0 +1,54 @@
# Copyright 2026 TGS
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[project]
name = "mdio-cpp-dev"
version = "0.1.0"
description = "MDIO C++ library development environment"
requires-python = ">=3.12,<3.13"

# multidimio[cloud] pulls in zarr, xarray, numpy, and cloud storage backends
dependencies = [
"multidimio[cloud]==1.1.4.dev1781540584",
]

[dependency-groups]
dev = [
"pytest>=7.0",
"ruff>=0.8",
"mypy>=1.5",
"cpplint==1.6.1",
]

[tool.uv]
package = false
prerelease = "allow"

[[tool.uv.index]]
name = "testpypi"
url = "https://test.pypi.org/simple/"
explicit = true

[tool.uv.sources]
multidimio = { index = "testpypi" }

[tool.ruff]
line-length = 79

[tool.ruff.lint]
select = ["E", "F", "I", "UP"]

[tool.mypy]
python_version = "3.12"
plugins = []
1,623 changes: 1,623 additions & 0 deletions .devcontainer/uv.lock

Large diffs are not rendered by default.

19 changes: 11 additions & 8 deletions .github/workflows/cmake_build.yaml
@@ -30,18 +30,17 @@ jobs:
cxx-compiler: ${{ matrix.compiler }}
- name: Build tests
run: cd build && pwd && make -j
- name: Install Python 3.10
- name: Install UV
run: |
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update -y
sudo apt install -y python3.10 python3.10-venv python3.10-dev
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
curl -LsSf https://astral.sh/uv/install.sh | sh
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: Install test dependencies
run: |
python3.10 -m pip install --upgrade pip setuptools wheel --no-input
python3.10 -m pip install yapf cpplint zarr xarray --no-input
uv sync --project .devcontainer --frozen
# No system Python bypass is needed: uv creates a local .venv inside the .devcontainer folder.
- name: Run tests
run: |
source .devcontainer/.venv/bin/activate
cd build/mdio/ \
&& ./mdio_acceptance_test \
&& ./mdio_variable_test \
@@ -52,4 +51,8 @@ jobs:
&& ./mdio_utils_trim_test \
&& ./mdio_utils_delete_test \
&& ./mdio_variable_collection_test \
&& ./mdio_coordinate_selector_test
&& ./mdio_coordinate_selector_test \
&& ./mdio_header_variable_test \
&& ./mdio_zarr_test \
&& ./mdio_gcs_test \
&& ./mdio_s3_test
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -108,6 +108,7 @@ if(NOT CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
set(mdio_INTERNAL_DEPS
tensorstore::driver_array
tensorstore::driver_zarr
tensorstore::driver_zarr3
tensorstore::driver_json
tensorstore::kvstore_file
tensorstore::stack
10 changes: 5 additions & 5 deletions cmake/FindEXT_TENSORSTORE.cmake
@@ -3,11 +3,11 @@ IF ( NOT TARGET tensorstore )

include(FetchContent)

FetchContent_Declare(
tensorstore
GIT_REPOSITORY
https://github.com/brian-michell/tensorstore.git
GIT_TAG v0.1.63_latest
FetchContent_Declare(
tensorstore
GIT_REPOSITORY
https://github.com/google/tensorstore.git
GIT_TAG 917edaf341217f750b7bd3b8db6e75e6db64eab8
)

FetchContent_MakeAvailable(tensorstore)
2 changes: 1 addition & 1 deletion cmake/Findnlohmann_json_schema_validator.cmake
@@ -10,7 +10,7 @@ if (NOT TARGET nlohmann_json_schema_validator)
FetchContent_Declare(
nlohmann_json_schema_validator
GIT_REPOSITORY https://github.com/pboettch/json-schema-validator.git
GIT_TAG 2.2.0
GIT_TAG 2.4.0
)

if(NOT BUILD_VALIDATOR)
26 changes: 15 additions & 11 deletions examples/dataset_example/src/dataset_example.cc
@@ -61,6 +61,10 @@ absl::Status Run() {
/// mdio::constants::kCreateClean -
/// Create a new dataset and overwrite an existing one.
///
/// New datasets default to the Zarr V3 format. To target a specific format,
/// pass an optional mdio::zarr::ZarrVersion argument before the open mode,
/// e.g. mdio::Dataset::from_json(json_spec, dataset_path,
/// mdio::zarr::ZarrVersion::kV2, mdio::constants::kCreateClean);
auto dataset_future = mdio::Dataset::from_json(json_spec, dataset_path,
mdio::constants::kCreateClean);
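
The comment above describes an optional format-version argument that sits before the open mode, with Zarr V3 as the default. A rough standalone sketch of that overload shape follows. `ZarrVersion`, `DriverName`, and `DescribeOpen` here are hypothetical stand-ins written for illustration, not the mdio API; the only names taken from the real ecosystem are the TensorStore driver identifiers `zarr` and `zarr3`.

```cpp
#include <string>

// Hypothetical stand-in for mdio::zarr::ZarrVersion (illustration only).
enum class ZarrVersion { kV2, kV3 };

// Maps a format version to the matching TensorStore driver identifier.
inline std::string DriverName(ZarrVersion version) {
  return version == ZarrVersion::kV2 ? "zarr" : "zarr3";
}

// Hypothetical helper: the explicit version precedes the open mode,
// mirroring the argument order described in the comment above.
inline std::string DescribeOpen(const std::string& path, ZarrVersion version,
                                const std::string& mode) {
  return DriverName(version) + ":" + path + ":" + mode;
}

// Overload without a version argument: falls back to V3, the assumed default.
inline std::string DescribeOpen(const std::string& path,
                                const std::string& mode) {
  return DescribeOpen(path, ZarrVersion::kV3, mode);
}
```

Keeping the version-free overload as a thin forwarder means existing call sites that pass only a path and a mode keep compiling and pick up the new default, while callers that need V2 opt in explicitly.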

@@ -79,7 +83,7 @@ absl::Status Run() {
/// handled by returning a "Result" object; the result can be tested for
/// "ok()", there is also a convenient Tensorstore macro for doing assignment
/// while handling errors.
MDIO_ASSIGN_OR_RETURN(auto dataset, dataset_result)
MDIO_ASSIGN_OR_RETURN(auto dataset, dataset_result);

/// The dataset represents data on disk, the object holds only minimal state.
/// Its memory footprint is small.
@@ -113,7 +117,7 @@ absl::Status Run() {
mdio::SliceDescriptor desc1 = {"inline", 20, 120, 1};
mdio::SliceDescriptor desc2 = {"crossline", 100, 200, 1};

MDIO_ASSIGN_OR_RETURN(auto slice, dataset.isel(desc1, desc2))
MDIO_ASSIGN_OR_RETURN(auto slice, dataset.isel(desc1, desc2));
/// The slice represents a subset of the MDIO data on disk, I/O operations can
/// be made using the variables contained in this slice.
std::cout << slice << "\n\n" << std::endl;
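
The descriptors above select `inline` over 20..120 and `crossline` over 100..200, and a later comment in this file describes the inline slice as the range [20, 120). Assuming that half-open convention and a positive step, the number of selected indices can be sketched as below. `SliceSketch` and `SelectedCount` are hypothetical names that only mirror the four fields shown in the example; they are not part of mdio.

```cpp
#include <cstdint>
#include <string>

// Hypothetical mirror of the four descriptor fields used in the example:
// a dimension label, an inclusive start, an exclusive stop, and a step.
struct SliceSketch {
  std::string label;
  std::int64_t start;  // inclusive
  std::int64_t stop;   // exclusive
  std::int64_t step;
};

// Number of indices selected by the half-open range [start, stop) when
// walking with a positive step. Empty or inverted ranges select nothing.
inline std::int64_t SelectedCount(const SliceSketch& s) {
  if (s.step <= 0 || s.stop <= s.start) return 0;
  return (s.stop - s.start + s.step - 1) / s.step;  // ceiling division
}
```

Under these assumptions, both descriptors in the example select 100 indices each.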
@@ -123,7 +127,7 @@ absl::Status Run() {
/// describing data, this means that the datatype is discovered at runtime.
/// Here we request a Variable given we know its dtype of uint32:
MDIO_ASSIGN_OR_RETURN(auto variableObject,
slice.variables.get<uint32_t>("inline"))
slice.variables.get<uint32_t>("inline"));
/// The Variable object contains a collection of metadata associated with the
/// seismic, including names, and units.
std::cout << variableObject << std::endl;
@@ -143,7 +147,7 @@ absl::Status Run() {
// MDIO_ASSIGN_OR_RETURN(
// auto data, tensorstore::Read(variableObject.get_store()).result()
// )
MDIO_ASSIGN_OR_RETURN(auto data, variableObject.Read().result())
MDIO_ASSIGN_OR_RETURN(auto data, variableObject.Read().result());
auto d1 = data.get_data_accessor();
/// The read returns a SharedArray, which we can address like an array; in 3-d
/// we might do something like this data({0, 0, 0}), here the inline label is
@@ -163,7 +167,7 @@ absl::Status Run() {
/// Optionally, the variable can handle the I/O. In this scenario data can be
/// read into a VariableData object that retains its dimension names and
/// other metadata.
MDIO_ASSIGN_OR_RETURN(auto variableData, variableObject.Read().result())
MDIO_ASSIGN_OR_RETURN(auto variableData, variableObject.Read().result());

/// The accessor method provides access to an underlying tensorstore
/// SharedArray.
@@ -185,7 +189,7 @@ absl::Status Run() {
auto existing_dataset =
mdio::Dataset::Open(dataset_path, mdio::constants::kOpen).result();

MDIO_ASSIGN_OR_RETURN(dataset, existing_dataset)
MDIO_ASSIGN_OR_RETURN(dataset, existing_dataset);

/// In the previous scenario we used a slice to operate over the range of data
/// [20, 120), this dataset is defined over the entire range, so we should see
@@ -196,14 +200,14 @@ absl::Status Run() {
/// to uint32_t (in this case), then the get method will return a result that
/// is not OK.
MDIO_ASSIGN_OR_RETURN(variableObject,
dataset.variables.get<uint32_t>("inline"))
dataset.variables.get<uint32_t>("inline"));
inclusive_min =
variableObject.get_store().domain()[0].interval().inclusive_min();
exclusive_max =
variableObject.get_store().domain()[0].interval().exclusive_max();

/// Here we read all of the variable from disk into memory
MDIO_ASSIGN_OR_RETURN(variableData, variableObject.Read().result())
MDIO_ASSIGN_OR_RETURN(variableData, variableObject.Read().result());
tick_labels = variableData.get_data_accessor();
for (Index i = inclusive_min; i < exclusive_max; i += 20) {
std::cout << "dataset domain, " << variableData.variableName
@@ -219,10 +223,10 @@ absl::Status Run() {
}
MDIO_ASSIGN_OR_RETURN(
/// in this example, all the dims labels are uint32
variableObject, dataset.variables.get<uint32_t>(label))
variableObject, dataset.variables.get<uint32_t>(label));

/// Suppose we want to read existing values ...
MDIO_ASSIGN_OR_RETURN(variableData, variableObject.Read().result())
MDIO_ASSIGN_OR_RETURN(variableData, variableObject.Read().result());

inclusive_min =
variableObject.get_store().domain()[0].interval().inclusive_min();
@@ -242,7 +246,7 @@ absl::Status Run() {
/// For the purposes of information hiding, you can also extract a single
/// variable and its coordinates, with the other metadata. This acts like a
/// regular Dataset, but has only a single variable.
MDIO_ASSIGN_OR_RETURN(auto inline_labels, dataset["cdp-x"])
MDIO_ASSIGN_OR_RETURN(auto inline_labels, dataset["cdp-x"]);

return absl::OkStatus();
}
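
Most changes in this file add a trailing semicolon after `MDIO_ASSIGN_OR_RETURN(...)`. The pre-change call sites had none, which suggests the macro already expands to complete statements, so the added semicolon is just an empty statement that makes the call read like ordinary C++. The standalone sketch below shows that pattern under that assumption. `ResultSketch`, `SKETCH_ASSIGN_OR_RETURN`, `ParsePositive`, and `SumTwo` are hypothetical and are not the mdio or TensorStore definitions.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical minimal result type: either a value or an error message.
template <typename T>
struct ResultSketch {
  std::optional<T> value;
  std::string error;
  bool ok() const { return value.has_value(); }
};

#define SKETCH_CONCAT_INNER(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_INNER(a, b)

// Evaluates `expr` once. On failure, returns the error string from the
// enclosing function; on success, binds the value to `decl`. The expansion
// ends in ';', so a call site compiles with or without its own semicolon.
#define SKETCH_ASSIGN_OR_RETURN_IMPL(tmp, decl, expr) \
  auto tmp = (expr);                                  \
  if (!tmp.ok()) return tmp.error;                    \
  decl = std::move(*tmp.value);

#define SKETCH_ASSIGN_OR_RETURN(decl, expr) \
  SKETCH_ASSIGN_OR_RETURN_IMPL(SKETCH_CONCAT(sketch_tmp_, __LINE__), decl, expr)

inline ResultSketch<int> ParsePositive(int raw) {
  if (raw > 0) return {raw, ""};
  return {std::nullopt, "not positive"};
}

// Returns an empty string on success, or the first error encountered.
inline std::string SumTwo(int a, int b, int& out) {
  SKETCH_ASSIGN_OR_RETURN(auto x, ParsePositive(a))   // old style: no semicolon
  SKETCH_ASSIGN_OR_RETURN(auto y, ParsePositive(b));  // new style: extra empty statement
  out = x + y;
  return "";
}
```

Because the expansion is several statements rather than one expression, a macro of this shape should not be used as the unbraced body of an `if` or loop.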
4 changes: 2 additions & 2 deletions examples/real_data_example/src/real_data_example.cc
@@ -52,7 +52,7 @@ absl::Status Run(const Descriptors... descriptors) {
auto dataset,
mdio::Dataset::Open(std::string(absl::GetFlag(FLAGS_dataset_path)),
mdio::constants::kOpen)
.result())
.result());

if (absl::GetFlag(FLAGS_print_dataset)) {
std::cout << dataset << std::endl;
@@ -70,7 +70,7 @@ absl::Status Run(const Descriptors... descriptors) {
return absl::InvalidArgumentError("Seismic data must be 3D");
}

MDIO_ASSIGN_OR_RETURN(auto seismic_data, ReadWithProgress(variable).result())
MDIO_ASSIGN_OR_RETURN(auto seismic_data, ReadWithProgress(variable).result());

auto seismic_accessor = seismic_data.get_data_accessor();

6 changes: 3 additions & 3 deletions examples/xarray_integration/src/mdio_from_xarray.cc
@@ -27,12 +27,12 @@
absl::Status Run(std::string dataset_path) {
MDIO_ASSIGN_OR_RETURN(
auto dataset,
mdio::Dataset::Open(dataset_path, mdio::constants::kOpen).result())
mdio::Dataset::Open(dataset_path, mdio::constants::kOpen).result());

MDIO_ASSIGN_OR_RETURN(auto variable,
dataset.variables.get<float32_t>("image"))
dataset.variables.get<float32_t>("image"));

MDIO_ASSIGN_OR_RETURN(auto variable_data, variable.Read().result())
MDIO_ASSIGN_OR_RETURN(auto variable_data, variable.Read().result());

auto image = variable_data.get_data_accessor();

7 changes: 6 additions & 1 deletion examples/xarray_integration/src/xarray_integration.cc
@@ -39,10 +39,15 @@ absl::Status Run(std::string dataset_path) {

auto json_spec = XarrayExample();

// This example round-trips through Python's xarray, which reads the store via
// zarr-python 2.x (see pyproject.toml) using consolidated metadata. That is a
// Zarr V2 concept, so we pin this dataset to V2 explicitly. Bumping to V3
// would require zarr-python 3.x and a v3-capable xarray.
MDIO_ASSIGN_OR_RETURN(auto dataset,
mdio::Dataset::from_json(json_spec, dataset_path,
mdio::zarr::ZarrVersion::kV2,
mdio::constants::kCreateClean)
.result())
.result());

auto populate_inline = [](SharedArray<uint32_t>& data) {
for (auto i = data.domain()[0].inclusive_min();
2 changes: 1 addition & 1 deletion examples/xarray_integration/src/xarray_integration.h
@@ -49,7 +49,7 @@ using SharedArray = mdio::SharedArray<T, R, OriginKind>;
template <typename T = void>
mdio::Result<mdio::VariableData<T>> from_dataset(
const mdio::Dataset& dataset, const std::string& variable_name) {
MDIO_ASSIGN_OR_RETURN(auto variable, dataset.variables.get<T>(variable_name))
MDIO_ASSIGN_OR_RETURN(auto variable, dataset.variables.get<T>(variable_name));

return mdio::from_variable<T>(variable);
}