Fix two pandas 3.0 incompatibilities (StringDtype groupby, Series positional indexing) by benmsanderson · Pull Request #321 · openscm/scmdata

benmsanderson · 2026-05-23T18:12:05Z

Summary

pandas 3.0 introduced two changes that scmdata 0.18 trips on for any
multi-scenario ScmRun:

1. Default StringDtype inference. String columns now come back as
   pd.StringDtype rather than object. RunGroupBy.__init__ calls
   numpy.issubdtype(col.dtype, numpy.number) to detect numeric meta
   columns; on StringDtype this raises:
   ```
   TypeError: Cannot interpret '<StringDtype(storage='python', na_value=nan)>' as a data type
   ```
   Route the check through pd.api.types.is_numeric_dtype instead,
   which returns False for StringDtype and True for numeric
   dtypes.
2. Removal of Series positional integer indexing.
   _xarray._many_to_one ended with
   checker.groupby(col2).count().max()[0]. .max() on a DataFrame
   returns a label-indexed Series, and pandas 3.0 has removed
   positional integer indexing on those, so [0] now raises
   KeyError: 0. Use .iloc[0]: same semantics, explicit positional.
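As a rough illustration of the first fix, the sketch below runs the dtype check on a small, made-up meta table. The column names and the helper `is_numeric_column` are assumptions for the example and are not taken from scmdata; only `pandas.api.types.is_numeric_dtype` and `numpy.issubdtype` are the real calls discussed above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Illustrative meta table; column names are assumptions, not scmdata's.
meta = pd.DataFrame(
    {
        "scenario": pd.array(["ssp119", "ssp245"], dtype="string"),  # StringDtype
        "run_id": [0, 1],  # int64
        "ecs": [2.5, 3.1],  # float64
    }
)


def is_numeric_column(col: pd.Series) -> bool:
    """Hypothetical helper: dtype check that also accepts extension dtypes."""
    return bool(is_numeric_dtype(col.dtype))


numeric_columns = [name for name in meta.columns if is_numeric_column(meta[name])]

# The numpy-only check is the call the PR reports failing on StringDtype.
# The error is captured here rather than asserted, so the sketch runs either way.
try:
    np.issubdtype(meta["scenario"].dtype, np.number)
    legacy_error = None
except TypeError as exc:
    legacy_error = exc

print(numeric_columns)
print(legacy_error)
```

The pandas helper understands extension dtypes, whereas `numpy.issubdtype` first has to convert its argument to a numpy dtype, which is where the StringDtype case breaks.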

Both calls are exercised by every multi-scenario ScmRun. The second
in particular blocks ScmRun.to_nc entirely on pandas 3.0, so any
downstream that streams scenarios to disk currently cannot run.
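The positional-indexing fix can be sketched on a toy frame. `max_group_size` below is a hypothetical stand-in built around the expression quoted in the summary, not the actual body of `_xarray._many_to_one`, and the data is invented for the example.

```python
import pandas as pd

# Invented data: two rows share model "a", one row has model "b".
checker = pd.DataFrame(
    {
        "scenario": ["ssp119", "ssp119", "ssp245"],
        "model": ["a", "a", "b"],
    }
)


def max_group_size(df: pd.DataFrame, col: str) -> int:
    """Hypothetical stand-in: largest number of rows in any group of `col`."""
    # count() gives one row per group and one column per remaining column.
    counts = df.groupby(col).count()
    # max() reduces to a Series labelled by column name ("scenario" here),
    # so a bare [0] would be a label lookup; .iloc[0] is explicitly positional.
    return int(counts.max().iloc[0])


largest = max_group_size(checker, "model")
print(largest)
```

Because the Series returned by `.max()` is indexed by column names rather than integers, `.iloc[0]` selects the first entry regardless of how pandas interprets integer keys in `[]`.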

Backwards compatibility

Both replacements have been pandas's canonical APIs since well before
pandas 2.0:

pandas.api.types.is_numeric_dtype: available since pandas 0.19, when the public pandas.api.types namespace was introduced
Series.iloc[0]: long-standing positional accessor

So the change is safe on pandas 2.x as well; no version pin needed.

Test plan

Existing tests/unit/test_groupby.py and tests/unit/test_netcdf.py
both exercise the affected code paths and were failing on pandas 3.0
before this change. No new tests added — the existing suite is the
regression coverage.

Context

Found while deploying openscm/openscm-runner for AR7-cycle work.

…itional indexing)

pandas 3.0 introduced two changes that scmdata 0.18 trips on for any multi-scenario ScmRun:

1. Default StringDtype inference. String columns now come back as pd.StringDtype rather than object. RunGroupBy.__init__ called numpy.issubdtype(col.dtype, numpy.number) to detect numeric meta columns; on StringDtype this raises 'TypeError: Cannot interpret <StringDtype(...)> as a data type'. Route the check through pd.api.types.is_numeric_dtype instead, which returns False for StringDtype and True for numeric dtypes.
2. Removal of Series positional integer indexing. _xarray._many_to_one ended with checker.groupby(col2).count().max()[0]. max() on a DataFrame returns a label-indexed Series and pandas 3.0 removed positional integer indexing on those, so [0] raises 'KeyError: 0'. Use .iloc[0]: same semantics, explicit positional.

Both calls are exercised by every multi-scenario ScmRun. The second in particular blocks ScmRun.to_nc entirely on pandas 3.0, so any downstream that streams scenarios to disk (e.g. openscm-runner's NetCDFChunkWriter) currently cannot run.

The fixes are backward-compatible: pd.api.types.is_numeric_dtype and Series.iloc[0] have been pandas's canonical APIs since well before pandas 2.0.

Mirror of scripts/run_rcmip_fair2.py for the CICEROSCMPY2 adapter: runs every SSP in the RCMIP fixture (ssp119, ssp126, ssp245, ssp370 and the two lowNTCF variants, ssp434, ssp460, ssp534-over, ssp585) against N posterior members of a CICERO-SCM v2.x distribution and prints a per-scenario 2100 GSAT / CO2 / ERF summary.

Defaults to splice mode (user emissions + bundled ssp245 historical), which is the path the demo uses. Pass --cicero-bundle-dir to switch to bundle mode (Marit RCMIP-aligned setup) where gaspam and conc files are resolved per-scenario from inside the bundle directory.

Smoke-tested end-to-end against draw_samples_500.json with 20 members: ~44 s for 10 scenarios x 20 members on a single thread. 2100 GSAT medians are systematically warmer than the FaIRv2 numbers on the same protocol (e.g. ssp245 3.77 K vs FaIR 2.63 K, ssp585 6.72 K vs FaIR 4.82 K). The CICEROSCM bundle's ECS distribution is wider than FaIR's, and the 20-member subset is small relative to the full 500-member posterior, so the offset is consistent with the expected inter-model spread.

Results are kept in memory: the NetCDFChunkWriter path currently trips a scmdata-pandas-3 incompatibility (fixed in PR #11 / upstream openscm/scmdata#321), so writer support stays out of this script until those land in main.

benmsanderson added 2 commits May 23, 2026 20:11

Add changelog fragment for PR openscm#321

ebeb601

This was referenced May 23, 2026

Remove _scmdata_patches monkey-patch once upstream scmdata is fixed benmsanderson/openscm-runner#10

Open

scmdata pandas-3 compatibility patches (in-tree shim, tracking #10) benmsanderson/openscm-runner#11

Open

benmsanderson wants to merge 2 commits into openscm:main from benmsanderson:fix/pandas3-compat
