Fix two pandas 3.0 incompatibilities (StringDtype groupby, Series positional indexing) #321
Open
benmsanderson wants to merge 2 commits into
…itional indexing)

pandas 3.0 introduced two changes that scmdata 0.18 trips on for any multi-scenario ScmRun:

1. Default StringDtype inference. String columns now come back as pd.StringDtype rather than object. RunGroupBy.__init__ called numpy.issubdtype(col.dtype, numpy.number) to detect numeric meta columns; on StringDtype this raises 'TypeError: Cannot interpret <StringDtype(...)> as a data type'. Route the check through pd.api.types.is_numeric_dtype instead, which returns False for StringDtype and True for numeric dtypes.

2. Removal of Series positional integer indexing. _xarray._many_to_one ended with checker.groupby(col2).count().max()[0]. max() on a DataFrame returns a label-indexed Series and pandas 3.0 removed positional integer indexing on those, so [0] raises 'KeyError: 0'. Use .iloc[0]: same semantics, explicit positional.

Both calls are exercised by every multi-scenario ScmRun. The second in particular blocks ScmRun.to_nc entirely on pandas 3.0, so any downstream that streams scenarios to disk (e.g. openscm-runner's NetCDFChunkWriter) currently cannot run.

The fixes are backward-compatible: pd.api.types.is_numeric_dtype and Series.iloc[0] have been pandas's canonical APIs since well before pandas 2.0.
benmsanderson added a commit to benmsanderson/openscm-runner that referenced this pull request, May 23, 2026
Mirror of scripts/run_rcmip_fair2.py for the CICEROSCMPY2 adapter: runs every SSP in the RCMIP fixture (ssp119, ssp126, ssp245, ssp370 and the two lowNTCF variants, ssp434, ssp460, ssp534-over, ssp585) against N posterior members of a CICERO-SCM v2.x distribution and prints a per-scenario 2100 GSAT / CO2 / ERF summary.

Defaults to splice mode (user emissions + bundled ssp245 historical), which is the path the demo uses. Pass --cicero-bundle-dir to switch to bundle mode (Marit RCMIP-aligned setup) where gaspam and conc files are resolved per-scenario from inside the bundle directory.

Smoke-tested end-to-end against draw_samples_500.json with 20 members: ~44 s for 10 scenarios x 20 members on a single thread. 2100 GSAT medians are systematically warmer than the FaIRv2 numbers on the same protocol (e.g. ssp245 3.77 K vs FaIR 2.63 K, ssp585 6.72 K vs FaIR 4.82 K). The CICEROSCM bundle's ECS distribution is wider than FaIR's, and the 20-member subset is small relative to the full 500-member posterior, so the offset is consistent with the expected inter-model spread.

Results are kept in memory: the NetCDFChunkWriter path currently trips a scmdata-pandas-3 incompatibility (fixed in PR #11 / upstream openscm/scmdata#321), so writer support stays out of this script until those land in main.
Summary
pandas 3.0 introduced two changes that scmdata 0.18 trips on for any multi-scenario ScmRun:

1. Default StringDtype inference. String columns now come back as pd.StringDtype rather than object. RunGroupBy.__init__ calls numpy.issubdtype(col.dtype, numpy.number) to detect numeric meta columns; on StringDtype this raises:

   TypeError: Cannot interpret <StringDtype(...)> as a data type

   Route the check through pd.api.types.is_numeric_dtype instead, which returns False for StringDtype and True for numeric dtypes.

2. Removal of Series positional integer indexing. _xarray._many_to_one ended with checker.groupby(col2).count().max()[0]. .max() on a DataFrame returns a label-indexed Series, and pandas 3.0 has removed positional integer indexing on those, so [0] now raises KeyError: 0. Use .iloc[0]: same semantics, explicit positional.

Both calls are exercised by every multi-scenario ScmRun. The second in particular blocks ScmRun.to_nc entirely on pandas 3.0, so any downstream that streams scenarios to disk currently cannot run.
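For reviewers who want to see the first failure mode in isolation, here is a minimal sketch. It is not scmdata's code: the helper names `is_numeric_meta` and `numpy_check_raises` and the small meta table are made up for illustration, and the string column is given `dtype="string"` explicitly so the behaviour does not depend on pandas 3.0's default inference.

```python
import numpy as np
import pandas as pd


def is_numeric_meta(col: pd.Series) -> bool:
    """Hypothetical helper: dtype check that tolerates extension dtypes."""
    return bool(pd.api.types.is_numeric_dtype(col.dtype))


def numpy_check_raises(col: pd.Series) -> bool:
    """Return True if the numpy-based check raises TypeError for this column."""
    try:
        np.issubdtype(col.dtype, np.number)
    except TypeError:
        return True
    return False


# Made-up meta table; dtype="string" forces StringDtype explicitly instead of
# relying on pandas 3.0's default string inference.
meta = pd.DataFrame(
    {
        "scenario": pd.array(["ssp119", "ssp245"], dtype="string"),
        "ensemble_member": [0, 1],
    }
)

for name in meta.columns:
    # Columns printed: name, pandas check result, whether the numpy check raised.
    print(name, is_numeric_meta(meta[name]), numpy_check_raises(meta[name]))
```

The pandas check gives a plain boolean for both columns, while the numpy check only completes for the integer column.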
Backwards compatibility
Both replacements have been pandas's canonical APIs since well before pandas 2.0:

- pandas.api.types.is_numeric_dtype: present since pandas 0.18
- Series.iloc[0]: long-standing positional accessor

So the change is safe on pandas 2.x as well; no version pin needed.
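The positional-access replacement can be checked the same way on either pandas version. The sketch below is an assumption-laden stand-in, not the body of `_xarray._many_to_one`: the function name `maps_many_to_one`, the `drop_duplicates` step, and the example frames are hypothetical, and only the `groupby(col2).count().max()` chain followed by `.iloc[0]` is taken from the description above.

```python
import pandas as pd


def maps_many_to_one(df: pd.DataFrame, col1: str, col2: str) -> bool:
    """Hypothetical check: does each col2 value pair with exactly one col1 value?"""
    checker = df[[col1, col2]].drop_duplicates()
    # count() leaves one column (col1), so max() is a Series labelled by col1.
    col_max = checker.groupby(col2).count().max()
    # .iloc[0] is explicitly positional, so it does not depend on the label.
    return bool(col_max.iloc[0] == 1)


# Made-up data: each unit belongs to a single variable.
one_each = pd.DataFrame(
    {"variable": ["T", "T", "CO2"], "unit": ["K", "K", "ppm"]}
)
# Made-up data: the unit "K" is shared by two variables.
shared = pd.DataFrame({"variable": ["T", "GSAT"], "unit": ["K", "K"]})

print(maps_many_to_one(one_each, "variable", "unit"))
print(maps_many_to_one(shared, "variable", "unit"))
```

Because the Series returned by `.max()` is indexed by column label ("variable" here), `.iloc[0]` reads its first element without any reliance on integer-label fallback.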
Test plan
Existing tests/unit/test_groupby.py and tests/unit/test_netcdf.py both exercise the affected code paths and were failing on pandas 3.0 before this change. No new tests added; the existing suite is the regression coverage.
Context
Found while deploying openscm/openscm-runner for AR7-cycle work.