
Uncaught std::bad_alloc / gdcm::Exception in Bitmap decode aborts the Python process (SIGABRT) under memory pressure — SWIG bindings lack a global %exception #34

@NagisaVon

Description


Summary

Decoding a DICOM with compressed pixel data through python-gdcm can abort the entire Python process with SIGABRT when a decode-buffer allocation fails under memory pressure. The abort cannot be caught from Python (no try/except fires), so a single large frame decoded under memory pressure takes down the whole interpreter. In a batch or Apache Beam pipeline, that means the whole worker and every other item it was processing.
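Why no try/except fires can be shown without GDCM at all. In the sketch below, os.abort() stands in for the C++ std::terminate() → abort() path (an illustrative assumption, not GDCM code): the child wraps the call in try/except BaseException, yet the process still dies, and only the parent can observe what happened. subprocess reports death by signal N as returncode -N, so SIGABRT appears as -6 here, whereas a shell reports the same event as exit status 134 (128 + 6).

```python
import resource
import signal
import subprocess
import sys

# Child program: os.abort() raises SIGABRT directly, the same end state as
# std::terminate() -> abort() inside a C++ extension. The except clause never runs.
CHILD = """
import os
try:
    os.abort()
except BaseException:
    print("caught")  # unreachable: abort() kills the process immediately
"""


def _no_core_dump() -> None:
    # Equivalent of `ulimit -c 0` in the child, so the abort is fast.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


def run_child(source: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        preexec_fn=_no_core_dump,
    )


result = run_child(CHILD)
print("stdout:", repr(result.stdout))
print("returncode:", result.returncode)  # -6 on Linux (killed by SIGABRT)
print("killed by SIGABRT:", result.returncode == -signal.SIGABRT)
```

Running the decode in a separate process like this is the only way the caller survives today; it does not make the abort catchable inside the process that hits it.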

The root cause is that GDCM's decode path throws a C++ exception on allocation failure, and the SWIG Python bindings do not translate it: there is no global %exception, only a single method-scoped %exception for ReadFooBar. The exception therefore escapes the generated extern "C" wrapper uncaught, and the C++ runtime calls std::terminate() → abort().

This is version-independent: on python-gdcm <= 3.0.22 the escaping exception is std::bad_alloc; on 3.2.6 it is a gdcm::Exception from a throwing assert (gdcmBitmap.cxx:896, gdcm_assert(len <= outbv->GetLength())). Upgrading does not fix it.

For contrast, pylibjpeg-openjpeg decoding the same frame under identical memory strain raises a catchable MemoryError instead of aborting.
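For comparison, this is what a catchable failure looks like from Python: the allocation fails, MemoryError is raised, and an ordinary try/except handles it while the interpreter keeps running. The sketch uses a plain bytearray as a stand-in for a decoder's output buffer (an illustration only, not pylibjpeg-openjpeg's internals); the helper names are hypothetical, and requesting sys.maxsize bytes is used because that allocation can never succeed.

```python
import sys


def allocate_output(nbytes: int) -> bytearray:
    # Stand-in for a decoder allocating its output buffer.
    return bytearray(nbytes)


def try_decode(nbytes: int) -> str:
    try:
        buf = allocate_output(nbytes)
    except MemoryError:
        return "MemoryError caught; interpreter still running"
    return f"allocated {len(buf)} bytes"


print(try_decode(1024))         # allocated 1024 bytes
print(try_decode(sys.maxsize))  # MemoryError caught; interpreter still running
```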

Environment

  • python-gdcm==3.0.22 (GDCM 3.0.22), also confirmed on 3.2.6
  • Python 3.11, Linux x86_64 (python:3.11-slim container)

Reproduction

The decode that aborts is pure GDCM (gdcm.ImageReader().GetImage().GetBuffer()). To make the allocation failure deterministic rather than racing the kernel OOM killer, an LD_PRELOAD shim fails every malloc/calloc/realloc whose size falls in a band just below the size of the final image buffer, which is exactly the range GDCM's intermediate decode buffer falls into. pydicom and pylibjpeg-openjpeg are used only to create a JPEG 2000 test file; the crash happens entirely inside GDCM.

failmalloc.c:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <stdlib.h>
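/* Real malloc/calloc/realloc, resolved lazily with dlsym(RTLD_NEXT, ...) on first use. */
static void *(*r_m)(size_t)=NULL,*(*r_c)(size_t,size_t)=NULL,*(*r_r)(void*,size_t)=NULL;
/* Failure band [fmin, fmax] from FAIL_MIN/FAIL_MAX; ini is set while init() is resolving symbols. */
static size_t fmin=0,fmax=0; static int ini=0;
/* Static bump allocator for any allocation dlsym() makes before the real functions are known. */
static char bb[1<<20]; static size_t bo=0;
static void* ba(size_t n){void*p=bb+bo;bo+=(n+15)&~((size_t)15);return bo>sizeof(bb)?NULL:p;}
static void init(void){ini=1;r_m=dlsym(RTLD_NEXT,"malloc");r_c=dlsym(RTLD_NEXT,"calloc");r_r=dlsym(RTLD_NEXT,"realloc");
  char*a=getenv("FAIL_MIN"),*b=getenv("FAIL_MAX");fmin=a?(size_t)strtoull(a,0,10):0;fmax=b?(size_t)strtoull(b,0,10):0;ini=0;}
/* In-band test: FAIL_MIN unset (0) disables the shim; FAIL_MAX unset (0) means no upper bound. */
static int hit(size_t n){return fmin&&n>=fmin&&(!fmax||n<=fmax);}
/* Each wrapper returns NULL for an in-band size and forwards every other request to libc. */
void*malloc(size_t n){if(!r_m){if(ini)return ba(n);init();}return hit(n)?NULL:r_m(n);}
void*calloc(size_t a,size_t b){if(!r_c){if(ini)return ba(a*b);init();}return (a&&b&&hit(a*b))?NULL:r_c(a,b);}
void*realloc(void*p,size_t n){if(!r_r){if(ini)return ba(n);init();}return hit(n)?NULL:r_r(p,n);}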
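The shim's failure rule lives entirely in hit(). The Python function below mirrors it one-to-one (the name in_fail_band is hypothetical) and checks it against the values used in the run step: with FAIL_MIN=20971520 (20 MiB) and FAIL_MAX=38797312 (37 MiB), the 40,560,000-byte final image buffer passes through, while any request between the two bounds, such as a hypothetical 30 MiB intermediate buffer, gets NULL.

```python
MiB = 1024 * 1024


def in_fail_band(n: int, fail_min: int = 0, fail_max: int = 0) -> bool:
    """Mirror of hit() in failmalloc.c: True means the shim returns NULL.

    fail_min == 0 (FAIL_MIN unset) disables the shim; fail_max == 0
    (FAIL_MAX unset) leaves the band open-ended.
    """
    return fail_min != 0 and n >= fail_min and (fail_max == 0 or n <= fail_max)


FAIL_MIN, FAIL_MAX = 20971520, 38797312  # values from the run command
OUTPUT_BUFFER = 40_560_000               # decoded size printed by the baseline run

print(FAIL_MIN / MiB, FAIL_MAX / MiB)                   # 20.0 37.0
print(in_fail_band(OUTPUT_BUFFER, FAIL_MIN, FAIL_MAX))  # False: final buffer allowed
print(in_fail_band(30 * MiB, FAIL_MIN, FAIL_MAX))       # True: in-band request fails
print(in_fail_band(30 * MiB))                           # False: shim disabled
```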

gen.py (mint a 2600×2600 RGB-16 JPEG 2000 DICOM — decoded size ≈ 38.7 MiB):

import numpy as np, pydicom
from openjpeg.utils import encode_array
from pydicom.uid import JPEG2000, SecondaryCaptureImageStorage, generate_uid
from pydicom.encaps import encapsulate
R = C = 2600
a = np.empty((R, C, 3), np.uint16); a[..., 0] = np.arange(C); a[..., 1] = np.arange(R)[:, None]; a[..., 2] = 4096
cs = encode_array(a, photometric_interpretation=1)
ds = pydicom.Dataset(); ds.file_meta = pydicom.dataset.FileMetaDataset()
ds.file_meta.TransferSyntaxUID = JPEG2000
ds.file_meta.MediaStorageSOPClassUID = SecondaryCaptureImageStorage
ds.file_meta.MediaStorageSOPInstanceUID = generate_uid()
ds.SOPClassUID = SecondaryCaptureImageStorage; ds.SOPInstanceUID = ds.file_meta.MediaStorageSOPInstanceUID
ds.Rows = R; ds.Columns = C; ds.SamplesPerPixel = 3; ds.PhotometricInterpretation = "RGB"
ds.PlanarConfiguration = 0; ds.BitsAllocated = 16; ds.BitsStored = 16; ds.HighBit = 15; ds.PixelRepresentation = 0
ds.PixelData = encapsulate([cs]); ds.is_little_endian = True; ds.is_implicit_VR = False
ds.save_as("/tmp/synth.dcm", write_like_original=False)
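The decoded size follows directly from the attributes set in gen.py: Rows × Columns × SamplesPerPixel × (BitsAllocated / 8). A small check (the helper name is hypothetical) that this matches the byte count the baseline decode prints:

```python
def decoded_size(rows: int, cols: int, samples_per_pixel: int, bits_allocated: int) -> int:
    # Uncompressed pixel-data length in bytes for a single frame.
    return rows * cols * samples_per_pixel * (bits_allocated // 8)


nbytes = decoded_size(2600, 2600, 3, 16)
print(nbytes)                    # 40560000
print(round(nbytes / 2**20, 1))  # 38.7 (MiB)
```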

decode_gdcm.py (pure GDCM — this is the failing path):

import sys, gdcm
r = gdcm.ImageReader(); r.SetFileName("/tmp/synth.dcm")
if not r.Read(): sys.exit("read failed")
buf = r.GetImage().GetBuffer()       # triggers the JPEG 2000 decode in GDCM C++
print("decoded ok, bytes:", len(buf))

Run (in python:3.11-slim):

apt-get update && apt-get install -y gcc   # the slim image ships without a C compiler
pip install python-gdcm==3.0.22 pydicom==3.0.2 numpy pylibjpeg pylibjpeg-openjpeg
gcc -shared -fPIC -o failmalloc.so failmalloc.c -ldl
python gen.py
ulimit -c 0   # skip the (slow) core dump so the abort is a fast SIGABRT

# baseline — decodes fine:
python decode_gdcm.py
#   -> decoded ok, bytes: 40560000

# fail every allocation sized between 20 MiB and 37 MiB, just below the 38.7 MiB output buffer:
LD_PRELOAD=./failmalloc.so FAIL_MIN=20971520 FAIL_MAX=38797312 python decode_gdcm.py
#   -> terminate called after throwing an instance of 'std::bad_alloc'
#   ->   what():  std::bad_alloc
#   -> Aborted   (exit 134 — SIGABRT, uncatchable from Python)

(On python-gdcm==3.2.6 the message is instead "terminate called after throwing an instance of 'gdcm::Exception' ... gdcmBitmap.cxx:896 ... len <= outbv->GetLength()", but the fatal terminate() → abort() is the same.)

Suggested fix

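The bindings should never let a C++ exception escape into std::terminate(). GDCM's Python interface (Wrapping/Python/gdcmswig.i, which this project packages) currently declares %exception only for ReadFooBar. Adding a global %exception, placed before the %include directives for the GDCM headers because %exception only applies to declarations that follow it, turns a throw from any wrapped method into a catchable Python exception: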

%{
#include <exception>   // std::exception
#include <new>         // std::bad_alloc
%}

%exception {
  try {
    $action
  } catch (const std::bad_alloc &e) {
    PyErr_SetString(PyExc_MemoryError, e.what());
    SWIG_fail;
  } catch (const std::exception &e) {   // gdcm::Exception derives from std::exception
    PyErr_SetString(PyExc_RuntimeError, e.what());
    SWIG_fail;
  } catch (...) {
    PyErr_SetString(PyExc_RuntimeError, "unhandled C++ exception in GDCM");
    SWIG_fail;
  }
}

With this, the reproduction above raises a catchable MemoryError/RuntimeError instead of aborting — matching how pylibjpeg-openjpeg already behaves.
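Once the bindings translate exceptions, a batch driver can treat a failed decode as a per-item error instead of losing the worker. The sketch below assumes that patched behaviour (MemoryError for std::bad_alloc, RuntimeError for other C++ exceptions, as in the %exception block); decode_all and fake_decode are hypothetical stand-ins, and with the current bindings the first failure would still abort the process before any except clause ran.

```python
from typing import Callable, Iterable


def decode_all(
    paths: Iterable[str], decode: Callable[[str], bytes]
) -> tuple[dict[str, bytes], dict[str, str]]:
    """Decode each path, recording failures instead of stopping the batch."""
    ok: dict[str, bytes] = {}
    failed: dict[str, str] = {}
    for path in paths:
        try:
            ok[path] = decode(path)
        except MemoryError as exc:   # translated std::bad_alloc
            failed[path] = f"MemoryError: {exc}"
        except RuntimeError as exc:  # translated gdcm::Exception or other C++ error
            failed[path] = f"RuntimeError: {exc}"
    return ok, failed


def fake_decode(path: str) -> bytes:
    # Hypothetical stand-in for gdcm.ImageReader(...).GetImage().GetBuffer().
    if path == "big.dcm":
        raise MemoryError("std::bad_alloc")
    if path == "bad.dcm":
        raise RuntimeError("gdcm::Exception (illustrative message)")
    return b"\x00" * 4


ok, failed = decode_all(["a.dcm", "big.dcm", "bad.dcm"], fake_decode)
print(sorted(ok))         # ['a.dcm']
print(failed["big.dcm"])  # MemoryError: std::bad_alloc
```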

Since the .i files live upstream in GDCM (whose GitHub mirror is read-only, with issues tracked on SourceForge), I'm also reporting this to the GDCM tracker. I'm flagging it here as well because python-gdcm is what Python users install and where the crash actually surfaces, so a binding/packaging fix is most actionable here.

A complementary GDCM-core fix would make a failed allocation a graceful return false from Bitmap::TryJPEG2000Codec (and its sibling codec paths): replace the throwing gdcm_assert(len <= outbv->GetLength()) with if (!outbv || outbv->GetLength() < len) return false;, and catch std::bad_alloc around the decode, so the decode fails recoverably instead of throwing at all.
