
Interpreting NATIVE_UINT8 as integer in sparse matrices #56

@LTLA

Sometimes I store sparse counts in the HDF5 file as unsigned 8-bit integers to save some space. This is fine, but the resulting H5SparseMatrix instance cannot participate in arithmetic operations:

library(rhdf5)
# Simulate a sparse matrix of small non-negative counts.
y <- abs(round(Matrix::rsparsematrix(100, 100, 0.01) * 10))

# Write the CSC components to HDF5, storing the values as unsigned 8-bit integers.
tmp <- tempfile(fileext=".h5")
h5createFile(tmp)
h5createGroup(tmp, "matrix")
h5createDataset(tmp, "matrix/data", length(y@x), H5type="H5T_NATIVE_UINT8")
h5write(y@i, tmp, "matrix/indices", length(y@i))
h5write(y@p, tmp, "matrix/indptr", length(y@p))

# Fill the pre-created uint8 dataset with the non-zero values.
fhandle <- H5Fopen(tmp)
ghandle <- H5Gopen(fhandle, "matrix")
h5writeDataset(y@x, ghandle, "data")
H5Gclose(ghandle)
H5Fclose(fhandle)

library(HDF5Array)
seed <- H5SparseMatrixSeed(tmp, "matrix", dim=c(100, 100), sparse.layout="csc")
mat <- DelayedArray(seed)
type(mat)
## [1] "raw"

mat + 1
## Error in h(simpleError(msg, call)) :
##   error in evaluating the argument 'x' in selecting a method for function 'type': non-numeric argument to binary operator
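
The underlying message ("non-numeric argument to binary operator") is what base R reports for arithmetic on raw vectors, independent of HDF5Array. A minimal sketch in plain R, using a small raw vector as a stand-in for values read from a uint8 dataset (the variable names are only for illustration):

```r
# A raw vector standing in for values read from a uint8 dataset.
x <- as.raw(c(0, 3, 7))

# Arithmetic is not defined for raw vectors in base R, so this fails.
# Capture the error message instead of stopping.
res <- tryCatch(x + 1, error=function(e) conditionMessage(e))
print(res)
## [1] "non-numeric argument to binary operator"

# Converting to integer first makes arithmetic work.
xi <- as.integer(x)
print(xi + 1L)
## [1] 1 4 8
```

This is why reading the values as integers, rather than as raw, should be enough to make `mat + 1` work.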

This might be easily solved with a type= option, just like in the HDF5ArraySeed constructor for the dense case.

Or even better, a dedicated as.integer= option that treats all HDF5 integer types as R integers. This would allow me to just set as.integer=TRUE and everything should work; otherwise, even with a type= option, I need to first create the HDF5ArraySeed with default arguments, check whether type(mat) == "raw", and then create it again with type="integer". (Presumably, I can't just set type="integer" all the time, otherwise bad things will happen for floating-point matrices.)
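
The two-step check described above could be sketched roughly as follows. `choose_r_type()` is a hypothetical helper, not part of HDF5Array, and the commented usage assumes the dense-case `type=` argument of `HDF5ArraySeed()` mentioned earlier; for the sparse case it would rely on the option being requested here.

```r
# Hypothetical helper (not part of HDF5Array): map the type detected from the
# file to the type we actually want to work with in R.
choose_r_type <- function(detected) {
    # uint8 data shows up as "raw"; promote it so that arithmetic works.
    # Every other type (e.g. "double") is passed through unchanged, so
    # floating-point matrices are not coerced to integer.
    if (identical(detected, "raw")) "integer" else detected
}

choose_r_type("raw")
choose_r_type("double")

# Assumed usage in the dense case, where a type= argument is available
# (path and name are placeholders):
# seed <- HDF5ArraySeed(path, name)
# seed <- HDF5ArraySeed(path, name, type=choose_r_type(type(seed)))
```

An as.integer=TRUE option would fold this check into the constructor, so the seed would only need to be created once.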

Session information
R version 4.3.0 Patched (2023-05-04 r84398)
Platform: aarch64-apple-darwin22.3.0 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Users/luna/Software/R/R-4-3-branch/lib/libRblas.dylib
LAPACK: /Users/luna/Software/R/R-4-3-branch/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] HDF5Array_1.29.3      DelayedArray_0.27.9   SparseArray_1.1.10
 [4] S4Arrays_1.1.4        IRanges_2.35.2        S4Vectors_0.39.1
 [7] MatrixGenerics_1.13.0 matrixStats_1.0.0     BiocGenerics_0.47.0
[10] Matrix_1.5-4.1        rhdf5_2.45.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.47.0     lattice_0.21-8      rhdf5filters_1.13.3
[4] XVector_0.41.1      Rhdf5lib_1.23.0     grid_4.3.0
[7] compiler_4.3.0      tools_4.3.0         crayon_1.5.2
