@@ -40,16 +40,18 @@ with a value as a *stored value*. Stored values have associated with them a
4040*scalar value*, which is the value stored in that location in the array, and one
4141or more *indices*, which describe the location where the stored value is located
4242in the array. Some or all of these indices may be stored explicitly, or they may
43- be implicitly derived, depending on the storage format. When stored explicitly,
43+ be implicitly derived, depending on storage format. When stored explicitly,
4444indices are 0-based non-negative integers.
4545
4646
4747Binsparse JSON Descriptors {#descriptor}
4848========================================
4949
50- Binsparse descriptors are JSON blobs that describe the binary format of sparse
51- data. The JSON blob includes several required keys that describe the structure of
52- the binary storage. Optional attributes may be defined to hold additional metadata.
50+ Binsparse descriptors are key-value metadata that describe the binary format of sparse
51+ data. The descriptor is stored under the "binsparse" key to avoid conflicts with other
52+ metadata in the container. The required keys within the "binsparse" namespace are listed
53+ below. Optional attributes may be defined to hold additional metadata and must be stored
54+ outside of the "binsparse" namespace.
5355
5456<div class=example>
5557
@@ -59,17 +61,17 @@ attributes.
5961
6062```json
6163{
62- "format": "CSC",
63- "shape": [10, 12],
64- "data_types": {
65- "pointers_0": "uint64",
66- "indices_1": "uint64",
67- "values": "float32"
64+ "binsparse": {
65+ "format": "CSC",
66+ "shape": [10, 12],
67+ "data_types": {
68+ "pointers_0": "uint64",
69+ "indices_1": "uint64",
70+ "values": "float32"
71+ }
6872 },
69- "attrs": {
70- "original_source": "https://url/of/original/file.mtx",
71- "author": "John Doe"
72- }
73+ "original_source": "https://url/of/original/file.mtx",
74+ "author": "John Doe"
7375}
7476```
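
The namespacing rule can be illustrated with a minimal Python sketch, assuming the descriptor is available as a JSON string. The helper name `split_descriptor` is hypothetical and not part of the specification; it separates the entry under the "binsparse" key from any other top-level metadata.

```python
import json

# Descriptor text matching the updated example above.
DESCRIPTOR_TEXT = """
{
  "binsparse": {
    "format": "CSC",
    "shape": [10, 12],
    "data_types": {
      "pointers_0": "uint64",
      "indices_1": "uint64",
      "values": "float32"
    }
  },
  "original_source": "https://url/of/original/file.mtx",
  "author": "John Doe"
}
"""


def split_descriptor(text):
    """Hypothetical helper: return (binsparse_entry, other_metadata)."""
    metadata = json.loads(text)
    if "binsparse" not in metadata:
        raise ValueError('missing "binsparse" key')
    binsparse = metadata["binsparse"]
    # Check the three keys that appear in the example descriptor.
    for key in ("format", "shape", "data_types"):
        if key not in binsparse:
            raise ValueError(f'missing key "{key}" in "binsparse" entry')
    # Everything outside the "binsparse" namespace is optional metadata.
    other = {k: v for k, v in metadata.items() if k != "binsparse"}
    return binsparse, other


binsparse, other = split_descriptor(DESCRIPTOR_TEXT)
print(binsparse["format"], binsparse["shape"])
print(sorted(other))
```

Keeping user attributes outside the namespace means a reader can ignore unknown top-level keys without inspecting them.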
7577
@@ -82,7 +84,7 @@ The `shape` key must be present and shall define the shape of the sparse tensor.
8284It shall contain a JSON array of integers, with index `i` containing the size of
8385the `i`'th dimension. For matrices, index `0` shall contain the number of rows,
8486and index `1` shall contain the number of columns. For vectors, index `0` shall
85- contain the number of indices of the vector if it were dense.
87+ contain the vector's dimension, that is, its length if it were stored densely.
8688
8789Note: a matrix has shape [`number_of_rows`, `number_of_columns`] regardless of whether
8890the format orientation is row-wise or column-wise.
@@ -99,7 +101,9 @@ in the binary storage container.
99101### Pre-defined Formats ### {#predefined_formats}
100102
101103The following is a list of all pre-defined formats and the arrays that shall
102- be present in the binary container.
104+ be present in the binary container. `number_of_elements` refers to the number
105+ of stored values, `number_of_rows` refers to the number of rows, and `number_of_columns`
106+ refers to the number of columns.
103107
104108#### VEC #### {#vec_format}
105109
@@ -110,7 +114,8 @@ Vector format
110114: values
111115:: Array of size `number_of_elements` containing stored values.
112116
113- Indices shall be sorted and must not be duplicated.
117+ The element of the vector located at index `indices_0[i]` has scalar value
118+ `values[i]`. Elements shall be sorted by index and must not be duplicated.
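
As a rough illustration of the VEC rule, the following Python sketch expands the two arrays into a dense list and rejects unsorted or duplicated indices. The function name `vec_to_dense` and the use of `None` for positions without a stored value are assumptions made for this example only.

```python
def vec_to_dense(length, indices_0, values):
    """Hypothetical helper: expand VEC arrays into a dense Python list.

    Positions without a stored value are left as None (an assumption of
    this sketch, not something the format prescribes).
    """
    if len(indices_0) != len(values):
        raise ValueError("indices_0 and values must have the same length")
    dense = [None] * length
    previous = -1
    for index, value in zip(indices_0, values):
        # Strictly increasing indices imply both sorted and not duplicated.
        if index <= previous:
            raise ValueError("indices must be sorted and not duplicated")
        if index >= length:
            raise ValueError("index out of range for the given shape")
        dense[index] = value
        previous = index
    return dense


print(vec_to_dense(6, [1, 4], [2.5, 7.0]))
```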
114119
115120#### CSR #### {#csr_format}
116121
@@ -127,7 +132,7 @@ The column indices of the stored values located in row `i` are located in the ra
127132`[pointers_0[i], pointers_0[i+1])` in the `indices_1` array. The scalar value for
128133each of those stored values is stored at the corresponding index in the `values` array.
129134
130- Within a row, column indices shall be sorted and must not be duplicated.
135+ Within a row, elements shall be sorted by column index and must not be duplicated.
131136
132137#### CSC #### {#csc_format}
133138
@@ -144,7 +149,7 @@ The rows indices of the stored values located in column `j` are located in the r
144149`[pointers_0[j], pointers_0[j+1])` in the `indices_1` array. The scalar value for
145150each of those stored values is stored at the corresponding index in the `values` array.
146151
147- Within a column, row indices shall be sorted and must not be duplicated.
152+ Within a column, elements shall be sorted by row index and must not be duplicated.
148153
149154#### DCSR #### {#dcsr_format}
150155
@@ -164,8 +169,8 @@ DCSR is similar to CSR, except that rows which are entirely empty are not stored
164169contains no repeated values. Because the position within `pointers_0` no longer dictates the
165170corresponding row index, `indices_0` provides the row index.
166171
167- Within a row, column indices shall be sorted and must not be duplicated. Row indices shall be
168- sorted and must not be duplicated.
172+ Rows shall be sorted and must not be duplicated.
173+ Within each row, elements shall be sorted by column index and must not be duplicated.
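
The DCSR ordering rules can be sketched in Python as follows. This assumes that `indices_0` holds one row index per stored row and that `pointers_0` has one more entry than `indices_0`; the function name `dcsr_to_triples` is hypothetical.

```python
def dcsr_to_triples(indices_0, pointers_0, indices_1, values):
    """Hypothetical helper: list (row, column, value) triples from DCSR arrays."""
    if len(pointers_0) != len(indices_0) + 1:
        raise ValueError("pointers_0 must have one more entry than indices_0")
    triples = []
    for k, row in enumerate(indices_0):
        # Rows must be sorted and not duplicated.
        if k > 0 and row <= indices_0[k - 1]:
            raise ValueError("row indices must be strictly increasing")
        start, end = pointers_0[k], pointers_0[k + 1]
        for p in range(start, end):
            # Within a row, column indices must be sorted and not duplicated.
            if p > start and indices_1[p] <= indices_1[p - 1]:
                raise ValueError("column indices must be strictly increasing")
            triples.append((row, indices_1[p], values[p]))
    return triples


# Illustrative data: four stored rows (0, 1, 3, 4); row 2 is empty and omitted.
triples = dcsr_to_triples(
    indices_0=[0, 1, 3, 4],
    pointers_0=[0, 1, 3, 5, 6],
    indices_1=[3, 1, 4, 1, 2, 3],
    values=[7, 7, 7, 7, 7, 7],
)
print(triples)
```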
169174
170175#### DCSC #### {#dcsc_format}
171176
@@ -185,8 +190,8 @@ DCSC is similar to CSC, except that columns which are entirely empty are not sto
185190contains no repeated values. Because the position within `pointers_0` no longer dictates the
186191corresponding column index, `indices_0` provides the column index.
187192
188- Within a column, row indices shall be sorted and must not be duplicated. Column indices shall be
189- sorted and must not be duplicated.
193+ Columns shall be sorted and must not be duplicated.
194+ Within each column, elements shall be sorted by row index and must not be duplicated.
190195
191196#### COOR #### {#coor_format}
192197
@@ -464,12 +469,12 @@ Data Types {#key_data_types}
464469----------------------------
465470
466471The `data_types` key must be present and shall define the data types of all required
467- arrays based on the [[#key_format]]. The data type declares the type of the
468- in-memory arrays. While these are often identical to the types used when storing
469- the arrays on disk in the container, the container may choose to store the arrays
470- in another format. For example, `uint64` type may be stored as `int8` if all the
471- numbers in the array are small enough to fit, but `data_types` would still list the
472- array as having type `uint64`.
472+ arrays based on the [[#key_format]]. The data type declares the type of both the
473+ on-disk array and the in-memory array. When these are identical, a simple string
474+ defines the type. When the on-disk and in-memory types differ due to limitations in the
475+ storage container (e.g., HDF5 lacks a BOOL type), the type is written as "on_disk->in_memory".
476+ For example, BOOL values stored as INT8 would be declared as "int8->bool" to indicate that
477+ after reading the values array into memory, it should be interpreted as boolean data.
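
A reader might interpret such a type string along these lines. This is only a sketch; the function name `parse_data_type` is hypothetical, and it assumes the string contains no whitespace around `->`.

```python
def parse_data_type(type_string):
    """Hypothetical helper: return (on_disk_type, in_memory_type)."""
    on_disk, arrow, in_memory = type_string.partition("->")
    if not arrow:
        # A simple string means both types are identical.
        return type_string, type_string
    if not on_disk or not in_memory:
        raise ValueError(f"malformed data type: {type_string!r}")
    return on_disk, in_memory


print(parse_data_type("uint64"))      # ('uint64', 'uint64')
print(parse_data_type("int8->bool"))  # ('int8', 'bool')
```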
473478
474479For a given [[#key_format]], all named binary arrays for that format shall have a
475480corresponding name in `data_types`.
@@ -496,16 +501,16 @@ Example of a CSR Matrix whose values are all 1.
496501 <td> .</td>
497502 <td> .</td>
498503 <td> .</td>
499- <td> 1 </td>
504+ <td> 7 </td>
500505 <td> .</td>
501506 </tr>
502507 <tr>
503508 <th> 1</th>
504509 <td> .</td>
505- <td> 1 </td>
510+ <td> 7 </td>
506511 <td> .</td>
507512 <td> .</td>
508- <td> 1 </td>
513+ <td> 7 </td>
509514 </tr>
510515 <tr>
511516 <th> 2</th>
@@ -518,8 +523,8 @@ Example of a CSR Matrix whose values are all 1.
518523 <tr>
519524 <th> 3</th>
520525 <td> .</td>
521- <td> 1 </td>
522- <td> 1 </td>
526+ <td> 7 </td>
527+ <td> 7 </td>
523528 <td> .</td>
524529 <td> .</td>
525530 </tr>
@@ -528,7 +533,7 @@ Example of a CSR Matrix whose values are all 1.
528533 <td> .</td>
529534 <td> .</td>
530535 <td> .</td>
531- <td> 1 </td>
536+ <td> 7 </td>
532537 <td> .</td>
533538 </tr>
534539 </tbody>
@@ -558,7 +563,7 @@ Example of a CSR Matrix whose values are all 1.
558563
559564- `pointers_0` = [0, 1, 3, 3, 5, 6]
560565- `indices_1` = [3, 1, 4, 1, 2, 3]
561- - `values` = [1]
566+ - `values` = [7]
562567
563568</div>
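
To check the arrays against the table, the CSR traversal can be sketched in Python. Two assumptions are made here and are not taken from the specification text shown: the matrix is 5 by 5, as the table suggests, and the single listed value 7 stands for every stored value, so it is repeated once per entry of `indices_1`. The function name `csr_to_dense` is hypothetical.

```python
def csr_to_dense(shape, pointers_0, indices_1, values):
    """Hypothetical helper: expand CSR arrays into a list of rows.

    Positions without a stored value are left as None.
    """
    number_of_rows, number_of_columns = shape
    dense = [[None] * number_of_columns for _ in range(number_of_rows)]
    for i in range(number_of_rows):
        # Stored values of row i occupy [pointers_0[i], pointers_0[i+1]).
        for p in range(pointers_0[i], pointers_0[i + 1]):
            dense[i][indices_1[p]] = values[p]
    return dense


pointers_0 = [0, 1, 3, 3, 5, 6]
indices_1 = [3, 1, 4, 1, 2, 3]
# Assumption: the single value 7 applies to all stored values.
values = [7] * len(indices_1)

for row in csr_to_dense((5, 5), pointers_0, indices_1, values):
    print(row)
```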
564569