Commit 550b059

Merge branch 'wma/levels'
2 parents c9b1858 + c193ab0 commit 550b059

1 file changed

Lines changed: 278 additions & 5 deletions

spec/latest/index.bs

@@ -55,8 +55,9 @@ outside of the "binsparse" namespace.
5555

5656
<div class=example>
5757

58-
Example of a JSON descriptor for a compressed-sparse column matrix with 10 rows
59-
and 12 columns, containing float32 values, along with user-defined attributes.
58+
Example of a JSON descriptor for a compressed-sparse column (CSC) matrix with 10
59+
rows and 12 columns, containing float32 values, along with user-defined
60+
attributes.
6061

6162
```json
6263
{
@@ -273,6 +274,278 @@ Pairs must not be duplicated.
273274

274275
Coordinate format is an alias for [[#coor_format]] format.
275276

277+
### Version 2.0 only: Custom Formats ### {#custom_formats}
278+
279+
The contents of this section will be finalized with the release of Binsparse
280+
V2.0, and are subject to change until then.
281+
282+
Binsparse describes custom multidimensional formats hierarchically. We can
283+
understand these formats as arrays of arrays, where the parent array and
284+
child arrays might use different formats. For example, we could have a dense
285+
outer array which contains sparse inner arrays, so the first index would be
286+
dense and the second index would be sparse. To achieve efficient storage, all
287+
arrays in the same level are stored contiguously in a specialized data structure
288+
called a level.
289+
290+
A level is a collection of zero or more arrays which all have the same format.
291+
The elements of arrays in a level may be subarrays in a sublevel. The global
292+
array we wish to store is represented by a level that holds a single root array.
293+
294+
For example, the simplest level is the element format, which represents a
295+
collection of scalars. We can represent a collection of dense vectors with a
296+
dense level format. Each vector in the collection would be composed from
297+
contiguous scalars in an element level (analogously to the numpy.stack
298+
operator). We can represent a collection of sparse vectors using a sparse level.
299+
The sparse level format represents sparse vectors by listing the locations of
300+
nonzeros, and storing only the nonzero scalars inside an element level.
301+
302+
In addition to storing scalars, dense and sparse levels may themselves store
303+
multidimensional arrays. This leads to multiple ways to store sparse matrices
304+
and tensors. For example, a dense vector of sparse vectors is equivalent to the
305+
CSR matrix format, and a sparse vector of sparse vectors is equivalent to the
306+
hypersparse DCSR matrix format.
307+
308+
When defining a custom format, the outermost `subformat` key is defined as the
309+
root level descriptor (a level which will only hold one array). If a level holds
310+
many different arrays, we refer to the `p`th array as the array in position `p`.
311+
312+
Levels are row-major by default (adding an outer level adds a row dimension).
313+
The format descriptor may optionally define a `transpose` key, equal to a list of
314+
the described dimensions in the order they should appear. If the tensor we wish
315+
to represent is `A` and the tensor described by the format descriptor is `B`,
316+
then `A[i_0, ..., i_(n - 1)] = B[i_(transpose[0]), ..., i_(transpose[n - 1])]`. `transpose` must
317+
be a permutation of the dimension numbers.
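
As a rough illustration of the `transpose` rule, the sketch below maps an index into `A` to the corresponding index into `B`. It assumes dimensions are numbered from zero, as in the `"transpose": [1, 0]` descriptors used elsewhere in this specification. The helper names `check_transpose` and `index_into_b` are hypothetical and not part of Binsparse.

```python
def check_transpose(transpose):
    """Raise ValueError unless `transpose` is a permutation of 0..n-1."""
    n = len(transpose)
    if sorted(transpose) != list(range(n)):
        raise ValueError(f"not a permutation of 0..{n - 1}: {transpose}")


def index_into_b(index_a, transpose):
    """Map an index tuple (i_0, ..., i_(n-1)) of A to the index tuple of B."""
    check_transpose(transpose)
    if len(index_a) != len(transpose):
        raise ValueError("index and transpose must have the same length")
    # A[i_0, ..., i_(n-1)] = B[i_(transpose[0]), ..., i_(transpose[n-1])]
    return tuple(index_a[t] for t in transpose)


# With transpose [1, 0], A[i, j] is read from B[j, i].
print(index_into_b((2, 5), [1, 0]))
```

With the identity permutation `[0, 1]` the index is returned unchanged, which corresponds to the row-major default.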
318+
319+
If the format key is a dictionary, the `level` key must be present and shall
320+
describe the storage format of the level used to represent the sparse array.
321+
322+
The level descriptors are dictionaries defined as follows:
323+
324+
#### Element #### {#element_level}
325+
326+
If the level key is "element", the level represents zero or more scalars.
327+
328+
: values
329+
:: Array of size `number_of_positions` whose `p`th element holds the value of the scalar at position `p`.
330+
331+
#### Dense #### {#dense_level}
332+
333+
If the level key is "dense", the `subformat` key must be present. The `rank`
334+
key must be present, and set to an integer `r` greater than or equal to 1. The
335+
dense level represents zero or more `r`-dimensional dense arrays whose elements
336+
are themselves arrays specified by `subformat`. For example, a dense level
337+
of
338+
rank 2 represents a collection of dense matrices of subarrays.
339+
340+
Assuming that the level describes arrays of shape `I_0, ..., I_(N - 1)`, the
341+
array at position `p` in a dense level of rank `r` is an array whose slice
342+
343+
`A[i_0, ..., i_(r - 1), :, ..., :]`
344+
345+
is described by the row-major position
346+
347+
`q = (...(((p * I_0 + i_0) * I_1 + i_1) * I_2 + i_2) * ...) * I_(r - 1) + i_(r - 1)`
348+
349+
of the sublevel.
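
The row-major position can be computed with one multiply-add per dense dimension. The following is a minimal sketch, assuming zero-based indices and that `shape` holds only the `r` extents `I_0, ..., I_(r - 1)` owned by this dense level; `dense_position` is a hypothetical helper name, not something defined by Binsparse.

```python
def dense_position(p, shape, index):
    """Return the sublevel position q for slice `index` of the array at position p.

    `shape` holds the extents I_0, ..., I_(r-1) of the r dense dimensions and
    `index` holds the indices i_0, ..., i_(r-1).
    """
    if len(shape) != len(index):
        raise ValueError("shape and index must both have length r")
    q = p
    for extent, i in zip(shape, index):
        if not 0 <= i < extent:
            raise IndexError(f"index {i} out of range for extent {extent}")
        q = q * extent + i  # one Horner step per dense dimension
    return q


# Array at position 2 in a rank-2 dense level of 3 x 4 arrays, slice [1, 2].
print(dense_position(2, (3, 4), (1, 2)))
```

For the root level `p` is always `0`, so the result reduces to the usual row-major offset of `index` within `shape`.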
350+
351+
#### Sparse #### {#sparse_level}
352+
353+
If the level key is "sparse", the `subformat` key must be present. The
354+
`rank` key must be present, and set to an integer `r` greater than or equal to
355+
`1`. The sparse level represents zero or more `r`-dimensional sparse arrays
356+
whose non-implicit elements are themselves arrays specified by `subformat`. For
357+
example, a sparse level of rank 1 represents a collection of sparse vectors of
358+
subarrays.
359+
360+
Assume that this level represents `n`-dimensional subarrays and the root array
361+
is `N`-dimensional. The sparse level implies the following binary arrays are
362+
present:
363+
364+
: pointers_to_(N - n)
365+
:: Array of size `number_of_positions + 1` where `pointers_to_(N - n)[0]` is equal to `0` and `pointers_to_(N - n)[p + 1]` is equal to the sum of `pointers_to_(N - n)[p]` and the number of explicitly represented slices in the array at position `p`.
366+
367+
: indices_(N - n), ..., indices_(N - n + r - 1)
368+
:: There are `r` such arrays. When `A[i_0, ..., i_(r - 1), :, ..., :]` is explicitly represented by the subarray in position `q`, `indices_(N-n+s)[q] = i_s`. The arrays must be ordered such that the tuples `(indices_(N-n)[q], ..., indices_(N-n+r-1)[q])` are unique and appear in lexicographic order for all `q` in each range `pointers_to_(N-n)[p] <= q < pointers_to_(N-n)[p + 1]`. These arrays must contain no other elements.
369+
370+
Special note: If the sparse level is the root level, the `pointers` array should
371+
be omitted, as its first value will be `0` and its last value will be the
372+
length of any of the `indices` arrays in this level.
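
To make the pointer and index arrays concrete, here is a small sketch for the case of a dense level of rank 1 over a sparse level of rank 1 over an element level. It assumes `N = 2` and `n = 1`, so the arrays are named `pointers_to_1` and `indices_1`, and it treats `0` as the implicit value. The function names `csr_from_dense` and `csr_lookup` are hypothetical.

```python
def csr_from_dense(rows):
    """Build pointers_to_1, indices_1 and values from a list of equal-length rows."""
    pointers_to_1 = [0]
    indices_1 = []
    values = []
    for row in rows:  # each row is the sparse vector at position p
        for j, v in enumerate(row):  # ascending j keeps indices sorted
            if v != 0:
                indices_1.append(j)
                values.append(v)
        # pointers_to_1[p + 1] = pointers_to_1[p] + stored entries at position p
        pointers_to_1.append(len(indices_1))
    return pointers_to_1, indices_1, values


def csr_lookup(pointers_to_1, indices_1, values, i, j, fill=0):
    """Return the entry at (i, j), or `fill` if it is not explicitly stored."""
    for q in range(pointers_to_1[i], pointers_to_1[i + 1]):
        if indices_1[q] == j:
            return values[q]
    return fill


ptr, idx, vals = csr_from_dense([[0, 5, 0], [0, 0, 0], [7, 0, 9]])
print(ptr, idx, vals)
```

Here the sparse level is not the root level, so the pointers array is kept. The empty middle row shows up as two equal consecutive pointer values.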
373+
374+
375+
### Equivalent Formats ### {#equivalent_formats}
376+
377+
Each predefined format below is equivalent to the custom format descriptor shown under its heading.
378+
379+
#### DVEC #### {#dvec_format_equiv}
380+
381+
```json
382+
"format": {
383+
"subformat": {
384+
"level": "dense",
385+
"rank": 1,
386+
"subformat": {
387+
"level": "element"
388+
}
389+
}
390+
}
391+
```
392+
393+
#### DMATR #### {#dmatr_format_equiv}
394+
395+
```json
396+
"format": {
397+
"subformat": {
398+
"level": "dense",
399+
"rank": 1,
400+
"subformat": {
401+
"level": "dense",
402+
"rank": 1,
403+
"subformat": {
404+
"level": "element"
405+
}
406+
}
407+
}
408+
}
409+
```
410+
411+
#### DMATC #### {#dmatc_format_equiv}
412+
413+
```json
414+
"format": {
415+
"transpose": [1, 0],
416+
"subformat": {
417+
"level": "dense",
418+
"rank": 1,
419+
"subformat": {
420+
"level": "dense",
421+
"rank": 1,
422+
"subformat": {
423+
"level": "element"
424+
}
425+
}
426+
}
427+
}
428+
```
429+
430+
#### CVEC #### {#cvec_format_equiv}
431+
432+
```json
433+
"format": {
434+
"subformat": {
435+
"level": "sparse",
436+
"rank": 1,
437+
"subformat": {
438+
"level": "element"
439+
}
440+
}
441+
}
442+
```
443+
444+
#### CSR #### {#csr_format_equiv}
445+
446+
```json
447+
"format": {
448+
"subformat": {
449+
"level": "dense",
450+
"rank": 1,
451+
"subformat": {
452+
"level": "sparse",
453+
"rank": 1,
454+
"subformat": {
455+
"level": "element"
456+
}
457+
}
458+
}
459+
}
460+
```
461+
462+
#### CSC #### {#csc_format_equiv}
463+
464+
```json
465+
"format": {
466+
"transpose": [1, 0],
467+
"subformat": {
468+
"level": "dense",
469+
"rank": 1,
470+
"subformat": {
471+
"level": "sparse",
472+
"rank": 1,
473+
"subformat": {
474+
"level": "element"
475+
}
476+
}
477+
}
478+
}
479+
```
480+
481+
#### DCSR #### {#dcsr_format_equiv}
482+
483+
```json
484+
"format": {
485+
"subformat": {
486+
"level": "sparse",
487+
"rank": 1,
488+
"subformat": {
489+
"level": "sparse",
490+
"rank": 1,
491+
"subformat": {
492+
"level": "element"
493+
}
494+
}
495+
}
496+
}
497+
```
498+
499+
#### DCSC #### {#dcsc_format_equiv}
500+
501+
```json
502+
"format": {
503+
"transpose": [1, 0],
504+
"subformat": {
505+
"level": "sparse",
506+
"rank": 1,
507+
"subformat": {
508+
"level": "sparse",
509+
"rank": 1,
510+
"subformat": {
511+
"level": "element"
512+
}
513+
}
514+
}
515+
}
516+
```
517+
518+
#### COOR #### {#coor_format_equiv}
519+
520+
```json
521+
"format": {
522+
"subformat": {
523+
"level": "sparse",
524+
"rank": 2,
525+
"subformat": {
526+
"level": "element"
527+
}
528+
}
529+
}
530+
```
531+
532+
#### COOC #### {#cooc_format_equiv}
533+
534+
Column-wise Coordinate format
535+
536+
```json
537+
"format": {
538+
"transpose": [1, 0],
539+
"subformat": {
540+
"level": "sparse",
541+
"rank": 2,
542+
"subformat": {
543+
"level": "element"
544+
}
545+
}
546+
}
547+
```
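
The descriptors above all share the same nested shape, so a reader can walk them level by level. The sketch below is one possible way to do that, assuming the descriptor has already been parsed into a Python dictionary holding the contents of the `format` key. `summarize_levels` is a hypothetical helper, and the error handling is illustrative only.

```python
def summarize_levels(format_descriptor):
    """Return [(level_kind, rank), ...] from the root level down to the element level."""
    levels = []
    node = format_descriptor["subformat"]  # root level descriptor
    while True:
        kind = node["level"]
        if kind == "element":
            levels.append(("element", 0))  # element levels add no dimensions
            return levels
        if kind not in ("dense", "sparse"):
            raise ValueError(f"unknown level kind: {kind!r}")
        rank = node["rank"]
        if not isinstance(rank, int) or rank < 1:
            raise ValueError(f"rank must be an integer >= 1, got {rank!r}")
        levels.append((kind, rank))
        node = node["subformat"]  # descend into the sublevel


csr = {
    "subformat": {
        "level": "dense",
        "rank": 1,
        "subformat": {
            "level": "sparse",
            "rank": 1,
            "subformat": {"level": "element"},
        },
    }
}
levels = summarize_levels(csr)
print(levels, "dimensions:", sum(rank for _, rank in levels))
```

Summing the ranks gives the number of dimensions the descriptor covers, which is also the length a `transpose` list would need to have.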
548+
276549
Data Types {#key_data_types}
277550
----------------------------
278551

@@ -313,9 +586,9 @@ The following strings shall be used to describe data types:
313586
## Value Modifiers ## {#value_modifiers}
314587

315588
When the value array is meant to be reinterpreted before reading, a special bracket syntax is
316-
provided to indicate modifications to the underlying value array.
589+
provided to indicate modifications to the underlying element level.
317590

318-
### Sparse Array with Complex Values ### {#complex_arrays}
591+
### Complex Values (complex) ### {#complex_level}
319592

320593
When a value array is composed of alternating real and imaginary components of
321594
complex numbers, the type is written as `complex[<type>]`. For example, a value
@@ -326,7 +599,7 @@ the modified array shall be at position `2i + 1` in the underlying array.
326599
The `complex` value modifier may only be used with the types `float32` and
327600
`float64`.
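
As a hedged illustration of the `complex` modifier, the sketch below pairs up an interleaved array of floats. It assumes the real component of element `i` is stored at position `2i` and the imaginary component at position `2i + 1`; `decode_complex` is a hypothetical helper name.

```python
def decode_complex(raw):
    """Reinterpret an interleaved [re, im, re, im, ...] list as complex numbers."""
    if len(raw) % 2 != 0:
        raise ValueError("interleaved array must have an even number of entries")
    # Assumed layout: real part at 2i, imaginary part at 2i + 1.
    return [complex(raw[2 * i], raw[2 * i + 1]) for i in range(len(raw) // 2)]


print(decode_complex([1.0, 2.0, 3.0, -4.0]))
```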
328601

329-
### Sparse Array with All Values the Same ### {#iso_arrays}
602+
### All Values the Same (ISO) ### {#iso_level}
330603

331604
When all values of a sparse array are identical, the type is
332605
written as `iso[<type>]`. This indicates that the array will store only a single
