RandBLAS is a header-only C++ library for sketching in randomized linear algebra. It provides BLAS-like functionality for applying random sketching operators to dense and sparse matrices, enabling efficient implementation of randomized algorithms like low-rank SVD, least squares, and other dimension reduction techniques.
Core Purpose: Enable efficient, flexible, and reliable sketching operations for randomized numerical linear algebra on CPUs using OpenMP parallelization.
- Web Documentation: https://randblas.readthedocs.io/en/1.1.0/
- Main Repository: https://github.com/BallisticLA/RandBLAS
- DevNotes: Critical implementation details are in `RandBLAS/DevNotes.md`, `RandBLAS/sparse_data/DevNotes.md`, and `test/DevNotes.md`
RandBLAS/ # Core library headers (header-only)
├── base.hh # Basic types, Random123 wrappers
├── random_gen.hh # RNG infrastructure
├── dense_skops.hh # Dense sketching operators (DenseSkOp)
├── sparse_skops.hh # Sparse sketching operators (SparseSkOp)
├── skge.hh # sketch_general() - main entry point for sketching
├── sksy.hh # Symmetric sketching
├── skve.hh # Vector sketching
├── util.hh # Utility functions
└── sparse_data/ # Sparse matrix abstractions and operations
    ├── base.hh # Sparse matrix types (COO, CSR, CSC)
    ├── coo_matrix.hh # Coordinate format
    ├── csr_matrix.hh # Compressed sparse row
    ├── csc_matrix.hh # Compressed sparse column
    ├── conversions.hh # Format conversions
    ├── spmm_dispatch.hh # Sparse matrix-matrix multiply dispatch
    ├── coo_spmm_impl.hh # COO kernel implementations
    ├── csr_spmm_impl.hh # CSR kernel implementations
    ├── csc_spmm_impl.hh # CSC kernel implementations
    ├── trsm_dispatch.hh # Sparse triangular solve dispatch
    ├── csr_trsm_impl.hh # CSR triangular solve implementations
    ├── csc_trsm_impl.hh # CSC triangular solve implementations
    └── sksp.hh # Sketching sparse data with dense operators
test/ # GoogleTest-based test suite
├── test_basic_rng/ # RNG tests (deterministic and statistical)
├── test_datastructures/ # Tests for DenseSkOp, SparseSkOp, sparse matrices
├── test_matmul_cores/ # Low-level kernel tests (lskge3, rskge3, left_spmm, right_spmm)
├── test_matmul_wrappers/ # High-level API tests (sketch_general, sketch_sparse, etc.)
└── test_sparse_trsm/ # Sparse triangular solve tests
examples/ # Example programs demonstrating usage
├── sparse-data-matrices/ # Sparse matrix performance benchmarks
├── sparse-low-rank-approx/ # Sparse SVD/QRCP examples
└── total-least-squares/ # TLS with dense and sparse operators
rtd/ # ReadTheDocs source files
- Random Number Generation: Uses Random123 for counter-based PRNGs with thread-safe, reproducible parallel generation.
- BLAS Portability: Uses BLAS++ as the portability layer. Main BLAS functions used: GEMM, GEMV, SCAL, COPY, AXPY.
- Sketching Operators:
  - `DenseSkOp`: Gaussian, uniform, sparse (CountSketch variants)
  - `SparseSkOp`: Sparse random matrices in COO format internally
- Main API Functions:
  - `sketch_general()`: Entry point for sketching dense data (routes to `lskge3`, `rskge3`, `lskges`, `rskges`)
  - `left_spmm()`, `right_spmm()`: Sparse matrix × dense matrix multiplication
  - `sketch_sparse()`: Sketching sparse data with dense operators
  - `trsm()`: Sparse triangular solve (B ← αA⁻¹B for triangular sparse matrix A)
IMPORTANT: RandBLAS guarantees that randomly generated matrices are identical regardless of the number of threads used. This is achieved through careful management of RNG state in dense_skops.hh and sparse_skops.hh.
- When modifying sampling code, preserve this thread-independence property
- Tests verify this property; always run tests after RNG-related changes
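The thread-independence property above follows from the counter-based design: each entry is a pure function of a seed and its index, so the way the index range is split across workers cannot change the result. The sketch below illustrates the idea only. It is not RandBLAS code and does not use Random123; `mix64`, `entry`, and `fill_in_chunks` are hypothetical names, and the SplitMix64 finalizer stands in for a real counter-based generator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a counter-based generator: a stateless mixing
// function (the SplitMix64 finalizer). RandBLAS itself uses Random123.
inline std::uint64_t mix64(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Entry i depends only on (seed, i), never on which worker computes it.
inline double entry(std::uint64_t seed, std::uint64_t i) {
    std::uint64_t bits = mix64(seed ^ mix64(i));
    // Keep the top 53 bits and scale by 2^-53 to get a value in [0, 1).
    return static_cast<double>(bits >> 11) * (1.0 / 9007199254740992.0);
}

// Fill n entries, splitting the index range into num_chunks contiguous
// blocks the way a static parallel schedule might. Each block could run on
// its own thread; here the blocks run in sequence to keep the sketch portable.
inline std::vector<double> fill_in_chunks(std::uint64_t seed, std::size_t n,
                                          std::size_t num_chunks) {
    assert(num_chunks > 0);
    std::vector<double> out(n);
    const std::size_t chunk = (n + num_chunks - 1) / num_chunks;
    for (std::size_t c = 0; c < num_chunks; ++c) {
        const std::size_t begin = c * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        for (std::size_t i = begin; i < end; ++i)
            out[i] = entry(seed, i);
    }
    return out;
}
```

Calling `fill_in_chunks(seed, n, 1)` and `fill_in_chunks(seed, n, 7)` yields the same vector. A generator that carries sequential state across entries would not have this property unless each worker's state were positioned explicitly.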
The sparse matrix multiplication functions (left_spmm, right_spmm) use a 12-codepath dispatch system based on:
- Matrix format (COO, CSR, CSC)
- Transposition flags (`opA`, `opB`)
- Memory layout (RowMajor, ColMajor)
See RandBLAS/sparse_data/DevNotes.md for the full dispatch flow. Key points:
- `right_spmm` reduces to `left_spmm` by flipping flags
- Transposition of sparse matrices creates lightweight views (CSR ↔ CSC)
- All 12 codepaths should be covered by tests
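The "lightweight view" point rests on a standard fact: the CSR arrays of A are exactly the CSC arrays of Aᵀ, so transposing requires no copying. The sketch below shows this with hypothetical minimal types (`CsrView`, `CscView`, `transpose_as_csc`, `at`); these are illustrative only and are not the RandBLAS sparse matrix types or its dispatch code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical non-owning views, for illustration only.
struct CsrView {
    std::int64_t n_rows, n_cols;
    const std::int64_t* rowptr;   // length n_rows + 1
    const std::int64_t* colidxs;  // length nnz
    const double* vals;           // length nnz
};

struct CscView {
    std::int64_t n_rows, n_cols;
    const std::int64_t* colptr;   // length n_cols + 1
    const std::int64_t* rowidxs;  // length nnz
    const double* vals;           // length nnz
};

// View A^T as CSC: swap the dimensions and reinterpret the same three arrays.
inline CscView transpose_as_csc(const CsrView& A) {
    return CscView{A.n_cols, A.n_rows, A.rowptr, A.colidxs, A.vals};
}

// Entry lookup by linear scan over one row (CSR) or one column (CSC).
inline double at(const CsrView& A, std::int64_t i, std::int64_t j) {
    for (std::int64_t p = A.rowptr[i]; p < A.rowptr[i + 1]; ++p)
        if (A.colidxs[p] == j) return A.vals[p];
    return 0.0;
}

inline double at(const CscView& A, std::int64_t i, std::int64_t j) {
    for (std::int64_t p = A.colptr[j]; p < A.colptr[j + 1]; ++p)
        if (A.rowidxs[p] == i) return A.vals[p];
    return 0.0;
}
```

Because the view shares pointers with the original, it is only valid while the original arrays are alive.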
The sparse triangular solve function (trsm) handles transposition similarly:
- Transposition creates lightweight CSR ↔ CSC views and flips uplo (upper ↔ lower)
- Supports both unit and non-unit diagonal matrices
- Includes validation modes for checking structural properties
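To make the operation concrete, here is a minimal forward-substitution sketch for a lower-triangular CSR matrix and a single right-hand-side vector, computing x ← αL⁻¹x in place. The function name `csr_lower_solve` and its signature are hypothetical; the RandBLAS `trsm` interface, its multi-column handling, and its validation modes are not reproduced here. The assertions play the role of a simple structural check.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Solve L x = alpha * b in place (x holds b on entry), where L is an n-by-n
// lower-triangular matrix in CSR format. With unit_diag set, the diagonal is
// taken to be 1 and any stored diagonal entries are ignored.
inline void csr_lower_solve(std::int64_t n,
                            const std::vector<std::int64_t>& rowptr,
                            const std::vector<std::int64_t>& colidxs,
                            const std::vector<double>& vals,
                            bool unit_diag, double alpha,
                            std::vector<double>& x) {
    for (std::int64_t i = 0; i < n; ++i) {
        double diag = unit_diag ? 1.0 : 0.0;
        double acc = alpha * x[i];
        for (std::int64_t p = rowptr[i]; p < rowptr[i + 1]; ++p) {
            const std::int64_t j = colidxs[p];
            assert(j <= i);  // structural check: nothing above the diagonal
            if (j < i) {
                acc -= vals[p] * x[j];  // x[j] is already solved
            } else if (!unit_diag) {
                diag = vals[p];
            }
        }
        assert(diag != 0.0);  // a non-unit solve needs a stored nonzero diagonal
        x[i] = acc / diag;
    }
}
```

Solving with Lᵀ is an upper-triangular problem. Under the view described above, that corresponds to treating the same arrays as CSC and switching from forward to backward substitution.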
- `lskgeX`: Left sketching, general (dense) data, variant X
- `rskgeX`: Right sketching, general (dense) data, variant X
- `lskges`: Left sketching with sparse operator
- `lskge3`: Left sketching with dense operator (calls GEMM, "3" for 3-argument GEMM-like)
- `lsksp3`: Left sketching sparse data (where "left" refers to operator position)
Counterintuitive detail: In lsksp3 and rsksp3, the "left/right" refers to the operator's position. But these functions call right_spmm/left_spmm respectively, where "left/right" refers to the sparse data matrix position. See sparse_data/DevNotes.md lines 59-74.
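As a reading aid, the naming scheme can be written down as a small decoder. This is a hypothetical helper, not part of RandBLAS: it assumes six-character names of the form `[l|r]sk[ge|sp][3|s]` and encodes only the rules stated above, including the side swap for the sparse-data kernels.

```cpp
#include <cassert>
#include <string>

// Hypothetical decoded form of a kernel name such as "lskge3" or "rsksp3".
struct KernelName {
    bool operator_on_left;  // 'l' or 'r': position of the sketching operator
    bool sparse_data;       // "sp" (sparse data) or "ge" (general, dense data)
    bool sparse_operator;   // 's' (sparse operator) or '3' (dense, GEMM-like)
};

inline KernelName decode_kernel_name(const std::string& name) {
    assert(name.size() == 6);
    assert(name[0] == 'l' || name[0] == 'r');
    assert(name.substr(1, 2) == "sk");
    const std::string data = name.substr(3, 2);
    assert(data == "ge" || data == "sp");
    assert(name[5] == '3' || name[5] == 's');
    return KernelName{name[0] == 'l', data == "sp", name[5] == 's'};
}

// For sparse-data kernels the sparse operand sits opposite the operator, so a
// left-sketching kernel calls right_spmm and a right-sketching one left_spmm.
inline std::string spmm_for_sparse_data_kernel(const KernelName& k) {
    assert(k.sparse_data);
    return k.operator_on_left ? "right_spmm" : "left_spmm";
}
```

For example, decoding `lsksp3` gives an operator on the left applied to sparse data, and the helper reports `right_spmm` for it.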
- C++20 required (uses concepts for type constraints)
- Some compilers (e.g., gcc 8.5) may need the `-fconcepts` flag
- macOS may need `-D __APPLE__` for `sincosf`/`sincos` functions
- Header-only library: all code in `.hh` files
- Use BLAS++ enumerations (`blas::Layout`, `blas::Op`, etc.) extensively
- Follow existing patterns for GEMM-like APIs (side flags, transposition, layouts)
- Prefer templates with C++20 concepts over traditional template metaprogramming
- OpenMP is critical for performance (both dense operator sampling and sparse operations)
- Fast GEMM is essential for dense sketching operations
- Sparse matrix kernels are hand-tuned; changes should be benchmarked
- BLAS++ configuration significantly affects performance
Before making performance optimizations:
- Ask for confirmation on approach
- Benchmark before and after
- Document performance implications in PR/commit message
Always run ctest after making code changes, especially:
- Any modifications to sampling logic (dense or sparse operators)
- Changes to sparse matrix kernels or dispatch logic
- RNG-related changes
- New feature implementations
Tests are organized by abstraction level:
- `test_basic_rng/`: RNG correctness (deterministic and statistical)
- `test_datastructures/`: Data structure constructors, accessors, format conversions
- `test_matmul_cores/`: Low-level kernels (`lskge3`, `left_spmm`, etc.)
- `test_matmul_wrappers/`: High-level API (`sketch_general`, `sketch_sparse`, `sketch_symmetric`)
- `test_sparse_trsm/`: Sparse triangular solve
cd RandBLAS-build
ctest # Run all tests
ctest -R test_name # Run specific test
ctest -V # Verbose output
- Use GoogleTest framework
- Follow patterns in existing test files
- For new sketching operators: test both left and right application, various transposition flags
- For sparse operations: ensure all relevant codepaths are covered
Key CMake variables:
- `blaspp_DIR`: Path to BLAS++ installation (containing `blasppConfig.cmake`)
- `Random123_DIR`: Path to Random123 headers
- `CMAKE_BUILD_TYPE`: Release or Debug
- `CMAKE_CXX_FLAGS`: May need `-D __APPLE__` on macOS
mkdir RandBLAS-build && cd RandBLAS-build
cmake -DCMAKE_BUILD_TYPE=Release \
-Dblaspp_DIR=/path/to/blaspp-install/lib/cmake/blaspp/ \
-DRandom123_DIR=/path/to/random123-install/include/ \
../RandBLAS/
make -j install
ctest
See INSTALL.md for full details.
- Source code comments: Inline documentation in headers
- Web documentation: Tutorial and API reference at readthedocs.io
- DevNotes: Implementation details not suitable for user guide
- Code comments: Update when changing function signatures or behavior
- DevNotes: Update when implementation approach changes significantly
- Web docs: Source in `rtd/` directory, built with Sphinx, deployed to ReadTheDocs
- Concise explanations: Focus on what changed and why
- Reference line numbers when discussing specific code locations (e.g., `file.hh:42`)
- Use mathematical notation when helpful but don't over-explain standard linear algebra concepts
- Link to relevant sections of web docs or DevNotes for deeper context
- Follow conventional commit style where appropriate
- For complex changes, explain the "why" not just the "what"
- Reference issue numbers when applicable
- `main`: Primary development branch
- Version tags: `X.Y.Z` format (e.g., `1.1.0`)
See PROCEDURES.md for full release procedures. Key steps:
- Create git tag in `X.Y.Z` format
- Write release notes
- Update ReadTheDocs default version
- Create GitHub release
GitHub Actions workflows test:
- Ubuntu (OpenMP)
- macOS (serial and OpenMP, current and older versions)
All CI tests must pass before merging.
- Always run tests after making code changes to verify correctness
- Ask before performance optimizations - confirm approach and document changes
- Concise explanations - briefly explain what changed and why, without excessive detail
- Reference files with line numbers using format `[file.hh:42](RandBLAS/file.hh#L42)`
Adding a new sketching operator:
- Define operator struct in `dense_skops.hh` or `sparse_skops.hh`
- Implement sampling logic (preserve thread-independence!)
- Add `sketch_general` support in `skge.hh`
- Add tests in `test_datastructures` and `test_matmul_wrappers`
- Update web documentation in `rtd/`
- Run full test suite
Modifying sparse matrix kernels:
- Review dispatch logic in `sparse_data/DevNotes.md`
- Make changes to specific kernel in `coo_spmm_impl.hh`, `csr_spmm_impl.hh`, or `csc_spmm_impl.hh`
- Verify all affected codepaths have test coverage
- Run tests and benchmarks
- Document performance implications
Improving code quality:
- Identify area for refactoring
- Review existing tests to understand expected behavior
- Make incremental changes, running tests after each step
- Consider performance implications
- Update DevNotes if implementation approach changes
- Don't change RNG behavior without carefully preserving thread-independence
- Don't optimize sparse kernels without benchmarking
- Don't add BLAS++ dependencies beyond current subset without discussion
- Don't break CMake configuration for downstream projects
- Don't assume all 12 sparse matrix codepaths are tested (verify coverage)
- BLAS++ (blaspp): BLAS portability layer - must be built with CMake
- Random123: Header-only RNG library
- C++20 compiler: gcc ≥9, clang ≥10, or equivalent
- GoogleTest: Required for testing (`ctest`)
- OpenMP: Required for performance (parallel sampling and sparse operations)
- LAPACK++ (lapackpp): Often used in projects that depend on RandBLAS
- BLAS++ configuration heavily affects performance - users should inspect CMake output
- Random123 headers must be in include path for downstream projects
- OpenMP detection can fail on macOS with default system compilers (use homebrew gcc/clang)
- No known security vulnerabilities
- Primary correctness concerns: RNG reproducibility, numerical accuracy, thread safety
- Statistical tests verify RNG quality (Kolmogorov-Smirnov tests for distribution correctness)
- Deterministic tests compare against reference values for Random123 generators
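For readers unfamiliar with the statistical tests mentioned above, the following is a minimal sketch of a one-sample Kolmogorov-Smirnov check against the uniform distribution on [0, 1). It is not taken from the RandBLAS test suite; the function names are hypothetical, and the threshold 1.358/√n is the usual large-sample approximation to the 5% critical value, which may differ from what the actual tests use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// KS statistic D_n = sup_x |F_n(x) - x| for the Uniform[0, 1) CDF F(x) = x,
// where F_n is the empirical CDF of the samples.
inline double ks_statistic_uniform(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const double n = static_cast<double>(samples.size());
    double d = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const double x = samples[i];
        const double above = (i + 1) / n - x;  // F_n just after x, minus F(x)
        const double below = x - i / n;        // F(x) minus F_n just before x
        d = std::max({d, above, below});
    }
    return d;
}

// Accept uniformity at roughly the 5% level using the asymptotic critical
// value; this approximation is intended for large sample counts.
inline bool ks_accepts_uniform(const std::vector<double>& samples) {
    const double n = static_cast<double>(samples.size());
    return ks_statistic_uniform(samples) <= 1.358 / std::sqrt(n);
}
```

A test built on this would draw many entries from a sketching operator, map them to [0, 1) through the target distribution's CDF, and assert that the check accepts.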
- GitHub Issues: https://github.com/BallisticLA/RandBLAS/issues
- Documentation: https://randblas.readthedocs.io/
- Contact: Project maintainers listed in repository
This CLAUDE.md file was created to help Claude Code understand the RandBLAS project structure, conventions, and workflows. Update it as the project evolves.