AOCL-BLAS 5.2.2 Release Notes
Overview
AOCL-BLAS 5.2.2 is an incremental release building on the 5.2 GA release, delivering performance optimizations, bug fixes, improved threading stability, and expanded test coverage.
Performance Optimizations
- Optimized SGEMM rd kernels on Zen3
- Improved SGEMM rd kernel on Zen4/Zen5
- SGEMM tiny path tuning for Zen4 and Zen5
- Added tiny path for SGEMM
- Added fast path for single-threaded AVX512 DGEMV kernel
- Replaced intrinsics with inline assembly for bli_saxpyv_zen4_int and bli_saxpyf_zen_int_5
- Improved fringe case handling for AXPYV kernel
- Disabled small_gemm for Zen4/Zen5 and added single-thread check for tiny path
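The AXPYV items above concern how a vector kernel divides its work between a main vectorized body and a "fringe" of leftover elements. The following is a minimal plain-C sketch of that split. It assumes a block width of 8 floats, the capacity of one 256-bit AVX2 register. The function name saxpyv_sketch is hypothetical, and this is not the AOCL-BLAS kernel, which is written with AVX2/AVX512 intrinsics or inline assembly.

```c
#include <stddef.h>

/* Hypothetical sketch of AXPYV: y := alpha * x + y.
 * The main loop handles full blocks of 8 elements (the inner loop stands in
 * for one SIMD operation); the fringe loop handles the n % 8 leftovers. */
void saxpyv_sketch(size_t n, float alpha, const float *x, float *y)
{
    const size_t block = 8;
    size_t i = 0;

    /* Main body: full blocks of 8 elements. */
    for (; i + block <= n; i += block) {
        for (size_t k = 0; k < block; ++k) {
            y[i + k] += alpha * x[i + k];
        }
    }

    /* Fringe: the remaining n % 8 elements, one at a time. */
    for (; i < n; ++i) {
        y[i] += alpha * x[i];
    }
}
```

With n = 11, the main loop covers elements 0 to 7 and the fringe loop covers elements 8 to 10. Errors in fringe handling typically show up only for lengths that are not multiples of the block width.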
Bug Fixes
- Fixed memory leak in DGEMV kernel
- Fixed extreme values handling in GEMV
- Fixed an integer division in GEMV that should have been a double-precision floating-point division
- Fixed integer overflow issue in TPSV
- Fixed out-of-bound access in F32 matrix add/mul ops
- Fixed BF16-to-F32 conversion in the AVX2 F32 code path
- Fixed the BF16 AVX2 conversion path
- Fixed F32-to-BF16 conversion and AVX512 ISA support checks
- Fixed invalid diag handling in cblas_ctrmm
- Fixed Coverity issue in ZTRSM
- Fixed Coverity static analysis issue in DTRSM
- Fixed high priority Coverity issues in LPGEMM
- Resolved operator precedence warning in Zen5 DCOMPLEX threshold logic
- Modified AXPY kernel to ensure consistency of numerical results
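Two of the fixes above, the GEMV integer division and the TPSV integer overflow, involve classic C integer-arithmetic pitfalls. The sketch below illustrates both in isolation. The functions are hypothetical and are not the code changed in this release. The overflow example assumes a bug of the kind that arises when packed-storage sizes or offsets, based on n(n+1)/2, are computed in 32-bit int. That is one common source of such bugs, but these notes do not confirm it was the site fixed in TPSV.

```c
#include <stdint.h>

/* Pitfall: both operands are int, so the quotient is truncated
 * before it is converted to double. */
double ratio_truncated(int num, int den)
{
    return num / den;          /* integer division, then conversion */
}

/* Fix: promote one operand first so the division happens in double. */
double ratio_correct(int num, int den)
{
    return (double)num / den;
}

/* Number of stored elements of an n x n packed triangular matrix, n(n+1)/2.
 * In 32-bit int, n * (n + 1) exceeds INT32_MAX once n >= 46341;
 * computing in int64_t keeps the result exact for much larger n. */
int64_t packed_size(int64_t n)
{
    return n * (n + 1) / 2;
}
```

For example, ratio_truncated(1, 2) yields 0.0 while ratio_correct(1, 2) yields 0.5, and packed_size(46341) yields 1073767311, a value whose intermediate product would not fit in a 32-bit int.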
Threading & Stability
- Fixed data race in native code-path
- Added an OpenMP barrier before releasing threadinfo and the global communicator to avoid a race condition
- Replaced the OMP barrier with bli_thread_barrier and added similar fixes
- The global communicator is now freed outside the parallel region
- Thread: freed the global communicator after the parallel region completes
- Initialized mem_t structures safely and handled a NULL communicator in threading
- Fixed DTL dynamic thread logging in BLAS operations
- Added dynamic and actual thread counts to the DTL log of SAXPY
- Enabled disable-sba-pools feature in AOCL-BLAS
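Several threading fixes above follow one rule: shared state used by a team of threads must not be released until every thread has stopped using it. The sketch below shows that ordering with POSIX threads rather than OpenMP. The types and functions comm_sketch_t, worker, and run_parallel_region are hypothetical stand-ins for the library's global communicator and parallel region. Here pthread_join takes the place of the closing barrier.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for a global communicator shared by all threads. */
typedef struct {
    int nthreads;
    int *partial;  /* one result slot per thread */
} comm_sketch_t;

typedef struct {
    comm_sketch_t *comm;
    int id;
} worker_arg_t;

static void *worker(void *p)
{
    worker_arg_t *arg = (worker_arg_t *)p;
    /* Each thread writes only its own slot, so no locking is needed. */
    arg->comm->partial[arg->id] = arg->id + 1;
    return NULL;
}

/* Runs a "parallel region" and frees the shared communicator only after all
 * threads have been joined. Returns the sum of per-thread results, or -1. */
int run_parallel_region(int nthreads)
{
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;

    comm_sketch_t *comm = malloc(sizeof *comm);
    if (comm == NULL) return -1;
    comm->nthreads = nthreads;
    comm->partial = calloc((size_t)nthreads, sizeof *comm->partial);
    if (comm->partial == NULL) { free(comm); return -1; }

    pthread_t tids[MAX_THREADS];
    worker_arg_t args[MAX_THREADS];
    int started = 0;
    for (int t = 0; t < nthreads; ++t) {
        args[t].comm = comm;
        args[t].id = t;
        if (pthread_create(&tids[t], NULL, worker, &args[t]) != 0) break;
        ++started;
    }

    /* Joining every started thread acts as the closing barrier: once this
     * loop finishes, no thread can still be touching comm. */
    for (int t = 0; t < started; ++t) pthread_join(tids[t], NULL);

    int sum = -1;
    if (started == nthreads) {
        sum = 0;
        for (int t = 0; t < nthreads; ++t) sum += comm->partial[t];
    }

    /* Freed strictly after the parallel region has completed. */
    free(comm->partial);
    free(comm);
    return sum;
}
```

Moving the two free calls above the join loop would reintroduce the race described in the notes. A worker could then write into memory that has already been released.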
Build System & Infrastructure
- Updates to the build systems (CMake and Make) for LPGEMM compilation
- CMake: Added targets and aliases so that BLIS works with FetchContent
- Enabled security flags by default
- DTL: Added Windows getpid support
- Added compiler information to make showconfig and bench_getlibraryInfo
- Made all bench applications consistent
- Standardized Zen kernel names
Test Suite (GTestSuite)
- Added Banded API tests: gbmv, hbmv, sbmv, tbmv, tbsv
- Added Packed API tests: hpmv, spmv, tpmv, tpsv, hpr, hpr2, spr, spr2
- Added conjugate dot and ger IIT_ERS tests
- Added data pool support
- Moved data generator definitions to a cpp file
- Computediff improvements
- Fixed an issue in swap
- Broke up tests for better organization
- Multiple miscellaneous test fixes
- Code tidying