AOCL 5.2.2 Release

@kvaragan released this 20 Mar 06:31

AOCL-BLAS 5.2.2 Release Notes

Overview

AOCL-BLAS 5.2.2 is an incremental release building on the 5.2 GA release, delivering performance optimizations, bug fixes, improved threading stability, and expanded test coverage.

Performance Optimizations

  • Optimized SGEMM rd kernels on Zen3
  • Improved SGEMM rd kernel on Zen4/Zen5
  • SGEMM tiny path tuning for Zen4 and Zen5
  • Added tiny path for SGEMM
  • Added fast path for single-threaded AVX512 DGEMV kernel
  • Replaced intrinsics with inline assembly for bli_saxpyv_zen4_int and bli_saxpyf_zen_int_5
  • Improved fringe case handling for AXPYV kernel
  • Disabled small_gemm for Zen4/Zen5 and added single-thread check for tiny path
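
For readers unfamiliar with the term, "fringe case handling" in the AXPYV item above refers to the elements left over once a vector has been processed in full SIMD-width blocks. The following is a minimal sketch of that structure in plain C. It is not the AOCL-BLAS kernel: the function name `axpy_blocked` and the block size of 8 are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not the AOCL-BLAS kernel:
   computes y := alpha * x + y for contiguous single-precision vectors. */
void axpy_blocked(size_t n, float alpha, const float *x, float *y)
{
    enum { BLOCK = 8 };   /* assumed stand-in for a SIMD register width */
    size_t i = 0;

    assert(n == 0 || (x != NULL && y != NULL));

    /* Main loop: full blocks of BLOCK elements. A real kernel would
       use vector instructions for this part. */
    for (; i + BLOCK <= n; i += BLOCK) {
        for (size_t k = 0; k < BLOCK; ++k)
            y[i + k] += alpha * x[i + k];
    }

    /* Fringe: the remaining n % BLOCK elements, handled one at a time. */
    for (; i < n; ++i)
        y[i] += alpha * x[i];
}
```

With n = 11, the main loop covers elements 0 through 7 and the fringe loop covers elements 8 through 10.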

Bug Fixes

  • Fixed memory leak in DGEMV kernel
  • Fixed extreme values handling in GEMV
  • Fixed a GEMV computation that used integer division where double-precision division was intended
  • Fixed integer overflow in TPSV
  • Fixed out-of-bound access in F32 matrix add/mul ops
  • Fixed BF16-to-F32 conversion in the AVX2 F32 code path
  • Fixed a bug in the BF16 AVX2 conversion path
  • Fixed F32-to-BF16 conversion and AVX512 ISA support checks
  • Fixed cblas_ctrmm invalid diag handling
  • Fixed Coverity static analysis issue in ZTRSM
  • Fixed Coverity static analysis issue in DTRSM
  • Fixed high-priority Coverity issues in LPGEMM
  • Resolved operator precedence warning in Zen5 DCOMPLEX threshold logic
  • Modified AXPY kernel to ensure consistency of numerical results
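
The GEMV integer-division item above describes a common class of C bug. The release notes do not say which expression was affected, so the sketch below only illustrates the general pattern; the helper names `ratio_truncated` and `ratio_exact` are hypothetical.

```c
#include <assert.h>

/* Hypothetical helpers that illustrate the bug class only. */

/* Both operands are int, so the division truncates before the
   result is converted to double. */
double ratio_truncated(int m, int n)
{
    assert(n != 0);
    return m / n;
}

/* Casting one operand promotes the other, so the division is
   carried out in double precision. */
double ratio_exact(int m, int n)
{
    assert(n != 0);
    return (double)m / n;
}
```

For m = 7 and n = 2, the first version yields 3.0 and the second yields 3.5.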

Threading & Stability

  • Fixed data race in native code-path
  • Added an OpenMP barrier before releasing threadinfo and the global communicator to avoid a race
  • Replaced OMP barrier with bli_thread_barrier and added similar fixes
  • Global communicator is now freed outside the parallel region
  • Thread: free global communicator after parallel region completes
  • Initialized mem_t structures safely and handled a NULL communicator in threading
  • Fixed DTL dynamic thread logging in BLAS operations
  • Added dynamic threads and actual threads in the DTL log of SAXPY
  • Enabled disable-sba-pools feature in AOCL-BLAS
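
Several items above concern when a shared object may be released relative to an OpenMP parallel region. The sketch below shows the general pattern of freeing a shared resource only after the region has ended, relying on the implicit barrier at the end of the region. It does not use the BLIS threading API: `shared_ctx_t` and `scaled_sum` are hypothetical names, and the struct is only a stand-in for something like a communicator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared object such as a communicator. */
typedef struct { int scale; } shared_ctx_t;

/* Returns the sum of scale * i for i in [0, n), or -1 if allocation fails. */
long scaled_sum(int n, int scale)
{
    assert(n >= 0);

    shared_ctx_t *ctx = malloc(sizeof *ctx);   /* created before the region */
    if (ctx == NULL)
        return -1;
    ctx->scale = scale;

    long total = 0;

    #pragma omp parallel
    {
        long local = 0;

        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
            local += (long)ctx->scale * i;     /* every thread reads ctx */

        #pragma omp atomic
        total += local;
    }   /* implicit barrier: all threads have finished using ctx here */

    free(ctx);   /* released once, by the calling thread, after the region */
    return total;
}
```

If the code is compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result.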

Build System & Infrastructure

  • Updates to the build systems (CMake and Make) for LPGEMM compilation
  • CMake: added targets and aliases so that BLIS works with FetchContent
  • Enabled security flags by default
  • Added getpid support for DTL on Windows
  • Added compiler information to make showconfig and bench_getlibraryInfo
  • Made all bench applications consistent
  • Standardized Zen kernel names

Test Suite (GTestSuite)

  • Added Banded API tests: gbmv, hbmv, sbmv, tbmv, tbsv
  • Added Packed API tests: hpmv, spmv, tpmv, tpsv, hpr, hpr2, spr, spr2
  • Added conjugate dot and ger IIT_ERS tests
  • Added data pool support
  • Moved data generator definitions to a cpp file
  • Computediff improvements
  • Fix in swap
  • Break up tests for better organization
  • Multiple miscellaneous test fixes
  • Code tidying
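
The packed routines named above (for example tpmv and tpsv) operate on triangular or symmetric matrices stored in BLAS packed format, where only one triangle is kept, column by column. As a reminder of that layout, here is a minimal sketch for the upper-triangular, column-major case with 0-based indices. The helper names are hypothetical and are not part of AOCL-BLAS or its test suite. Indices use `size_t` because n(n+1)/2 no longer fits in a signed 32-bit integer once n reaches 65,536.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of AOCL-BLAS. */

/* Number of stored elements for an n x n matrix in packed format. */
size_t packed_length(size_t n)
{
    return n * (n + 1) / 2;
}

/* Position of A(i, j), with i <= j, in upper-triangular column-major
   packed storage. Column j holds j + 1 entries and starts after the
   j * (j + 1) / 2 entries of the preceding columns. */
size_t packed_upper_index(size_t i, size_t j)
{
    assert(i <= j);
    return i + j * (j + 1) / 2;
}
```

For a 4 x 4 matrix, the packed array holds 10 elements and A(3, 3) is stored last, at index 9.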