
Native Windows (MSYS2/MinGW-w64) build for ABACUS #7423

Open
ErjieWu wants to merge 19 commits into deepmodeling:develop from ErjieWu:windows

@ErjieWu (Collaborator) commented Jun 2, 2026

Summary

This PR adds the ability to build and run ABACUS natively on Windows (without WSL), and fixes several latent bugs uncovered along the way. The native build supports plane-wave (PW) and numerical-atomic-orbital (LCAO) bases, in both serial and MPI (MS-MPI) configurations, using the MSYS2/MinGW-w64 GCC toolchain with OpenBLAS, FFTW, and ScaLAPACK.

The guiding principle throughout: do not change working Linux code. Every Windows-specific change is isolated behind #if defined(_WIN32) (or a macro that expands to the exact upstream token on non-Windows), so Linux/macOS builds are byte-for-byte unchanged. The only exceptions are genuine bug fixes (noted below), which improve correctness on all platforms.

New feature: native Windows build

Rather than introducing bespoke Windows scripts, the build plugs into the existing toolchain/ infrastructure and the existing integration-test harness, mirroring the other backends (gnu, intel, gcc-mkl, …):

  • toolchain/toolchain_windows.sh — installs the prerequisites via MSYS2 pacman (gcc/gfortran, cmake, ninja, openblas, fftw, cereal, ms-mpi, scalapack, bc).
  • toolchain/build_abacus_windows.sh — configures and builds (default: MPI + LCAO). It bundles the dependent DLLs next to abacus.exe (so the binary is self-contained), drops an mpirun→mpiexec shim for the test harness, and caps default build parallelism by available RAM to avoid cc1plus OOM.
cd toolchain
./toolchain_windows.sh
./build_abacus_windows.sh        # default: MPI + LCAO; -j N to override
source abacus_env.sh
abacus                           # serial run
mpiexec -n 4 abacus              # parallel run (MS-MPI)

Testing uses the standard harness unchanged — no separate Windows test script and no separate case list:

cd tests/01_PW
bash ../integrate/Autotest.sh -a abacus       # MPI (default np=4)
bash ../integrate/Autotest.sh -a abacus -n 0  # serial (no MPI launcher)

Three Windows-specific gaps had to be closed so that the unmodified harness can drive MS-MPI:

  • MS-MPI ships only mpiexec, not mpirun, so the build drops an mpirun→mpiexec shim next to the binary.
  • MSYS2's OpenBLAS is OpenMP-threaded, so OMP_NUM_THREADS (not OPENBLAS_NUM_THREADS) must be pinned to 1; otherwise its buffer allocator fails under multiple ranks.
  • mpiexec does not propagate PATH to child ranks when stdout is redirected, so the dependent DLLs are bundled next to abacus.exe, where Windows finds them before searching PATH.

Build-system / portability changes (Linux unaffected)

  • CMakeLists.txt: WIN32/MSVC-guarded defines (_USE_MATH_DEFINES, NOMINMAX, _CRT_SECURE_NO_WARNINGS); skip the default -O3 -g flags and the -lm link on MSVC, and the post-install abacus symlink on Windows; require ScaLAPACK only when ENABLE_MPI is on (a serial build uses no distributed-memory solver, and MPI builds are unaffected).
  • cmake/FindBlas.cmake, FindLapack.cmake — save/restore CMAKE_MODULE_PATH around the builtin find_package so the wrapper doesn't recurse into itself on case-insensitive filesystems (Windows/macOS). No-op on Linux.
  • source/source_base/fs_compat.h (new) — a portable make_directory() (_mkdir on Windows, mkdir(path, 0755) on POSIX), since POSIX mkdir's mode argument doesn't exist in the Windows CRT. The POSIX path is behaviorally identical to the previous call sites.
  • cpu_allocator.cpp (_aligned_malloc/_aligned_free), restart.cpp (_S_IREAD/_S_IWRITE + <io.h>), input_conv.h (<regex.h> → C++ <regex>, which MinGW provides): each is needed because the corresponding POSIX facility is missing from the Windows CRT (or, for <regex.h>, from MinGW).
  • source/source_base/module_fft/fft_base.h, fft_cpu.h: upstream marks the FFT virtuals __attribute__((weak)) so that the ELF linker can null the unused FFT_CPU<float> vtable slots when ENABLE_FLOAT_FFTW is off. MinGW/PE has no working equivalent (weak template members either collide as multiple definitions or leave null slots). This PR introduces ABACUS_FFT_WEAK (= __attribute__((weak)) on non-Windows, empty on _WIN32). On Windows the build sets ENABLE_FLOAT_FFTW=ON so the real definitions exist, and the non-pure base virtuals get trivial bodies in a #if defined(_WIN32) block. On Linux the headers preprocess to exactly the upstream source.
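
A minimal sketch of the kind of helper fs_compat.h provides, assuming a free function that keeps mkdir's return convention (0 on success, -1 with errno set). The `sketch` namespace and the exact signature are illustrative, not the header's actual contents.

```cpp
#include <cassert>
#include <cerrno>   // callers inspect errno (e.g. EEXIST) after a failure
#include <string>

#if defined(_WIN32)
#include <direct.h>     // _mkdir(const char*): the CRT version takes no mode argument
#else
#include <sys/stat.h>   // mkdir(const char*, mode_t)
#include <sys/types.h>
#endif

namespace sketch {

// Create one directory level. Returns 0 on success, -1 on failure with errno set.
inline int make_directory(const std::string& path)
{
    assert(!path.empty());
#if defined(_WIN32)
    return _mkdir(path.c_str());
#else
    return mkdir(path.c_str(), 0755);  // same mode as the previous POSIX call sites
#endif
}

} // namespace sketch
```

Because both branches share the same return convention, call sites can keep treating `errno == EEXIST` as "already exists" on either platform.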

Bug fixes

These are genuine correctness fixes (not Windows-only workarounds):

  • structure_factor.cpp (bspline_sf): zpiece_to_all() is MPI-only, so in a serial build the real-space tmpr array was left uninitialized, producing a garbage structure factor and a wrong total energy. tmpr is now filled directly in the #else (non-MPI) branch using the same layout. Affects serial builds only; the MPI path is untouched. Fixes #7422 ("Test case 01_PW/032_PW_15_CF_CS_bspline failed for serial version ABACUS").
  • psi_initializer.cpp (random_t) — in a serial build, stick_to_pool() is MPI-only, so seeded (pw_seed>0) random wavefunctions came out all-zero and later tripped Gram-Schmidt (psi_norm <= 0). Copied the stick data directly in the #else branch. Serial only; MPI path untouched.
  • esolver_ks_lcao.cpp — guarded a dereference of the DeePKS overlap_orb_alpha integrator, which is a null unique_ptr when DeePKS is disabled (undefined behavior). Harmless when DeePKS is on.
  • binstream.cpp — Binstream is always a binary stream, but on Windows fopen mode "r"/"w" opens in text mode and corrupts binary wavefunction/charge files. Append "b" (a no-op on POSIX).
  • input_conv.h (parse_expression) — initialize the accumulator and fail fast (WARNING_QUIT) on an unmatched token or a failed numeric conversion, instead of pushing an indeterminate value into the result vector.

Test harness

  • tests/integrate/Autotest.sh — added a serial mode (-n 0) that runs the binary directly with no MPI launcher, so a serial build reuses the standard harness. Existing np > 0 (CI) behavior is unchanged.
  • .gitattributes: force LF endings for *.sh and CASES_*.txt so they parse correctly under bash on Windows checkouts (where core.autocrlf would otherwise convert them to CRLF). No content change on Linux.

Scope / not included

The following remain disabled on Windows (excluded by design, not regressions): ELPA, PEXSI, hybrid functionals (LibRI/LibComm), DeePKS/ML-KEDF, LibXC (and therefore meta-GGA/SCAN), GPU (CUDA/ROCm), DSP. Test cases that require them are expected to fail.

Known limitations

  • Serial gamma-only LCAO converges to a wrong energy, caused by a serial-only reduction gap in the gamma-only H/density assembly whose exact location has not yet been found. The MPI build is correct (it matches the references to ~1e-11 even on a single rank), so gamma-only LCAO should be run under MPI. Multi-k serial LCAO is unaffected.
  • pw_seed is not bit-reproducible across platforms: the random initializer uses std::rand, whose sequence is implementation-defined (RAND_MAX is 32767 with the Windows CRT but 2147483647 with glibc). Init-sensitive cases may therefore converge to a different, equally valid state among near-degenerate ones.
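
To make the reproducibility point concrete: the C++ standard fixes the output sequence of the `<random>` engines but not of `std::rand`. The sketch below only illustrates that contrast; it is not something this PR changes, and `crt_uniform`/`portable_uniform` are hypothetical names. Only the raw engine output is portable (`std::uniform_real_distribution` may use a different algorithm in each standard library), so the mapping to [0, 1) is done by hand.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <random>

namespace sketch {

// Platform-dependent: both RAND_MAX and the generator algorithm behind
// std::rand are implementation-defined, so results differ between CRT and glibc.
inline double crt_uniform(unsigned seed)
{
    std::srand(seed);
    return static_cast<double>(std::rand()) / (static_cast<double>(RAND_MAX) + 1.0);
}

// Portable: std::mt19937's output sequence is specified by the C++ standard.
inline double portable_uniform(std::uint32_t seed)
{
    std::mt19937 gen(seed);
    const double u = static_cast<double>(gen()) / 4294967296.0;  // divide by 2^32
    assert(u >= 0.0 && u < 1.0);
    return u;
}

} // namespace sketch
```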

Testing

  • Windows (MS-MPI, default harness): representative cases from 01_PW, 02_NAO_Gamma (gamma-only LCAO), and 03_NAO_multik pass. Across the full 01_PW suite, total energies match the Linux references to ~1e-8 relative. The remaining harness "failures" are not computational errors: they are cross-platform/cross-BLAS floating-point differences in forces/stresses that exceed the harness's strict absolute thresholds, excluded features (LibXC etc.), gauge-dependent output-file dumps, and one stale reference (case 074, whose mismatch is platform-independent).
  • Linux: the Windows-specific changes are all _WIN32-guarded or expand to the exact upstream token; verified the FFT headers preprocess to the upstream source with -U_WIN32, and the serial fixes live only in #else/#ifndef __MPI branches, so standard Linux MPI builds are unchanged.

ErjieWu and others added 10 commits June 2, 2026 20:17
Lay the groundwork for a native Windows serial plane-wave build
(no MPI, no LCAO, no ELPA/PEXSI/hybrid). Targets MinGW-w64 GCC, which
ships the POSIX headers ABACUS uses and accepts its GCC attributes, so
the source needs only minimal, Linux-safe portability shims.

- source_base/fs_compat.h (new): portable ModuleBase::make_directory()
  wrapping _mkdir (Windows) / mkdir(path,0755) (POSIX). The Windows CRT
  mkdir takes no permission-mode argument.
- global_file.cpp, global_function.cpp: route the 7 mkdir(path,0755)
  call sites through the helper; drop unistd.h/sys/stat.h includes.
- CMakeLists.txt:
  * gate find_package(ScaLAPACK REQUIRED) on ENABLE_MPI so the serial
    build does not require a distributed-memory library;
  * define _USE_MATH_DEFINES/NOMINMAX/_CRT_SECURE_NO_WARNINGS on WIN32;
  * skip -O3 -g default flags and the -lm link for MSVC;
  * skip the post-install abacus symlink on Windows.
- tools/windows/build-native-serial.ps1 (new): MinGW configure/build helper.
- docs/advanced/install_windows_native.md (new): native-build documentation.

All changes are guarded or platform-neutral; the Linux build is unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

With these fixes the native Windows serial plane-wave build
(abacus_pw_ser.exe, MinGW-w64 GCC + OpenBLAS + FFTW) compiles, links,
and runs examples/02_scf/01_pw_Si2 to SCF convergence with a
deterministic total energy (-215.5057 eV, bit-identical across runs).

Build-system fixes:
- cmake/FindBlas.cmake, cmake/FindLapack.cmake: the wrappers delegate to
  CMake's builtin FindBLAS/FindLAPACK, but on the case-insensitive Windows
  filesystem the wrapper matched itself and recursed forever. Drop our
  module dir from CMAKE_MODULE_PATH around the builtin call (no-op on Linux).

Source portability fixes (all guarded or platform-neutral; Linux unaffected):
- module_fft/fft_base.h, fft_cpu.h: remove __attribute__((weak)) from the FFT
  virtuals. The weak-without-definition pattern relied on the ELF linker
  resolving unbound weak symbols to null; on Windows/PE (MinGW) it produced
  null vtable slots, so the first FFT dispatch (FFT_Bundle::setupFFT) called
  address 0 and segfaulted. Base virtuals get trivial default bodies; the
  float overrides become concrete via ENABLE_FLOAT_FFTW=ON.
- module_parameter/input_conv.h: port the POSIX <regex.h> expression parser to
  C++ <regex> (MinGW has no <regex.h>).
- module_container/base/core/cpu_allocator.cpp: replace posix_memalign with
  _aligned_malloc/_aligned_free on Windows, applied consistently to both
  allocate overloads and free.
- module_restart/restart.cpp: map POSIX S_IRUSR/S_IWUSR to _S_IREAD/_S_IWRITE
  and include <io.h> for low-level open/read/write/close on Windows.
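
The allocator change can be pictured with the following sketch, assuming a pair of free functions; the names are hypothetical, and the real cpu_allocator.cpp wraps this logic in ABACUS's allocator classes. The key rule is that memory from `_aligned_malloc` must be released with `_aligned_free`, never `free`, which is why the change has to cover both allocate overloads and free together.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#if defined(_WIN32)
#include <malloc.h>   // _aligned_malloc / _aligned_free
#endif

namespace sketch {

inline bool is_aligned(const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Returns nullptr on failure. `alignment` must be a power of two and a
// multiple of sizeof(void*), as posix_memalign requires.
inline void* aligned_allocate(std::size_t alignment, std::size_t size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    assert(alignment % sizeof(void*) == 0);
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);  // note the (size, alignment) order
#else
    void* ptr = nullptr;
    return posix_memalign(&ptr, alignment, size) == 0 ? ptr : nullptr;
#endif
}

// Must be used for every pointer returned by aligned_allocate.
inline void aligned_free(void* ptr)
{
#if defined(_WIN32)
    _aligned_free(ptr);
#else
    std::free(ptr);
#endif
}

} // namespace sketch
```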

Tooling/docs:
- tools/windows/build-native-serial.ps1: use the verified flags
  (BLA_VENDOR=OpenBLAS, ENABLE_FLOAT_FFTW=ON, COMMIT_INFO=OFF, the GCC-16
  force-include workaround).
- docs/advanced/install_windows_native.md: document the gcc-fortran package,
  the verified build/run, and every source change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

psi_initializer::random_t, in the pw_seed>0 branch, generates per-stick
random amplitude/phase into stickrr/stickarg and then distributes them
into the gathered tmprr/tmparg arrays via stick_to_pool() -- but that call
is guarded by #ifdef __MPI. In a serial build tmprr/tmparg therefore stay
zero-initialized, so every seeded random wavefunction is all-zero. This
later trips Gram-Schmidt orthonormalization ("psi_norm <= 0.0") and aborts
the run. The path is never hit in CI because the integration tests run
under MPI.

Add the serial counterpart: copy each stick directly into tmprr/tmparg
using the same mapping as stick_to_pool()'s rank-0 branch
(out[ixy2is_[ir]*nz + iz] = stick[iz]). ixy2is_ is populated for both
serial and MPI builds via pw_wfc_->getfftixy2is().
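
A sketch of that serial scatter, under stated assumptions: `ixy2is` maps an xy column to its stick index, a negative entry marks a column that carries no stick, and the output uses the `out[ixy2is_[ir]*nz + iz]` layout quoted above. The function name and signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Serial counterpart of a stick-to-pool scatter: copy the z-stick of xy
// column `ir` into out[ixy2is[ir] * nz + iz].
inline void stick_to_pool_serial(const std::vector<double>& stick, int ir,
                                 const std::vector<int>& ixy2is, int nz,
                                 std::vector<double>& out)
{
    assert(static_cast<int>(stick.size()) >= nz);
    const int is = ixy2is[ir];
    if (is < 0) {
        return;  // assumption: this xy column carries no stick
    }
    const std::size_t base = static_cast<std::size_t>(is) * nz;
    assert(base + nz <= out.size());
    for (int iz = 0; iz < nz; ++iz) {
        out[base + iz] = stick[iz];
    }
}

} // namespace sketch
```

In the fix described here, the same copy would be applied to both tmprr (amplitudes) and tmparg (phases).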

Verified on a representative set of 15 tests/01_PW cases run with the
native Windows serial PW build (abacus_pw_ser.exe): all converged total
energies now match the official result.ref references to <= ~7e-7 eV.
Before this fix the 6 cases using pw_seed with random wavefunctions
aborted; the other 9 already matched to ~1e-9 eV.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ke scripts

Per review feedback, the native-Windows support should plug into ABACUS's
existing build/test infrastructure (like any other backend/variant) rather than
carry its own scripts.

Build: add a Windows toolchain variant, mirroring toolchain_gnu.sh /
build_abacus_gnu.sh:
- toolchain/toolchain_windows.sh   -- installs the MinGW-w64 prerequisites via
  pacman on MSYS2 (gcc, gfortran, openblas, fftw, cmake, ninja) plus bc for the
  test harness; records the prefix in install/setup like the Linux variants.
- toolchain/build_abacus_windows.sh -- configures + builds the serial PW binary
  (ENABLE_MPI/LCAO=OFF, OpenBLAS+FFTW) and writes abacus_env.sh.
Removed the one-off tools/windows/build-native-serial.ps1.

Test: reuse tests/integrate/Autotest.sh instead of a separate script. Added a
serial mode: with -n 0 the harness runs the binary directly (no mpirun), so a
serial build (any OS) reuses the standard catch_properties.sh / result.ref
comparison. Added tests/integrate/CASES_SERIAL_PW.txt listing serial-PW cases.

Validation (build_abacus_windows.sh, then Autotest.sh -n 0 -f CASES_SERIAL_PW.txt):
all 15 01_PW cases run; total energies/forces/stresses match the Linux
result.ref to ~1e-7 relative. The few WARNINGs (016/017 etot ~1e-7 eV;
003/009/019 stress/force) are absolute-threshold exceedances from cross-platform
/ cross-BLAS floating point, classified WARNING (not ERROR) by the harness.

docs/advanced/install_windows_native.md updated to describe the toolchain +
serial-Autotest flow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per review: the serial PW build should be checked against the existing PW test
suite (tests/01_PW) via the standard harness, not a hand-picked subset.

- Remove tests/integrate/CASES_SERIAL_PW.txt. The canonical list already exists
  at tests/01_PW/CASES_CPU.txt and is used by the standard ctest registration
  (tests/01_PW/CMakeLists.txt runs Autotest.sh from that directory). Serial runs
  just add -n 0:
      cd tests/01_PW
      bash ../integrate/Autotest.sh -a <abacus_pw_ser.exe> -n 0
- .gitattributes: force LF for *.sh and CASES_*.txt so the toolchain scripts,
  Autotest.sh and the bash-parsed case lists work on a fresh Windows checkout
  (core.autocrlf would otherwise rewrite them to CRLF).
- docs/advanced/install_windows_native.md: document the whole-01_PW serial run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Mirror the Linux toolchain UX: `source abacus_env.sh` then run `abacus`.

build_abacus_windows.sh now copies the configured binary (abacus_pw_ser.exe)
to abacus.exe in the build dir. Native Windows symlinks need elevation (so the
CMake `abacus` symlink step is skipped on WIN32); the .exe copy lets a bare
`abacus` resolve in the MSYS2 shell and in cmd/PowerShell. abacus_env.sh already
puts that directory (and the MinGW runtime DLLs via the toolchain setup) on PATH.

Verified: source abacus_env.sh; abacus --version  -> runs from any directory.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Binstream::Binstream/open pass the caller's fopen mode ("r"/"w"/"a")
straight through. On Windows that opens in *text* mode, which translates
CRLF and treats 0x1A as EOF, corrupting the binary wavefunction/charge
files Binstream is built to read -> "Error in Binstream: Some data didn't
be read". On POSIX "r" == "rb", so the bug is Windows-only.

Binstream is always a binary stream, so append "b" to the mode when the
caller omitted it. Harmless no-op on Linux.
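
The mode fix amounts to something like the following helper (a hypothetical name; the actual Binstream code may be structured differently). Appending "b" at the end is valid even for update modes, because the C standard accepts both "r+b" and "rb+".

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Return an fopen() mode that always requests binary mode. On POSIX "b" is
// accepted and ignored; on Windows it disables CRLF translation and the
// treatment of 0x1A (Ctrl-Z) as end-of-file.
inline std::string binary_mode(const std::string& mode)
{
    assert(!mode.empty() && (mode[0] == 'r' || mode[0] == 'w' || mode[0] == 'a'));
    if (mode.find('b') != std::string::npos) {
        return mode;       // caller already asked for binary
    }
    return mode + "b";     // "r" -> "rb", "w" -> "wb", "r+" -> "r+b"
}

} // namespace sketch
```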

Fixes these serial 01_PW cases on the native Windows build (verified):
- 056_PW_IW          (init_wfc=file: read wfc from binary file)
- 057_PW_SO_IW       (SOC + init_wfc=file)
- 075_PW_CHG_BINARY  (binary charge I/O)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Structure_Factor::bspline_sf (nbspline>0, B-spline structure factor)
scatters each real-space plane into tmpr via Parallel_Grid::zpiece_to_all,
which is guarded by #ifdef __MPI. In a serial build tmpr is never filled
(it is new double[nrxx], uninitialized), so real2recip(tmpr, strucFac)
produces a garbage structure factor -> grossly wrong total energy, force
and stress. CI never hits this path (integration tests run under MPI).

Add the serial branch: fill tmpr directly using the SAME real-space layout
as zpiece_to_all's serial path, rho[ir*nczp + znow] (xy outer, z innermost;
nczp==nz, znow==iz when serial).
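
A sketch of that serial fill, assuming one xy plane (`zpiece`, length nxy) is delivered per z index and the destination uses the `rho[ir*nczp + znow]` layout quoted above with nczp == nz and znow == iz. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Serial stand-in for zpiece_to_all(): write plane `iz` into rho[ir * nz + iz]
// (xy index outer, z innermost).
inline void zpiece_to_all_serial(const std::vector<double>& zpiece, int iz, int nz,
                                 std::vector<double>& rho)
{
    const std::size_t nxy = zpiece.size();
    assert(iz >= 0 && iz < nz);
    assert(rho.size() == nxy * static_cast<std::size_t>(nz));
    for (std::size_t ir = 0; ir < nxy; ++ir) {
        rho[ir * nz + iz] = zpiece[ir];
    }
}

} // namespace sketch
```

Calling it once per plane fills every element, which is exactly what the uninitialized `new double[nrxx]` buffer lacked in the serial build.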

Verified on tests/01_PW/032_PW_15_CF_CS_bspline (native Windows serial):
energy and stress now match the reference to ~1e-8 (was ~1480 eV / 30000
kbar off); residual force ~5e-3 is B-spline interpolation + cross-platform
float noise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s not a bug)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 2, 2026 17:02
@ErjieWu marked this pull request as draft June 2, 2026 17:02
Copilot AI left a comment

Pull request overview

Adds an experimental native Windows (MSYS2/MinGW-w64) build path for ABACUS (Phase 1: serial plane-wave only) while keeping Linux behavior unchanged, and extends the existing test harness to support running without an MPI launcher (-n 0).

Changes:

  • Introduces a Windows toolchain variant (pacman-installed OpenBLAS/FFTW + Ninja/CMake) and build script to produce a real Windows executable.
  • Improves portability across Windows/case-insensitive filesystems (CMake FindBLAS/LAPACK recursion fix, binary file I/O mode, mkdir compatibility, Windows CRT permission bits).
  • Fixes serial-only correctness issues surfaced by running the PW test suite without MPI (seeded random wavefunction init + B-spline structure factor grid fill), and adds serial mode support to Autotest.sh.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| toolchain/toolchain_windows.sh | Installs MSYS2/MinGW-w64 dependencies via pacman and writes an env setup file. |
| toolchain/build_abacus_windows.sh | Configures/builds a serial PW Windows executable and generates abacus_env.sh. |
| tests/integrate/Autotest.sh | Adds -n 0 serial mode (no mpirun) and adjusts OpenMP thread defaulting. |
| source/source_pw/module_pwdft/structure_factor.cpp | Initializes tmpr in serial builds to avoid garbage structure factors. |
| source/source_psi/psi_initializer.cpp | Fixes seeded random wavefunction initialization in serial (non-MPI) builds. |
| source/source_io/module_restart/restart.cpp | Adds Windows CRT compatibility for open() mode bits and includes <io.h>. |
| source/source_io/module_parameter/input_conv.h | Replaces POSIX <regex.h> parsing with portable std::regex. |
| source/source_io/module_output/binstream.cpp | Forces binary fopen mode by ensuring 'b' is present. |
| source/source_base/module_fft/fft_cpu.h | Removes __attribute__((weak)) from FFT_CPU virtuals (Windows/PE safety). |
| source/source_base/module_fft/fft_base.h | Provides non-null default virtual bodies to avoid PE null vtable slots. |
| source/source_base/module_container/base/core/cpu_allocator.cpp | Uses _aligned_malloc/_aligned_free on Windows for aligned allocations. |
| source/source_base/global_function.cpp | Switches directory creation to a new portable helper. |
| source/source_base/global_file.cpp | Switches directory creation to a new portable helper. |
| source/source_base/fs_compat.h | Adds portable ModuleBase::make_directory() wrapper. |
| docs/advanced/install_windows_native.md | Documents the experimental native Windows build and serial PW testing flow. |
| CMakeLists.txt | Adds Windows portability defines; gates ScaLAPACK on ENABLE_MPI; skips -lm and symlink install for Windows/MSVC. |
| cmake/FindLapack.cmake | Avoids infinite recursion on case-insensitive filesystems by temporarily adjusting CMAKE_MODULE_PATH. |
| cmake/FindBlas.cmake | Avoids infinite recursion on case-insensitive filesystems by temporarily adjusting CMAKE_MODULE_PATH. |
| .gitattributes | Forces LF endings for bash scripts and CASES_*.txt to avoid CRLF issues on Windows. |


Comment thread source/source_base/module_container/base/core/cpu_allocator.cpp
Comment thread source/source_io/module_parameter/input_conv.h
Comment thread source/source_io/module_parameter/input_conv.h Outdated
ErjieWu and others added 6 commits June 3, 2026 01:36
…s off

before_scf() unconditionally dereferenced *(two_center_bundle_.overlap_orb_alpha)
to pass it to deepks.build_overlap(). overlap_orb_alpha is only built when DeePKS
is enabled (descriptor orbitals); with DeePKS off it is a null unique_ptr, so
forming the reference is undefined behaviour (caught as an abort in a debug
libstdc++ build; benign in release as the DeePKS stub ignores it). Guard the call
on the integrator being present.
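
The pattern behind that guard, sketched with hypothetical stand-in types: dereferencing a null std::unique_ptr is undefined behavior even when the result is only bound to a reference, so the null check has to happen before `*ptr` is ever formed.

```cpp
#include <memory>

namespace sketch {

struct OverlapIntegrator { int norb = 0; };  // hypothetical stand-in for the integrator

// Hypothetical stand-in for deepks.build_overlap(const OverlapIntegrator&).
inline int build_overlap(const OverlapIntegrator& orb) { return orb.norb; }

// Returns -1 when the integrator was never built (DeePKS disabled).
inline int build_overlap_if_present(const std::unique_ptr<OverlapIntegrator>& overlap_orb_alpha)
{
    if (!overlap_orb_alpha) {
        return -1;  // never evaluate *overlap_orb_alpha on this path
    }
    return build_overlap(*overlap_orb_alpha);
}

} // namespace sketch
```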

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Extend the native-Windows toolchain to the full supported configuration,
mirroring build_abacus_gnu.sh:

- toolchain_windows.sh: also pacman-install cereal (LCAO), msmpi (MPI), and
  scalapack (distributed LCAO eigensolver). Documents that the MS-MPI runtime
  is a separate system-wide Microsoft redistributable.
- build_abacus_windows.sh: build MPI + LCAO by default (abacus_basic_para.exe);
  ENABLE_MPI / ENABLE_LCAO env toggles select serial / PW-only. Point FindMPI at
  the MinGW MS-MPI import lib; ScaLAPACK is found automatically when ENABLE_MPI.
  abacus_env.sh now also exports OPENBLAS_NUM_THREADS=1 (required so OpenBLAS's
  multithread buffer allocator does not fail under multiple MPI ranks).
- docs/advanced/install_windows_native.md: document the LCAO+MPI build, parallel
  testing (mpiexec / mpirun shim), and the known serial gamma-only LCAO bug
  (use the MPI build, which is correct to ~1e-11 even on a single rank).

Validated against 01_PW / 02_NAO_Gamma / 03_NAO_multik via the standard harness:
under MPI all three pass within the cross-platform error range; residual
differences are float noise at strict absolute thresholds, gauge-dependent
outputs, or excluded features (SCAN/meta-GGA needs LibXC, DFT+U needs MPI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Running tests/integrate/Autotest.sh directly failed with "no mpirun found":
MS-MPI ships only mpiexec, and the harness invokes `mpirun -np N`. Three
Windows-specific gaps, all fixed in build_abacus_windows.sh so the standard
harness works unchanged:

* mpirun shim. The build now drops an `mpirun`->`mpiexec` shim next to the
  binary (on PATH via abacus_env.sh). MS-MPI's `-n`/`-np <N> <prog>` syntax
  matches what the harness passes, so forwarding args is enough.

* OpenBLAS thread pinning. MSYS2's OpenBLAS is OpenMP-threaded (links libgomp),
  so OMP_NUM_THREADS -- not OPENBLAS_NUM_THREADS -- caps its threads. Autotest
  sets OMP_NUM_THREADS=nproc/np, so each rank spawned a multithreaded BLAS, the
  ranks oversubscribed the cores, and OpenBLAS's buffer allocator died
  ("Memory allocation still failed after 10 retries"). The shim and abacus_env.sh
  now pin OMP_NUM_THREADS=1 (ABACUS is built USE_OPENMP=OFF, so parallelism is
  MPI; the BLAS pin costs nothing).

* DLL bundling. mpiexec does not propagate PATH to child ranks when stdout is
  redirected to a file (as the harness does), so the child abacus.exe failed to
  load libopenblas/libfftw3/libscalapack ("error while loading shared
  libraries"). The build now copies the dependent MinGW/OpenBLAS/FFTW/ScaLAPACK
  DLLs next to abacus.exe; Windows searches the application directory before
  PATH, making the binary self-contained.

Verified end to end with the default invocation `bash Autotest.sh -a abacus`
(np=4, via the shim): 01_PW/001, 02_NAO_Gamma/scf_afm (gamma-only LCAO), and
03_NAO_multik/scf_pp_upf201 all pass. Corrects the earlier docs/notes that
cited OPENBLAS_NUM_THREADS and a hand-made shim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The mpirun shim died with `exec: mpiexec: not found`: MSYS2's MinGW shell does
not inherit the Windows PATH, and MS-MPI's mpiexec.exe lives in its own Bin dir
(only msmpi.dll is in System32). The MSMPI_BIN env var (set by the MS-MPI
installer) *is* inherited, so abacus_env.sh now prepends `cygpath -u "$MSMPI_BIN"`
to PATH, making both `mpiexec` and the shim resolve. Verified from a minimal
PATH: which mpiexec/mpirun both resolve and 01_PW/001 passes via the default
harness invocation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two issues from code review of the Windows-port commits:

1. FFT_CPU<float> undefined references on Linux (regression). The port removed
   __attribute__((weak)) from the FFT virtuals (it left null vtable slots on
   PE/MinGW and crashed). But the real FFT_CPU<float> methods live in
   fft_cpu_float.cpp, which is compiled only when ENABLE_FLOAT_FFTW=ON. With
   weak gone and float off (the Linux default), the FFT_CPU<float> vtable --
   still emitted wherever the class is constructed (FFT_Bundle) -- referenced
   undefined symbols:
     undefined reference to `ModuleBase::FFT_CPU<float>::setupFFT()' ...
   Provide trivial FFT_CPU<float> method definitions in the always-compiled
   fft_cpu.cpp, guarded by `#if !defined(__ENABLE_FLOAT_FFTW)`, so every vtable
   slot is valid on any ABI without weak and without pulling in libfftw3f. The
   float CPU path stays unreachable at runtime (FFT_Bundle::setupFFT
   WARNING_QUITs for single/mixing CPU FFT unless the macro is set). When the
   macro is on, the stubs are excluded and fft_cpu_float.cpp supplies the real
   definitions -- no duplicate symbols. Verified by linking the float vtable TU
   against fft_cpu.o in both macro states (off: links via stubs; on: links via
   fft_cpu_float.o), and that dropping both reproduces the reported errors.

2. parse_expression (input_conv.h) could push indeterminate values into vec.
   If std::regex_search found no match, sub_str stayed empty and was parsed
   anyway; in the non-multiplication branch `T occ` was uninitialized and the
   `convert >> occ` extraction was unchecked. Now: a no-match token is an input
   error (WARNING_QUIT), occ is value-initialized, and a failed extraction
   fails fast. Consistent with the other expression parsers.
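
A self-contained sketch of the fail-fast behaviour, assuming the token grammar implied above: each whitespace-separated token is either a bare value or `count*value`. It throws `std::invalid_argument` as a stand-in for ABACUS's WARNING_QUIT; the function name, regex, and grammar are assumptions, not the actual input_conv.h code.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Parses e.g. "2*0.5 1.0" into {0.5, 0.5, 1.0}; rejects malformed input.
template <typename T>
std::vector<T> parse_expression(const std::string& expr)
{
    // Optional "<count>*" prefix followed by a value containing no '*'.
    static const std::regex token_re(R"(^(?:(\d+)\*)?([^*\s]+)$)");
    std::vector<T> out;
    std::istringstream words(expr);
    std::string token;
    while (words >> token) {
        std::smatch m;
        if (!std::regex_match(token, m, token_re)) {
            throw std::invalid_argument("unmatched token: " + token);
        }
        T value{};  // value-initialized, never indeterminate
        std::istringstream convert(m[2].str());
        // Reject failed extraction and trailing garbage such as "1.5" read as int.
        if (!(convert >> value) || !(convert >> std::ws).eof()) {
            throw std::invalid_argument("bad number: " + m[2].str());
        }
        const std::size_t count = m[1].matched ? std::stoul(m[1].str()) : 1;
        out.insert(out.end(), count, value);
    }
    return out;
}

} // namespace sketch
```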

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Rework the FFT_CPU<float> vtable handling so Linux builds byte-for-byte as
upstream and only Windows gets a delta. My earlier port had (a) removed
__attribute__((weak)) outright and (b) added trivial float stubs in
fft_cpu.cpp -- both changed working Linux core code, and (b) didn't even reach
targets that compile fft_bundle.cpp without linking fft_cpu.cpp (e.g.
MODULE_HAMILT_XCTest_VXC), so Linux still failed to link:
    undefined reference to `ModuleBase::FFT_CPU<float>::setupFFT()' ...

Root cause: the upstream virtuals are __attribute__((weak)) so the ELF linker
nulls the unused FFT_CPU<float> vtable slots when ENABLE_FLOAT_FFTW is off.
MinGW/PE has no equivalent -- weak template members there collide
("multiple definition") or leave null slots that crash on dispatch (verified
both empirically with g++).

Fix, keeping Linux untouched:
* Introduce ABACUS_FFT_WEAK = __attribute__((weak)) on non-Windows, empty on
  _WIN32, and use it in place of the raw attribute in fft_base.h / fft_cpu.h.
  Preprocessing with -U_WIN32 reproduces the upstream headers exactly (14 weak
  attrs, no extra defs); fft_cpu.cpp is reverted to pristine.
* On Windows the empty macro makes the slots ordinary symbols; the build
  already sets ENABLE_FLOAT_FFTW=ON, so fft_cpu_float.cpp supplies the real
  FFT_CPU<float> methods. The non-pure FFT_BASE<T> virtuals (which had no body,
  relying on weak) get trivial bodies in a `#if defined(_WIN32)` block -- never
  executed (abstract base; backends override what they use). This block is
  compiled only on Windows.

Verified with MinGW g++: constructing FFT_CPU<float> and dispatching through
its vtable links (no multiple-definition, no undefined base/derived refs) and
runs (no null-vtable crash); and the Linux-simulated preprocess output matches
upstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@mohanchen added the Features Needed and Refactor labels Jun 3, 2026
ErjieWu and others added 3 commits June 4, 2026 12:30
The Windows build defaulted to -j nproc. On a 20-core box, 20 concurrent -O3
compilations of heavy template TUs (source_cell/module_symmetry/symmetry.cpp,
read_pp_upf201.cpp, ...) exhausted memory and ninja died with
"cc1plus.exe: out of memory allocating N bytes" -- even with 31 GB RAM.

Default -j is now min(nproc, MemTotalGB / 3) (~3 GB budget per job), read from
/proc/meminfo; an explicit -j still overrides, and the chosen value is printed
with a hint to lower it if cc1plus runs out of memory. Falls back to nproc if
/proc/meminfo is unreadable. Not a code issue -- the sources compiled fine up
to the OOM.
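
The -j heuristic is plain arithmetic; the C++ rendering below is only an illustration, since the real logic is bash in build_abacus_windows.sh. It assumes "MemTotalGB" means the MemTotal kB value divided by 1024² and rounded down, and clamps the result to at least one job; the commit message does not spell out either detail. `default_jobs` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>

namespace sketch {

// min(nproc, MemTotalGB / gb_per_job), at least 1; falls back to nproc when
// no MemTotal line can be parsed from the /proc/meminfo text.
inline int default_jobs(int nproc, const std::string& meminfo, int gb_per_job = 3)
{
    assert(nproc > 0 && gb_per_job > 0);
    std::istringstream in(meminfo);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        long long kb = 0;
        if (fields >> key >> kb && key == "MemTotal:") {
            const long long gb = kb / (1024LL * 1024LL);  // kB -> GiB, rounded down
            const long long jobs = std::max(1LL, gb / gb_per_job);
            return static_cast<int>(std::min<long long>(nproc, jobs));
        }
    }
    return nproc;  // meminfo unreadable or missing MemTotal
}

} // namespace sketch
```

For a 20-core machine reporting 31 GiB of MemTotal, this yields 10 parallel jobs instead of 20.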

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

This was a working note for the native-Windows build trial, not reference
documentation for the repository. Drop it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@ErjieWu changed the title from "Native Windows (MSYS2/MinGW-w64) serial plane-wave build" to "Native Windows (MSYS2/MinGW-w64) build for ABACUS" Jun 4, 2026
@ErjieWu marked this pull request as ready for review June 4, 2026 07:01
Labels

Features Needed The features are indeed needed, and developers should have sophisticated knowledge Refactor Refactor ABACUS codes

Development

Successfully merging this pull request may close these issues.

Test case 01_PW/032_PW_15_CF_CS_bspline failed for serial version ABACUS.

3 participants