Feature: DeltaSpin for LCAO and PW base and DFTU for PW, both collinear and noncollinear spin (code change and UTs) by dyzheng · Pull Request #7384 · deepmodeling/abacus-develop

dyzheng · 2026-05-26T09:42:20Z

Reminder

Have you linked an issue with this pull request?
Have you added adequate unit tests and/or case tests for your pull request?
Have you noticed possible changes of behavior below or in the linked issue?
Have you explained the changes of codes in core modules of ESolver, HSolver, ElecState, Hamilt, Operator or Psi? (ignore if not applicable)

Linked Issue

Fix #...

Unit Tests and/or Case Tests for my changes

A unit test is added for each new feature or bug fix.

What's changed?

Example: My changes might affect the performance of the application under certain conditions, and I have tested the impact on various scenarios...

Any changes of core modules? (ignore if not applicable)

Example: I have added a new virtual function in the esolver base class in order to ...

…ing develop) Direct sync of source/ directory from feat/dftu-pw-port-v2 which has been merged with origin/develop. 101 files changed, +10659/-1262 lines.

Reverted infrastructure files to origin/develop state to clean up the PR. Files reverted: - source/source_cell/read_atoms_helper.cpp - source/source_base/parallel_global.cpp - source/source_base/timer.cpp - source/source_base/kernels/cuda/math_kernel_op.cu - source/source_base/module_device/device_check.h - source/source_base/module_container/base/macros/cuda.h - source/source_base/module_container/base/third_party/cusolver.h - source/source_esolver/lcao_others.cpp - source/source_hsolver/kernels/cuda/diag_cusolvermp.cu - source/source_io/module_unk/berryphase.cpp - source/source_lcao/module_rt/solve_propagation.cpp - source/source_main/main.cpp - source/source_lcao/module_ri/RPA_LRI.hpp Kept: - DFT+U/DeltaSpin core functionality (esolver, kernels with npol, dftu modules) - Input parameters - Build/Test system updates

Restored the following files from port-v2 as they contain essential functional changes and bug fixes: - source/source_esolver/lcao_others.cpp: Passes sc_direction_only parameter to init_deltaspin_lcao. - source/source_base/parallel_global.cpp: Fixes MPI_Comm_free logic to avoid freeing MPI_COMM_WORLD. - source/source_main/main.cpp: Reorders fftw_cleanup_threads before MPI_Finalize to avoid segfault. - source/source_base/kernels/cuda/math_kernel_op.cu: Adds cublas_handle null check. Other infrastructure files (timer.cpp, device_check.h, berryphase.cpp, etc.) remain reverted to develop.

Restoring all code-related changes as previous partial revert caused compilation errors in CI.

Reverted 15 files with modifications that are completely unrelated to the DFT+U and DeltaSpin functionality: - Pure comment/formatting changes: timer.cpp, diago_david.cpp, esolver_sdft_pw.cpp, hsolver_lcao.cpp, berryphase.cpp, RPA_LRI.hpp, solve_propagation.cpp, diag_cusolvermp.cu, read_sep_test.cpp, cusolver.h (3 comment-only lines restored, CUDA version guards kept) - Unrelated code refactor: elecstate_pw.cpp (vkb.nc -> vkbnc rename) - Unused variable: forces.cpp (forcepaw declaration) - Unrelated test: nscf_utils_test.cpp (deleted) and its CMakeLists entry - operator_lcao.cpp: restored original comments and removed extra blank lines

dzzz2001

Detailed inline review covering blockers and should-fix items, organized by sub-area (DeltaSpin core, DFTU core, PW kernels + onsite, infra/wiring, tests). Each inline comment is anchored to the most representative diff line. Happy to discuss any of these.

… feat/port-dftu-ds-code-v4

The second __global__ void onsite_op(...) variant (DFT+U kernel taking vu_iat) was missing the npol==1 branch, while the CPU version had it. For npol==1 this caused OOB reads on becp, writes to ps[psind+1], and indexing off-diagonal vu entries that don't exist. Also hoisted iat/vu_iat/orb_l/tlp1 lookups outside the loop (like the first DeltaSpin variant) for consistency. Addresses reviewer comment: deepmodeling#7384 (comment)

Mirror of the CUDA fix for the ROCm/HIP backend. The second __global__ void onsite_op(...) variant (DFT+U kernel taking vu_iat) was missing the npol==1 branch. For npol==1 this caused OOB reads on becp, writes to ps[psind+1], and indexing off-diagonal vu entries that don't exist. Addresses reviewer comment: deepmodeling#7384 (comment)

The npol==1 branch in cal_force_nl_op (DeltaSpin force) and cal_stress_nl_op (DeltaSpin stress) was missing the spin_sign that op_pw_proj.cpp applies when building tmp_lambda_coeff. For nspin==2, the force/stress from spin-down k-points (isk==1) was added with the wrong sign, so force != -dE/dR and stress tensor inconsistent with dE/dε. Addresses reviewer comments: deepmodeling#7384 (comment) deepmodeling#7384 (comment)

In direction_only mode, the code was permanently mutating this->target_mag_[ia] += parallel * dir on every inner step without ever restoring it. After the first inner step, target_mag_ no longer reflected user input, and the corruption persisted across SCF iterations. Fixed by using a local copy (target_mag_adj) for the adjusted target direction, leaving the original target_mag_ unchanged. Addresses reviewer comment: deepmodeling#7384 (comment)

The early continue for atoms without correlated orbitals ran before locale[iat].resize(), iatlnmipol2iwt[iat].resize(), and eff_pot_pw_index[iat] assignment, leaving these per-atom structures default-constructed (size 0). While current readers are guarded by orbital_corr[it] == -1, any future code accessing these for an arbitrary atom would segfault. Fixed by moving the per-atom resize/assignment before the guard, so only the Hubbard-specific work (pot_index accumulation and locale.create() block) is skipped for non-correlated atoms. Addresses reviewer comment: deepmodeling#7384 (comment)

The [DS-DIAG] cout fired on every STRU read, on every MPI rank, cluttering every production log. Addresses reviewer comment: deepmodeling#7384 (comment)

Two problems fixed: 1. Timer was empty (start/end called twice with no work between). Moved end() to after the actual work. 2. No MPI synchronization of mixed uom_in. After a few Pulay steps, occupation matrices on different ranks would silently diverge. Added Parallel_Common::bcast_double after mixing (same pattern as mix_dmr). Addresses reviewer comment: deepmodeling#7384 (comment)

These files were marked "INCOMPLETE" and "will not compile as-is". They reference strategy_type_, strategy_, LambdaStrategyType, set_strategy_type, and set_strategy_params — none declared in spin_constrain.h. lambda_update_strategies.{h,cpp} (~700 lines) were labelled "NOT compiled into the library." If compiled, the build breaks. If not, they ship as dead source that confuses every future reader. Dropped from this PR; can be re-introduced in a follow-up when finished, registered with CMake, and unit-tested. Addresses reviewer comment: deepmodeling#7384 (comment)

The CUDA < 11.0 fallback called cusolverDn{S,D,C,Z}trtri_bufferSize and cusolverDn{S,D,C,Z}trtri — these symbols do not exist in any cuSOLVER release. Only cusolverDnXtrtri (CUDA 11.0+, already used by the >= 11000 branch) and cublas{S,D,C,Z}trtri_batched exist. Dropped the fallback entirely and require CUDA >= 11.0, consistent with module_container/base/macros/cuda.h:25-31 which already guards GetTypeCuda<int64_t> on CUDA_VERSION >= 11000. Addresses reviewer comment: deepmodeling#7384 (comment)

D_I was filled row-major as D_I[lm * nbands_global + jb_global] but passed to column-major BLAS zgemm with lda=r. The column-major view of D_I with leading dimension r corresponds to D_col[lm + jb*r], which is the transpose of the data actually stored. Result: computed D · D^H instead of D^H · D. Fixed to fill D_I column-major: D_I[lm + jb_global * r]. Addresses reviewer comment: deepmodeling#7384 (comment)

DeltaSpin<...>::set_current_spin(int) was shadowing (not overriding) OperatorLCAO::set_current_spin, which was not declared virtual. The dispatch chain in OperatorLCAO::set_current_spin walks next_op via dynamic_cast<OperatorLCAO<TK,TR>*> and calls the base method, so the derived reset this->sc_hr_done = false never fired when DeltaSpin was wired into the operator stack via next_op. Made the base method virtual so the override in DeltaSpin is properly dispatched, ensuring sc_hr_done is reset on spin switch. Addresses reviewer comment: deepmodeling#7384 (comment)

…o.cpp The old code used get_locale(iat0, target_L, 0, 0, m0_all, m1_all) which treated locale[iat][L][0][0] as a (2*tlp1)x(2*tlp1) matrix indexed as a 2x2 block of Pauli channels. But every writer of nspin=4 locale (cal_occ_pw in dftu_pw.cpp and the LCAO writers) stores the four Pauli channels as four stacked tlp1^2 blocks at offsets 0, tlp1^2, 2*tlp1^2, 3*tlp1^2 inside c[]. The old code read from the wrong stacked block, causing silent corruption on restarts with init_chg=file + nspin=4. Fixed to use get_locale_flat which reads the stacked blocks directly in the same format they were written. Addresses reviewer comment: deepmodeling#7384 (comment)

Functionally harmless (idempotent) but a clear copy-paste accident that shouldn't pass review. Addresses reviewer comment: deepmodeling#7384 (comment)

…files Pulling parameter.h into kernel-op headers means every TU that includes these headers transitively includes the whole parameter parser — meaningful compile-time bloat. The parameter values needed in the kernel implementation can be passed as function arguments (as npol already is). Addresses reviewer comments: deepmodeling#7384 (comment) deepmodeling#7384 (comment)

These includes are only used in the .cpp implementation file. Moving them out of the header avoids compile-time bloat. Addresses reviewer comment: deepmodeling#7384 (comment)

… test CMakeLists The lambda_update_strategies.{h,cpp} and lambda_update_strategies_test.cpp files were removed (incomplete implementation). Remove the corresponding AddTest entry and the TODO comment about lambda_loop_helper_test (which also depended on the deleted files). Addresses reviewer comment: deepmodeling#7384 (comment)

…dy, invalidate_becp) The becp cache was set/invalidated but never read anywhere. force_onsite / stress_onsite still recompute becp unconditionally. Remove the dead state entirely to avoid confusion. If caching is desired in the future, the consumer needs to be wired to check is_becp_ready() and skip recomputation. Addresses reviewer comment: deepmodeling#7384 (comment)

cal_VU_pot_pw was a (void)spin; stub that did nothing. The public declaration in dftu.h advertised 'calculate the local DFT+U effective potential matrix for PW base' but the function was never called anywhere. VU potential for PW is computed via cal_eff_pot_mat in the onsite projector path. Remove the dead declaration and stub to avoid confusion. Addresses reviewer comment: deepmodeling#7384 (comment)

The GPU kernels for DFT+U and DeltaSpin force/stress computation hardcoded npol=2 (noncollinear) indexing: ib2 = ib*2, 4-component Pauli matrix coefficients, and spinor becp/dbecp access patterns. This caused incorrect results or crashes for collinear (npol=1) runs. Add npol as a template parameter to cal_force_onsite and cal_stress_onsite kernels, with runtime dispatch via if (npol == 1) vs if (npol == 2) branching inside the kernel body. For npol==1 DeltaSpin, compute spin_sign from isk[ik] on the host side and pass it as a kernel argument, avoiding the need for a device isk array. Applies to both CUDA (.cu) and ROCm (.hip.cu) implementations. Addresses reviewer comments (Blockers 2 and 3): deepmodeling#7384

The nspin validation check was commented out, allowing DeltaSpin to be used with nspin=1 which is physically meaningless. Restore the check that requires nspin==2 or nspin==4 when sc_mag_switch is true. Addresses reviewer comment: deepmodeling#7384

For nspin==2 (collinear), the magnetic moment constraint direction is z, but the autoset code was setting m_loc_[ia].x instead of m_loc_[ia].z. This caused target_mag_[iat].z to be zero in set_target_mag(), making DeltaSpin constraints ineffective. Addresses reviewer comment: deepmodeling#7384

For nspin==1, the PW occupation matrix already includes spin degeneracy (factor of 2) from the k-point weights. The Hubbard energy formula E_U = U * weight_eu * occ^2 yields U * 4 * occ_half^2 with weight_eu=1.0, but the correct result should be U * n^2 / 2 (matching LCAO). Use weight_eu=0.25 for nspin==1 to get U * occ_full^2 * 0.25 = U * (2*occ_half)^2 * 0.25 = U * occ_half^2, matching the LCAO half-occupation convention before set_double_energy(). Addresses reviewer comment: deepmodeling#7384

…rrors The DeltaSpin npol==1 spin_sign calculation referenced this->isk/this->ik on structs that don't have these members. Fix by: - Adding const int* isk parameter to DeltaSpin force/stress kernel signatures in force_op.h, stress_op.h, and all CPU/GPU implementations - Passing kv_->isk.data() from Onsite_Proj_tools cal_force_dspin and cal_stress_dspin callers - Adding missing parallel_common.h include to charge_mixing.cpp - Fixing get_locale_flat call in dftu_lcao.cpp (was called per-element inside loop but function fills entire array at once) Build verified passing for CPU-only build. Addresses reviewer comments: deepmodeling#7384

The old non-templated DeltaSpin kernel body was not fully replaced during the npol template refactoring, leaving ~27 lines of orphaned code outside any function, causing 57 CUDA compilation errors. Build verified passing with CUDA enabled.

…errors The virtual set_current_spin method had its definition only in operator_lcao.cpp which was compiled into hamilt_lcao but not linked by test executables, causing undefined reference errors. Provide a simple inline implementation in the header that sets current_spin. Remove the duplicate out-of-line definition from operator_lcao.cpp. The next_op chain propagation was redundant since init() already handles next_op traversal. Also fixed orphaned DeltaSpin kernel code in CUDA stress_op.cu. Build verified passing with CUDA and BUILD_TESTING=ON.

The autoset_magnetization function was changed to set m_loc_[ia].z instead of m_loc_[ia].x for nspin==2, because set_target_mag reads target_mag_[iat].z for collinear spin constraints. Update the test to match the corrected behavior. Addresses CI test failure in MODULE_CELL_unitcell_test.

dyzheng and others added 7 commits May 26, 2026 16:02

feat: sync DFT+U/DeltaSpin code from feat/dftu-pw-port-v2 (after merg…

c582c1a

…ing develop) Direct sync of source/ directory from feat/dftu-pw-port-v2 which has been merged with origin/develop. 101 files changed, +10659/-1262 lines.

feat: restore full source/ content from feat/dftu-pw-port-v2

f0a892a

Restoring all code-related changes as previous partial revert caused compilation errors in CI.

test: enable 17_DS_DFTU test suite in CMakeLists

ac27932

Merge branch 'develop' into feat/port-dftu-ds-code-v4

4ef2f30

mohanchen requested a review from dzzz2001 May 28, 2026 08:41

dzzz2001 reviewed May 28, 2026

View reviewed changes

Merge branch 'develop' of github.com:deepmodeling/abacus-develop into…

c82536b

… feat/port-dftu-ds-code-v4

mohanchen added Features Needed The features are indeed needed, and developers should have sophisticated knowledge Refactor Refactor ABACUS codes DFT+U Issues related to DFT plus U function collinear/non-collinear/SOC Issues related to SOC labels Jun 1, 2026

dyzheng added 16 commits June 3, 2026 20:14

fix: remove stray diagnostic std::cout from STRU parse path

e60c174

The [DS-DIAG] cout fired on every STRU read, on every MPI rank, cluttering every production log. Addresses reviewer comment: deepmodeling#7384 (comment)

fix: collapse 37 duplicate remove_definitions(-D__CUDA) to one line

bc79dba

Functionally harmless (idempotent) but a clear copy-paste accident that shouldn't pass review. Addresses reviewer comment: deepmodeling#7384 (comment)

fix: move parallel_orbitals.h and klist.h from elecstate_lcao.h to cpp

cc84b99

These includes are only used in the .cpp implementation file. Moving them out of the header avoids compile-time bloat. Addresses reviewer comment: deepmodeling#7384 (comment)

dyzheng added 7 commits June 3, 2026 20:32

dyzheng force-pushed the feat/port-dftu-ds-code-v4 branch from ab3736b to 5539972 Compare June 3, 2026 14:07

dyzheng requested a review from dzzz2001 June 3, 2026 14:44

dyzheng added 2 commits June 3, 2026 23:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: DeltaSpin for LCAO and PW base and DFTU for PW, both collinear and noncollinear spin (code change and UTs)#7384

Feature: DeltaSpin for LCAO and PW base and DFTU for PW, both collinear and noncollinear spin (code change and UTs)#7384
dyzheng wants to merge 34 commits into
deepmodeling:developfrom
dyzheng:feat/port-dftu-ds-code-v4

dyzheng commented May 26, 2026

Uh oh!

dzzz2001 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dyzheng commented May 26, 2026

Reminder

Linked Issue

Unit Tests and/or Case Tests for my changes

What's changed?

Any changes of core modules? (ignore if not applicable)

Uh oh!

dzzz2001 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants