Align tmp contracts for tile ops by FangRui0 · Pull Request #824 · hw-native-sys/PTOAS

FangRui0 · 2026-06-16T11:02:16Z

No description provided.

gemini-code-assist

Code Review

This pull request updates several PTO IR operations to relax constraints on the tmp (temporary workspace) operand, particularly on the A5 architecture where it is treated as a placeholder that does not need to match the shape or layout of the source/destination tiles. This prevents false vector overflow diagnostics and updates memory effects. The review feedback points out a mismatch in the expected error message in a new test file and recommends completely removing the unused verifyTRowReductionNoTmpCommon helper function in PTO.cpp to avoid dead code.

gemini-code-assist · 2026-06-16T11:04:15Z

+  }
+}
+
+// CHECK: error: 'pto.trowmax' op expects tmp to use the row_major blayout


The expected error message in the CHECK directive does not match the actual error message emitted by the verifier in lib/PTO/IR/PTO.cpp. The verifier emits "expects tmp to use row-major layout", but the test expects "expects tmp to use the row_major blayout". This mismatch will cause the lit test to fail.

// CHECK: error: 'pto.trowmax' op expects tmp to use row-major layout
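The mismatch can be reproduced with a simplified model of how one CHECK line is matched. This is only a sketch: it assumes plain substring matching (real FileCheck also canonicalizes whitespace and supports regex patterns), and the helper name `check_matches` is hypothetical. The two message strings are the ones quoted in this comment.

```python
def check_matches(check_line: str, tool_output: str) -> bool:
    """Simplified stand-in for one CHECK directive: plain substring match."""
    prefix = "// CHECK:"
    if not check_line.startswith(prefix):
        raise ValueError("not a CHECK directive")
    pattern = check_line[len(prefix):].strip()
    return pattern in tool_output


# Diagnostic text as this comment says the verifier emits it.
emitted = "error: 'pto.trowmax' op expects tmp to use row-major layout"

stale = "// CHECK: error: 'pto.trowmax' op expects tmp to use the row_major blayout"
fixed = "// CHECK: error: 'pto.trowmax' op expects tmp to use row-major layout"

print(check_matches(stale, emitted))  # False
print(check_matches(fixed, emitted))  # True
```

Under this model the stale directive can never match, which is why the lit test would fail until the CHECK text and the verifier message agree.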

gemini-code-assist · 2026-06-16T11:04:15Z

+[[maybe_unused]] static LogicalResult
+verifyTRowReductionNoTmpCommon(Operation *op, Type srcTy, Type dstTy,
+                               StringRef elemTypeError) {


The helper function verifyTRowReductionNoTmpCommon is now marked [[maybe_unused]] because all of its callers (TRowMaxOp and TRowSumOp) have been migrated to use verifyTRowReductionWithTmpCommon. Since this is a static helper function in this translation unit and is no longer used anywhere, it should be completely removed from the codebase to avoid dead code.

reedhecre · 2026-06-16T11:05:32Z

Codex Review

This comment is updated automatically by the review bot.

PR: Align tmp contracts for tile ops #824
Author: FangRui0
Base/Head: main / refactor_tmp
Head SHA: 0f93917e1fa7
Trigger: new commits pushed to the PR
Generated At: 2026-06-22T08:47:41Z
Previous Head SHA: adeae8fba5a4
Status: completed

Summary

PR #824 introduces a backend contract mismatch for A5 trowexpand* tmp forms, and one PTODSL reference example no longer matches the kernel it claims to reconstruct.

Findings

P2: A5 `trowexpand*` tmp forms are now verifier-legal, but the VPTO backend still only provides 3-operand kernels (lib/PTO/IR/PTO.cpp:10193)

verifyTRowExpandReduceLikeOp now accepts A5 ops with a %tmp operand, so forms like pto.trowexpandmax/add/div/mul/sub/expdif/min pass verification. However the registered A5 TileLang kernels for this family still take only (src0, src1, dst) and have no 4-operand variant (for example lib/TileOps/trowexpandmax_template.py and lib/TileOps/trowexpandadd_template.py). That means the new IR is only accepted by --emit-pto-ir; compiling the same op with --pto-backend=vpto has no matching kernel to lower to and should fail.

P3: `softmax_lowlevel.py` no longer reconstructs the referenced softmax kernel (ptodsl/examples/softmax_lowlevel.py:61)

This example still says it reconstructs test/tilelang_st/npu/a5/src/st/testcase/softmax/softmax.pto, but tile_col was changed to blayout=row_major. The referenced kernel allocates the corresponding oldmax/oldsum/newmax/newsum/expmax tiles as blayout=col_major (softmax.pto lines 116-133). The example therefore emits different tile types than the file it claims to mirror, so it is no longer a faithful low-level reference.

Signed-off-by: FangRui <fangrui_95@163.com>

FangRui0 · 2026-06-22T08:15:16Z

/run a3

reedhecre · 2026-06-22T08:20:03Z

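Received /run a3; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle; this request will start in the current polling round.

The page refreshes automatically and shows the current stage, the queue, and recent results.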

reedhecre · 2026-06-22T08:29:58Z

A3 board test failed

Trigger mode: manual
Source commit: cfe94f5ba586
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260622_162013_manual_pr824.log
Manual command: /run a3
Triggered by: FangRui0
Trigger comment: Align tmp contracts for tile ops #824 (comment)
Failed stage: sample-build-and-test / exit=1

Log tail

al.py) SKIP requires --pto-arch=a5
Sync(test_intercore_sync_a5_ptoisa_vec.py) SKIP requires --pto-arch=a5
Sync(test_intercore_sync_a5.py) SKIP requires --pto-arch=a5
Sync(test_mem_inject_sync_basic.py) OK   generated: test_mem_inject_sync_basic-pto.cpp
Sync(test_set_wait_unified_api.py) OK   generated: test_set_wait_unified_api-pto.cpp
Sync(test_tmov_col_major_16x1_align_a5.pto) SKIP requires --pto-arch=a5
Sync(test_tmov_col_major_16x1_align_a5.py) SKIP requires --pto-arch=a5
Sync(test_tmov_row_major_1x16_control_a5.pto) SKIP requires --pto-arch=a5
Sync(test_tmov_row_major_1x16_control_a5.py) SKIP requires --pto-arch=a5
Sync(tmatmulk_autosync.py) OK   generated: tmatmulk_autosync-pto.cpp
TileSetGetValue(tile_getval_mat_invalid.py) XFAIL ptobc encode failed as expected
TileSetGetValue(tileSetGetValue.py) OK   generated: tileSetGetValue-pto.cpp
TInsert(tinsert_fp.py) OK   generated: tinsert_fp-pto.cpp
TInsert(tinsert.py) OK   generated: tinsert-pto.cpp
Tpows(tpows.py) OK   generated: tpows-pto.cpp
Tpow(tpow.py) OK   generated: tpow-pto.cpp
TPrefetchAsync(tprefetch_async_binding.py) OK   generated: tprefetch_async_binding-pto.cpp
TPrefetch(tprefetch.py) OK   generated: tprefetch-pto.cpp
Trans(trans.py) OK   generated: trans-pto.cpp
Trap(trap.py) OK   generated: trap-pto.cpp
TTri(ttri.py) OK   generated: ttri-pto.cpp
VectorAddition(vadd_pto_ir.py) OK   generated: vadd_pto_ir-pto.cpp
VectorAddition(vadd_validshape_hyper.py) OK   generated: vadd_validshape_hyper-pto.cpp
VectorAddition(vectorAddition.py) OK   generated: vectorAddition-pto.cpp
Xors(xors.py) OK   generated: xors-pto.cpp
Xor(xor.py)  OK   generated: xor-pto.cpp
-----------------------------
OK=235  FAIL=2  SKIP=39
=============================
===== END STAGE sample-build-and-test rc=1 @ 2026-06-22 16:30:03 =====

FangRui0 · 2026-06-22T09:32:49Z

/run a3

zhangstevenunity

Review: Align tmp contracts for tile ops

The A5 "tmp is an unused placeholder" modeling (skip the scratch WRITE in getEffects to dodge false vec-overflow) is sound for the ops whose A5 kernel genuinely ignores tmp: tsel/tsels/txor/txors TileLang templates never write tmp, and the new A5 txors verifier checks are correctly A2/A3-gated. But two changes look like correctness regressions, and one needs confirmation.

Requesting changes

1 (blocking). A5 trowexpand* + tmp is now verifier-legal but cannot be lowered on EITHER A5 backend.
This PR deletes the hasTmp && A5 -> "expects A5 form to omit tmp" guard in verifyTRowExpandReduceLikeOp. That guard's own comment said it exists so a 0x0 dst cannot "let an A5 tmp form slip through and lower to the A2/A3 4-operand TROWEXPAND* call." Removing it re-enables exactly that:

--pto-backend=vpto: the A5 TileLang templates for this family are 3-param (src0, src1, dst) (lib/TileOps/trowexpandmax_template.py). ExpandTileOp builds one operand-spec per operand (buildSpecKey over all operands) and expand_helper rejects any descriptor where len(parameters) != len(operand_specs) (expand_helper.py:335), so a 4-operand op finds no kernel and ExpandTileOp hard-fails: "failed to instantiate tilelang template for pto.trowexpandmax" (ExpandTileOp.cpp:1204). This is the Codex bot's P2.
default EmitC: PTORowExpand*ToEmitC forwards tmp as a 4th arg unconditionally, emitting TROWEXPANDMAX(dst, src0, src1, tmp). The A5 board kernels are 3-operand (test/.../a5/.../trowexpandmax/main.cpp declares LaunchTROWEXPANDMAX_*(src0, src1, dst, stream); the A5 ST .pto uses ins(%src0, %src1)), matching the pre-PR "A5 supports the 3-operand form only" note this PR removed. So the 4-arg call has no A5 ISA overload.

The new tests do not catch this: issue708_* and tmp_contract_a5_non_same_shape use --emit-pto-ir (stops before lowering), and a5_unused_tmp_vec_overflow only FileChecks emitted text, never compiling for the board. Please either keep rejecting tmp on A5 for trowexpand*, or give the vpto templates a tmp param (like trowmax) AND confirm a 4-operand A5 EmitC/ISA path exists, backed by a real board test.
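The vpto half of this point comes down to an arity comparison. The following is a minimal sketch of that rule, not the actual expand_helper.py code: `KernelDescriptor`, `select_kernel`, and the registry contents are hypothetical names for illustration, and only the `len(parameters) != len(operand_specs)` comparison is taken from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class KernelDescriptor:
    """Hypothetical stand-in for a registered TileLang template."""
    name: str
    parameters: Tuple[str, ...]


# Assumed registry: the A5 template for this family is 3-param.
REGISTRY = [KernelDescriptor("trowexpandmax", ("src0", "src1", "dst"))]


def select_kernel(op_name: str,
                  operand_specs: Sequence[str],
                  registry: Sequence[KernelDescriptor] = REGISTRY,
                  ) -> Optional[KernelDescriptor]:
    """Return the first descriptor whose arity matches the op, else None."""
    for desc in registry:
        if desc.name != op_name:
            continue
        # Rule quoted in the review: one operand-spec per operand,
        # and a descriptor is rejected when the counts differ.
        if len(desc.parameters) != len(operand_specs):
            continue
        return desc
    return None


three_operand = select_kernel("trowexpandmax", ["src0", "src1", "dst"])
four_operand = select_kernel("trowexpandmax", ["src0", "src1", "tmp", "dst"])
print(three_operand is not None)  # True
print(four_operand is None)       # True
```

With only a 3-param descriptor registered, the 4-operand form finds no candidate, which corresponds to the "failed to instantiate tilelang template" failure described above.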

2 (blocking). The WithTmp relaxation silently loosens A2/A3 trowmin/trowprod, where tmp is real scratch.
verifyTRowReductionWithTmpCommon drops verifyTileBufSameElemType + verifyTileBufSameValidShape (and downgrades verifyVecTileCommon -> verifyVecTileStorage) for ALL arches. On master, trowmin/trowprod already used this helper (only trowmax/trowsum are newly migrated from the NoTmp helper). On A2/A3 these forward tmp to TROWMIN/TROWPROD(dst, src, tmp) as genuine scratch, and the deleted test trowprod_tmp_mismatch_invalid.py asserted that a mismatched tmp must fail. Now an undersized / wrong-dtype tmp passes verification on A2/A3 and reaches the intrinsic -> possible scratch overflow / type reinterpret. The PR's intent is A5-placeholder semantics, so please A5-gate the relaxation (keep the strict checks on A2/A3), or justify why A2/A3 reductions tolerate a non-matching tmp.

Please confirm

3. Row-reduction getEffects drop the A5 tmp WRITE with no justification comment. For trowmax/trowmin/trowsum/trowprod, EmitC still forwards tmp to TROWMAX(dst, src, tmp) on A5. Dropping the WRITE is correct only if the A5 ISA truly does not write tmp for these (the vpto template ignoring tmp is encouraging but does not cover the EmitC path). If any A5 reduction writes tmp as scratch, InsertSync now misses a WAR/WAW on it. tsel/txor got an explanatory comment; please add the same here and confirm against the A5 ISA.

Nits

ptodsl/examples/softmax_lowlevel.py still says it reconstructs softmax/softmax.pto, but flips tile_col to row_major while that kernel keeps blayout=col_major for the 8x1 reduction tiles (Codex P3).
PTO.cpp ~10313: the comment "(A5 tmp-form invariant is checked earlier ...)" is now stale -- the invariant it refers to was deleted in this PR.
verifyTRowReductionNoTmpCommon now has zero callers; it is marked [[maybe_unused]] rather than removed (gemini flagged this) -- please delete it.

zhangstevenunity · 2026-06-22T09:50:53Z

  let summary = "TROWEXPANDMAX: Row-wise broadcast max with per-row scalar vector.";
  let description = [{
-    pto-isa has overloads with/without tmp on A2/A3; A5 supports the 3-operand form only.
+    pto-isa has overloads with/without tmp; on A5 the tmp operand is accepted as a placeholder and does not add extra shape constraints.


A5 trowexpand* with tmp is now verifier-legal but un-lowerable on both A5 backends.

vpto: the A5 TileLang template is 3-param (src0, src1, dst) (lib/TileOps/trowexpandmax_template.py). ExpandTileOp emits one operand-spec per operand and expand_helper drops any kernel where len(parameters) != len(operand_specs) (expand_helper.py:335), so a 4-operand op hard-fails ExpandTileOp: "failed to instantiate tilelang template for pto.trowexpandmax" (ExpandTileOp.cpp:1204).

EmitC (default): PTORowExpandMaxToEmitC forwards tmp as a 4th arg -> TROWEXPANDMAX(dst, src0, src1, tmp), but the A5 board kernel is 3-operand (a5/.../trowexpandmax/main.cpp: LaunchTROWEXPANDMAX_*(src0, src1, dst, stream)).

This is the bug the deleted hasTmp && A5 guard in verifyTRowExpandReduceLikeOp prevented -- its comment said the form would otherwise "lower to the A2/A3 4-operand TROWEXPAND* call." The new tests only use --emit-pto-ir or FileCheck text, so they never exercise the lowering. Suggest keeping the A5 reject, or adding a tmp-taking vpto template plus a real 4-operand A5 path with a board test.

zhangstevenunity · 2026-06-22T09:50:53Z

                                                      StringRef elemTypeError) {
  if (failed(verifyRowReductionSrcLayout(op, srcTy, "src")) ||
-      failed(verifyVecTileCommon(op, tmpTy, "tmp")) ||
+      failed(verifyVecTileStorage(op, tmpTy, "tmp")) ||


Dropping verifyTileBufSameElemType + verifyTileBufSameValidShape here loosens tmp checks for ALL arches, but trowmin/trowprod already used this helper on master and forward tmp to TROWMIN/TROWPROD(dst, src, tmp) as real scratch on A2/A3. The deleted trowprod_tmp_mismatch_invalid.py asserted that a mismatched tmp must fail. Now an undersized / wrong-dtype tmp passes verification on A2/A3 and reaches the intrinsic (scratch overflow / type reinterpret). Please A5-gate this relaxation so A2/A3 keep the same-elem / same-valid-shape checks, or justify why A2/A3 reductions tolerate a non-matching tmp.

zhangstevenunity · 2026-06-22T09:50:53Z

    SmallVectorImpl<SideEffects::EffectInstance<MemoryEffects::Effect>> &effects) {
  PTO_ADD_READ(getSrcMutable());
-  PTO_ADD_WRITE(getTmpMutable());
+  if (getTargetArch(getOperation()) != PTOArch::A5)


This drops the A5 tmp WRITE for trowmax (same change for trowmin/trowsum/trowprod), but EmitC still forwards tmp to TROWMAX(dst, src, tmp) on A5. This is safe only if the A5 ISA does not write tmp for these reductions. If it does, InsertSync now misses a WAR/WAW on tmp. tsel/txor got an explanatory comment for the identical change; please add the same here and confirm against the A5 ISA (the vpto template ignoring tmp is supportive but does not cover the EmitC path).
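To make the sync concern concrete, here is a small model of what the effect list controls. It is a sketch under assumptions: the `effects` and `hazards` helpers are hypothetical, the dst write is assumed (the diff above only shows the src read and the tmp write), and the hazard rule is the usual "same buffer, at least one write" condition rather than the actual InsertSync logic.

```python
def effects(arch, src, tmp, dst):
    """Memory effects of one row reduction, as (kind, buffer) pairs."""
    result = [("read", src)]
    if arch != "a5":
        # After this PR the tmp write is only modeled off A5.
        result.append(("write", tmp))
    result.append(("write", dst))  # assumed: dst is written on every arch
    return result


def hazards(first, second):
    """Buffers touched by both ops where at least one access is a write."""
    found = set()
    for kind_a, buf_a in first:
        for kind_b, buf_b in second:
            if buf_a == buf_b and "write" in (kind_a, kind_b):
                found.add(buf_a)
    return found


# Two reductions that share the same tmp tile "T".
for arch in ("a3", "a5"):
    op1 = effects(arch, "A", "T", "B")
    op2 = effects(arch, "C", "T", "D")
    print(arch, sorted(hazards(op1, op2)))
# a3 ['T']
# a5 []
```

In this model the shared tmp produces a dependency on A3 but none on A5, so the A5 result is only correct if the A5 implementation really leaves tmp untouched, which is the confirmation requested above.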

zhangstevenunity · 2026-06-22T09:50:53Z

-            _col_cfg = pto.TileBufConfigAttr.get(
-                pto.BLayoutAttr.get(pto.BLayout.ColMajor),
+            _reduce_cfg = pto.TileBufConfigAttr.get(
+                pto.BLayoutAttr.get(pto.BLayout.RowMajor),


This flips tile_col to row_major, but softmax_lowlevel.py (line 13) still says it reconstructs test/tilelang_st/npu/a5/src/st/testcase/softmax/softmax.pto, where the 8x1 reduction tiles keep blayout=col_major. The example no longer matches the referenced kernel (Codex P3). Either revert this layout or update the reference/claim.

reedhecre · 2026-06-22T09:53:42Z

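Received /run a3; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: PR add op tinsert #765 is currently running; your request has been queued, with 3 tasks ahead of it.

The page refreshes automatically and shows the current stage, the queue, and recent results.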

zhangstevenunity

Correction after reading the latest pto-isa (npu/a5 + npu/a2a3)

I checked the actual A5/A2A3 intrinsic implementations and need to retract two of my three blocking points -- the A5 placeholder design in this PR is more correct than my first pass implied. Apologies for the noise.

RETRACT my point 1, EmitC half (no board break). pto-isa include/pto/npu/a5/TRowExpand{Max,Add,Mul,Sub,Div,Min,Expdif}.hpp each provide a 4-arg TROWEXPAND*_IMPL(dst, src0, src1, tmp) overload, explicitly commented "4-arg overload for cross-architecture portability with A2/A3 ... the tmp tile is accepted and ignored." So the default EmitC path compiles and runs correctly on A5 -- there is no missing 4-operand ISA overload. My EmitC board-break claim was wrong.

RETRACT my point 3 (row-reduction A5 sync is fine). npu/a5/TRowReduce.hpp (TROWMAX/TROWSUM/TROWMIN_IMPL) and npu/a5/TRowProd.hpp take tmp and never touch it -- the reduction is pure register accumulation (vregdst). So dropping the A5 tmp WRITE in getEffects is correct and the false-vec-overflow fix is sound; no missing WAR/WAW.

DOWNGRADE my point 1, vpto half (now minor / non-blocking). Only the opt-in --pto-backend=vpto path is affected: the TileLang templates lib/TileOps/trowexpand*_template.py are still 3-param (src0, src1, dst), so ExpandTileOp hard-fails (loudly) on a 4-operand op (expand_helper.py:335 -> ExpandTileOp.cpp:1204). No current frontend emits trowexpand*+tmp, so this is latent. Suggest giving those templates a tmp param like trowmax_template.py already has (mirroring the ISA's 4-arg portability overload) -- a consistency follow-up, not a blocker.

STILL STANDS: my point 2 (A2/A3 trowmin/trowprod), now confirmed against the ISA. npu/a2a3/TRowMax.hpp and TRowProd.hpp use tmp as real read/write scratch: vector_dup(tmp, ...) then vmax/vmul(tmp, tmp, src + block*elemsPerBlock), with elemsPerBlock = BLOCK_BYTE_SIZE / sizeof(T) (8 for i32, 16 for i16), and the tile pointer is cast to T*. So A2/A3 needs tmp >= one block AND tmp dtype == src dtype. The relaxed verifyTRowReductionWithTmpCommon drops BOTH the same-elem-type check and any size check for all arches, so a sub-block / wrong-dtype tmp now passes verification on A2/A3 -> silent out-of-bounds scratch write. This is reachable in exactly the cross-arch "placeholder" scenario this PR promotes: a [1,8] tmp is fine for i32 (block=8) but overflows an i16 reduction (block=16). The old same-valid-shape check was over-strict (it demanded a full-src tmp, more than the one-block minimum) but it was safe.

Recommend A5-gating the relaxation -- keep same-elem-type + a >= one-block size check on A2/A3 -- instead of dropping the checks for all arches. If trowmin/trowprod callers are guaranteed to size tmp >= one block with matching dtype, point 2 is non-blocking too.
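The one-block arithmetic behind this recommendation can be worked through as follows. This is a sketch, not pto-isa code: `BLOCK_BYTE_SIZE = 32` is an assumption inferred from the "8 for i32, 16 for i16" figures above, and the helper names are hypothetical.

```python
# Assumption: 32 bytes per block, inferred from 8 x i32 or 16 x i16 per block.
BLOCK_BYTE_SIZE = 32
ELEM_BYTES = {"i16": 2, "i32": 4}


def elems_per_block(dtype: str) -> int:
    """elemsPerBlock = BLOCK_BYTE_SIZE / sizeof(T)."""
    return BLOCK_BYTE_SIZE // ELEM_BYTES[dtype]


def tmp_holds_one_block(tmp_shape, dtype: str) -> bool:
    """True when a tmp tile of this shape and dtype covers at least one block."""
    rows, cols = tmp_shape
    return rows * cols >= elems_per_block(dtype)


print(elems_per_block("i32"))              # 8
print(elems_per_block("i16"))              # 16
print(tmp_holds_one_block((1, 8), "i32"))  # True
print(tmp_holds_one_block((1, 8), "i16"))  # False
```

The last two lines are the cross-arch case described above: the same [1,8] placeholder that is large enough for an i32 reduction is half a block short for i16, so a minimum-size check on A2/A3 would need to depend on the element type.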

Net: the A5 placeholder modeling is sound and ISA-backed. The only residual correctness concern is the A2/A3 scratch checks in point 2.

reedhecre · 2026-06-23T06:26:27Z

A3 board test failed

Trigger mode: merged
Source commit: 74354bf93cad
Result summary: OK 152 / FAIL 69 / SKIP 1
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260623_135338_merged_pr824.log
Failed stage: board-validation / exit=1

Failed test cases

orchestration_example_kernel_add (run, exit=2)
vector_example_dag_kernel_add_scalar (run, exit=2)
paged_attention_example_kernel_pv_matmul (run, exit=2)
paged_attention_example_kernel_init_inplace (run, exit=2)
vector_example_dag_kernel_add (run, exit=2)
paged_attention_example_kernel_online_update (run, exit=2)
paged_attention_example_kernel_softmax_prepare (run, exit=2)
orchestration_example_kernel_add_scalar (run, exit=2)
paged_attention_example_kernel_qk_matmul (run, exit=2)
orchestration_example_kernel_mul (run, exit=2)
vector_example_dag_kernel_mul (run, exit=2)
rowexpanddiv (run, exit=2)
prelu (run, exit=2)
plan_memory_bind_tile_alias_liveness (run, exit=2)
plan_memory_peak_exact_capacity (run, exit=2)
plan_memory_loop_no_reuse_outer_live (run, exit=2)
plan_memory_if_yield (run, exit=2)
plan_memory_loop_in_if (run, exit=2)
plan_memory_peak_8_overlapping (run, exit=2)
plan_memory_if_in_loop (run, exit=2)
plan_memory_fragmentation_hole_fit (run, exit=2)
plan_memory_for_iter_args_yield (run, exit=2)
plan_memory_no_reuse_overlap (run, exit=2)
plan_memory_reuse_sequential (run, exit=2)
plan_memory_nested_loops (run, exit=2)
plan_memory_fragmentation_two_holes (run, exit=2)
rems (run, exit=2)
xor (run, exit=2)
partition_view_verify_rank_mismatch_valid (run, exit=2)
partition_view_verify_valid (run, exit=2)
rowexpandmul (run, exit=2)
quant_asym (run, exit=2)
quant (run, exit=2)
partition5d_dynamic (run, exit=2)
partition5d (run, exit=2)
scatter (run, exit=2)
sparse_attn_test_incore_7 (run, exit=2)
decode_hca_test_incore_54 (run, exit=2)
attention_swa_test_incore_40 (run, exit=2)
decode_swa_test_incore_40 (run, exit=2)
decode_csa_test_incore_81 (run, exit=2)
attention_hca_test_incore_54 (run, exit=2)
attention_csa_test_refresh_incore_81 (run, exit=2)
tensor_view_layout_dn (run, exit=2)
rowexpandsub (run, exit=2)
rope_kv_cache (run, exit=2)
qwen3_decode_incore_4 (run, exit=2)
post_rmsnorm (run, exit=2)
qwen3_decode_incore_1 (run, exit=2)
qwen3_decode_incore_10 (run, exit=2)
qwen3_decode_incore_11 (run, exit=2)
rmsnorm (run, exit=2)
qwen3_decode_incore_6 (run, exit=2)
qwen3_decode_incore_2 (run, exit=2)
qwen3_decode_incore_7 (run, exit=2)
qwen3_decode_incore_5 (run, exit=2)
qwen3_decode_incore_12 (run, exit=2)
sels (run, exit=2)
tprefetch_async_binding (run, exit=1)
test_barrier_sync (run, exit=2)
matmul (run, exit=2)
add_double_dynamic (run, exit=2)
nested_loop_confliect (run, exit=2)
rar_optimization_test (run, exit=2)
test_dynamic_valid_shape (run, exit=2)
test_auto_sync_tail_hint (run, exit=2)
compensation_test (run, exit=2)
rem (run, exit=2)
partmin (run, exit=2)

reedhecre · 2026-06-23T06:26:29Z

A3 board test failure details: PR #824

orchestration_example_kernel_add

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add/orchestration_example_kernel_add_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add/orchestration_example_kernel_add_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add/orchestration_example_kernel_add_kernel.cpp:113:3: error: no matching function for call to '__ptoas_kernel_add_impl'
  __ptoas_kernel_add_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add/orchestration_example_kernel_add_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_add_kernel.dir/build.make:76: CMakeFiles/orchestration_example_kernel_add_kernel.dir/orchestration_example_kernel_add_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/orchestration_example_kernel_add_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:37] ERROR: testcase failed (exit 2): orchestration_example_kernel_add

vector_example_dag_kernel_add_scalar

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add_scalar/vector_example_dag_kernel_add_scalar_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add_scalar/vector_example_dag_kernel_add_scalar_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add_scalar/vector_example_dag_kernel_add_scalar_kernel.cpp:105:3: error: no matching function for call to '__ptoas_kernel_add_scalar_impl'
  __ptoas_kernel_add_scalar_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add_scalar/vector_example_dag_kernel_add_scalar_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_add_scalar_kernel.dir/build.make:76: CMakeFiles/vector_example_dag_kernel_add_scalar_kernel.dir/vector_example_dag_kernel_add_scalar_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/vector_example_dag_kernel_add_scalar_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:38] ERROR: testcase failed (exit 2): vector_example_dag_kernel_add_scalar

paged_attention_example_kernel_pv_matmul

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_pv_matmul/paged_attention_example_kernel_pv_matmul_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_pv_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_pv_matmul/paged_attention_example_kernel_pv_matmul_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_pv_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_pv_matmul/paged_attention_example_kernel_pv_matmul_kernel.cpp:124:3: error: no matching function for call to '__ptoas_kernel_pv_matmul_impl'
  __ptoas_kernel_pv_matmul_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_pv_matmul/paged_attention_example_kernel_pv_matmul_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_pv_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_pv_matmul_kernel.dir/build.make:76: CMakeFiles/paged_attention_example_kernel_pv_matmul_kernel.dir/paged_attention_example_kernel_pv_matmul_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/paged_attention_example_kernel_pv_matmul_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:40] ERROR: testcase failed (exit 2): paged_attention_example_kernel_pv_matmul

paged_attention_example_kernel_init_inplace

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_init_inplace/paged_attention_example_kernel_init_inplace_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_init_inplace_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_init_inplace/paged_attention_example_kernel_init_inplace_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_init_inplace_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_init_inplace/paged_attention_example_kernel_init_inplace_kernel.cpp:81:3: error: no matching function for call to '__ptoas_kernel_init_inplace_impl'
  __ptoas_kernel_init_inplace_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_init_inplace/paged_attention_example_kernel_init_inplace_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_init_inplace_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_init_inplace_kernel.dir/build.make:76: CMakeFiles/paged_attention_example_kernel_init_inplace_kernel.dir/paged_attention_example_kernel_init_inplace_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/paged_attention_example_kernel_init_inplace_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:41] ERROR: testcase failed (exit 2): paged_attention_example_kernel_init_inplace

vector_example_dag_kernel_add

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add/vector_example_dag_kernel_add_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add/vector_example_dag_kernel_add_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add/vector_example_dag_kernel_add_kernel.cpp:113:3: error: no matching function for call to '__ptoas_kernel_add_impl'
  __ptoas_kernel_add_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add/vector_example_dag_kernel_add_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_add_kernel.dir/build.make:76: CMakeFiles/vector_example_dag_kernel_add_kernel.dir/vector_example_dag_kernel_add_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/vector_example_dag_kernel_add_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:42] ERROR: testcase failed (exit 2): vector_example_dag_kernel_add
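
Note on the compile failures in this log: each generated `*_impl` function is declared `extern "C" __global__ static AICORE inline`, and the three diagnostics follow from that one declaration. `extern "C"` cannot be combined with `static`, a `__global__` function cannot be `inline`, and the wrapper, itself `__global__`, cannot call another `__global__` function. The sketch below shows one possible shape that avoids all three, assuming the intent is an inlinable helper called from a single kernel entry. The names `kernel_add_impl` and `kernel_add_entry` are hypothetical, and the empty macro definitions are host-side stand-ins so the fragment compiles outside the bisheng toolchain; this is not the emitter's actual output.

```cpp
// Host-side stand-ins (assumption): under the real toolchain these come from
// the CANN / bisheng headers and must not be redefined.
#ifndef __global__
#define __global__
#endif
#ifndef AICORE
#define AICORE
#endif
#ifndef __gm__
#define __gm__
#endif

// Helper: internal linkage and inlinable, but NOT a kernel entry, so it
// carries neither extern "C" nor __global__.
static AICORE inline void kernel_add_impl(__gm__ float* v1, __gm__ float* v2,
                                          __gm__ float* v3) {
  // Placeholder body for illustration only: add the first elements.
  v3[0] = v1[0] + v2[0];
}

// Entry point: extern "C" and __global__, but neither static nor inline.
// It calls the plain helper rather than another __global__ function.
extern "C" __global__ AICORE void kernel_add_entry(__gm__ float* v1,
                                                   __gm__ float* v2,
                                                   __gm__ float* v3) {
  kernel_add_impl(v1, v2, v3);
}
```

With this split, only the entry point carries the kernel attribute and C linkage, so the helper is free to be `static inline`.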

paged_attention_example_kernel_online_update

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_online_update/paged_attention_example_kernel_online_update_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_online_update_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, bool v8, bool v9) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_online_update/paged_attention_example_kernel_online_update_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_online_update_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, bool v8, bool v9) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_online_update/paged_attention_example_kernel_online_update_kernel.cpp:251:3: error: no matching function for call to '__ptoas_kernel_online_update_impl'
  __ptoas_kernel_online_update_impl(v1, v2, v3, v4, v5, v6, v7, v8, v9);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_online_update/paged_attention_example_kernel_online_update_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_online_update_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, bool v8, bool v9) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_online_update_kernel.dir/build.make:76: CMakeFiles/paged_attention_example_kernel_online_update_kernel.dir/paged_attention_example_kernel_online_update_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/paged_attention_example_kernel_online_update_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:44] ERROR: testcase failed (exit 2): paged_attention_example_kernel_online_update

paged_attention_example_kernel_softmax_prepare

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_softmax_prepare/paged_attention_example_kernel_softmax_prepare_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_softmax_prepare_impl(__gm__ float* v1, float v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_softmax_prepare/paged_attention_example_kernel_softmax_prepare_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_softmax_prepare_impl(__gm__ float* v1, float v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_softmax_prepare/paged_attention_example_kernel_softmax_prepare_kernel.cpp:146:3: error: no matching function for call to '__ptoas_kernel_softmax_prepare_impl'
  __ptoas_kernel_softmax_prepare_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_softmax_prepare/paged_attention_example_kernel_softmax_prepare_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_softmax_prepare_impl(__gm__ float* v1, float v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_softmax_prepare_kernel.dir/build.make:76: CMakeFiles/paged_attention_example_kernel_softmax_prepare_kernel.dir/paged_attention_example_kernel_softmax_prepare_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/paged_attention_example_kernel_softmax_prepare_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:45] ERROR: testcase failed (exit 2): paged_attention_example_kernel_softmax_prepare

orchestration_example_kernel_add_scalar

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add_scalar/orchestration_example_kernel_add_scalar_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add_scalar/orchestration_example_kernel_add_scalar_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add_scalar/orchestration_example_kernel_add_scalar_kernel.cpp:105:3: error: no matching function for call to '__ptoas_kernel_add_scalar_impl'
  __ptoas_kernel_add_scalar_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add_scalar/orchestration_example_kernel_add_scalar_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_add_scalar_kernel.dir/build.make:76: CMakeFiles/orchestration_example_kernel_add_scalar_kernel.dir/orchestration_example_kernel_add_scalar_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/orchestration_example_kernel_add_scalar_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:47] ERROR: testcase failed (exit 2): orchestration_example_kernel_add_scalar

paged_attention_example_kernel_qk_matmul

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_qk_matmul/paged_attention_example_kernel_qk_matmul_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_qk_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_qk_matmul/paged_attention_example_kernel_qk_matmul_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_qk_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_qk_matmul/paged_attention_example_kernel_qk_matmul_kernel.cpp:124:3: error: no matching function for call to '__ptoas_kernel_qk_matmul_impl'
  __ptoas_kernel_qk_matmul_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_qk_matmul/paged_attention_example_kernel_qk_matmul_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_qk_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_qk_matmul_kernel.dir/build.make:76: CMakeFiles/paged_attention_example_kernel_qk_matmul_kernel.dir/paged_attention_example_kernel_qk_matmul_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/paged_attention_example_kernel_qk_matmul_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:48] ERROR: testcase failed (exit 2): paged_attention_example_kernel_qk_matmul

orchestration_example_kernel_mul

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_mul/orchestration_example_kernel_mul_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_mul/orchestration_example_kernel_mul_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_mul/orchestration_example_kernel_mul_kernel.cpp:113:3: error: no matching function for call to '__ptoas_kernel_mul_impl'
  __ptoas_kernel_mul_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_mul/orchestration_example_kernel_mul_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_mul_kernel.dir/build.make:76: CMakeFiles/orchestration_example_kernel_mul_kernel.dir/orchestration_example_kernel_mul_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/orchestration_example_kernel_mul_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:49] ERROR: testcase failed (exit 2): orchestration_example_kernel_mul

vector_example_dag_kernel_mul

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_mul/vector_example_dag_kernel_mul_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_mul/vector_example_dag_kernel_mul_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_mul/vector_example_dag_kernel_mul_kernel.cpp:113:3: error: no matching function for call to '__ptoas_kernel_mul_impl'
  __ptoas_kernel_mul_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_mul/vector_example_dag_kernel_mul_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_mul_kernel.dir/build.make:76: CMakeFiles/vector_example_dag_kernel_mul_kernel.dir/vector_example_dag_kernel_mul_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/vector_example_dag_kernel_mul_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:51] ERROR: testcase failed (exit 2): vector_example_dag_kernel_mul

rowexpanddiv

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=18.47865390777588 at idx=963 (golden=10.328862190246582, out=-8.149791717529297, dtype=float32)
[ERROR] compare failed
[2026-06-23 14:12:07] ERROR: testcase failed (exit 2): rowexpanddiv

prelu

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Prelu/prelu/prelu_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_prelu_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Prelu/prelu/prelu_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_prelu_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Prelu/prelu/prelu_kernel.cpp:123:3: error: no matching function for call to '__ptoas_prelu_kernel_2d_impl'
  __ptoas_prelu_kernel_2d_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Prelu/prelu/prelu_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_prelu_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/prelu_kernel.dir/build.make:76: CMakeFiles/prelu_kernel.dir/prelu_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/prelu_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:12:14] ERROR: testcase failed (exit 2): prelu

plan_memory_bind_tile_alias_liveness

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_bind_tile_alias_liveness/plan_memory_bind_tile_alias_liveness_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_bind_tile_alias_liveness_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_bind_tile_alias_liveness/plan_memory_bind_tile_alias_liveness_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_bind_tile_alias_liveness_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_bind_tile_alias_liveness/plan_memory_bind_tile_alias_liveness_kernel.cpp:127:3: error: no matching function for call to '__ptoas_bind_tile_alias_liveness_impl'
  __ptoas_bind_tile_alias_liveness_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_bind_tile_alias_liveness/plan_memory_bind_tile_alias_liveness_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_bind_tile_alias_liveness_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_bind_tile_alias_liveness_kernel.dir/build.make:76: CMakeFiles/plan_memory_bind_tile_alias_liveness_kernel.dir/plan_memory_bind_tile_alias_liveness_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_bind_tile_alias_liveness_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:21] ERROR: testcase failed (exit 2): plan_memory_bind_tile_alias_liveness

plan_memory_peak_exact_capacity

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_exact_capacity/plan_memory_peak_exact_capacity_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_peak_exact_capacity_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_exact_capacity/plan_memory_peak_exact_capacity_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_peak_exact_capacity_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_exact_capacity/plan_memory_peak_exact_capacity_kernel.cpp:617:3: error: no matching function for call to '__ptoas_peak_exact_capacity_impl'
  __ptoas_peak_exact_capacity_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_exact_capacity/plan_memory_peak_exact_capacity_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_peak_exact_capacity_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_peak_exact_capacity_kernel.dir/build.make:76: CMakeFiles/plan_memory_peak_exact_capacity_kernel.dir/plan_memory_peak_exact_capacity_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_peak_exact_capacity_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:22] ERROR: testcase failed (exit 2): plan_memory_peak_exact_capacity

plan_memory_loop_no_reuse_outer_live

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_no_reuse_outer_live/plan_memory_loop_no_reuse_outer_live_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_loop_outer_live_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_no_reuse_outer_live/plan_memory_loop_no_reuse_outer_live_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_loop_outer_live_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_no_reuse_outer_live/plan_memory_loop_no_reuse_outer_live_kernel.cpp:135:3: error: no matching function for call to '__ptoas_loop_outer_live_impl'
  __ptoas_loop_outer_live_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_no_reuse_outer_live/plan_memory_loop_no_reuse_outer_live_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_loop_outer_live_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_loop_no_reuse_outer_live_kernel.dir/build.make:76: CMakeFiles/plan_memory_loop_no_reuse_outer_live_kernel.dir/plan_memory_loop_no_reuse_outer_live_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_loop_no_reuse_outer_live_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:23] ERROR: testcase failed (exit 2): plan_memory_loop_no_reuse_outer_live

plan_memory_if_yield

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_yield/plan_memory_if_yield_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_if_yield_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_yield/plan_memory_if_yield_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_if_yield_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_yield/plan_memory_if_yield_kernel.cpp:130:3: error: no matching function for call to '__ptoas_if_yield_impl'
  __ptoas_if_yield_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_yield/plan_memory_if_yield_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_if_yield_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_if_yield_kernel.dir/build.make:76: CMakeFiles/plan_memory_if_yield_kernel.dir/plan_memory_if_yield_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_if_yield_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:25] ERROR: testcase failed (exit 2): plan_memory_if_yield

plan_memory_loop_in_if

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_in_if/plan_memory_loop_in_if_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_loop_in_if_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_in_if/plan_memory_loop_in_if_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_loop_in_if_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_in_if/plan_memory_loop_in_if_kernel.cpp:138:3: error: no matching function for call to '__ptoas_loop_in_if_impl'
  __ptoas_loop_in_if_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_in_if/plan_memory_loop_in_if_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_loop_in_if_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_loop_in_if_kernel.dir/build.make:76: CMakeFiles/plan_memory_loop_in_if_kernel.dir/plan_memory_loop_in_if_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_loop_in_if_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:26] ERROR: testcase failed (exit 2): plan_memory_loop_in_if

plan_memory_peak_8_overlapping

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_8_overlapping/plan_memory_peak_8_overlapping_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_peak_8_overlapping_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_8_overlapping/plan_memory_peak_8_overlapping_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_peak_8_overlapping_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_8_overlapping/plan_memory_peak_8_overlapping_kernel.cpp:265:3: error: no matching function for call to '__ptoas_peak_8_overlapping_impl'
  __ptoas_peak_8_overlapping_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_8_overlapping/plan_memory_peak_8_overlapping_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_peak_8_overlapping_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_peak_8_overlapping_kernel.dir/build.make:76: CMakeFiles/plan_memory_peak_8_overlapping_kernel.dir/plan_memory_peak_8_overlapping_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_peak_8_overlapping_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:27] ERROR: testcase failed (exit 2): plan_memory_peak_8_overlapping

plan_memory_if_in_loop

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_in_loop/plan_memory_if_in_loop_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_if_in_loop_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_in_loop/plan_memory_if_in_loop_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_if_in_loop_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_in_loop/plan_memory_if_in_loop_kernel.cpp:143:3: error: no matching function for call to '__ptoas_if_in_loop_impl'
  __ptoas_if_in_loop_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_in_loop/plan_memory_if_in_loop_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_if_in_loop_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_if_in_loop_kernel.dir/build.make:76: CMakeFiles/plan_memory_if_in_loop_kernel.dir/plan_memory_if_in_loop_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_if_in_loop_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:29] ERROR: testcase failed (exit 2): plan_memory_if_in_loop

plan_memory_fragmentation_hole_fit

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_hole_fit/plan_memory_fragmentation_hole_fit_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_hole_fit_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_hole_fit/plan_memory_fragmentation_hole_fit_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_hole_fit_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_hole_fit/plan_memory_fragmentation_hole_fit_kernel.cpp:610:3: error: no matching function for call to '__ptoas_fragmentation_hole_fit_impl'
  __ptoas_fragmentation_hole_fit_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_hole_fit/plan_memory_fragmentation_hole_fit_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_hole_fit_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_fragmentation_hole_fit_kernel.dir/build.make:76: CMakeFiles/plan_memory_fragmentation_hole_fit_kernel.dir/plan_memory_fragmentation_hole_fit_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_fragmentation_hole_fit_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:30] ERROR: testcase failed (exit 2): plan_memory_fragmentation_hole_fit

plan_memory_for_iter_args_yield

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_for_iter_args_yield/plan_memory_for_iter_args_yield_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_for_iter_args_yield_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_for_iter_args_yield/plan_memory_for_iter_args_yield_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_for_iter_args_yield_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_for_iter_args_yield/plan_memory_for_iter_args_yield_kernel.cpp:140:3: error: no matching function for call to '__ptoas_for_iter_args_yield_impl'
  __ptoas_for_iter_args_yield_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_for_iter_args_yield/plan_memory_for_iter_args_yield_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_for_iter_args_yield_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_for_iter_args_yield_kernel.dir/build.make:76: CMakeFiles/plan_memory_for_iter_args_yield_kernel.dir/plan_memory_for_iter_args_yield_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_for_iter_args_yield_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:31] ERROR: testcase failed (exit 2): plan_memory_for_iter_args_yield

plan_memory_no_reuse_overlap

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_no_reuse_overlap/plan_memory_no_reuse_overlap_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_no_reuse_overlap_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_no_reuse_overlap/plan_memory_no_reuse_overlap_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_no_reuse_overlap_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_no_reuse_overlap/plan_memory_no_reuse_overlap_kernel.cpp:127:3: error: no matching function for call to '__ptoas_no_reuse_overlap_impl'
  __ptoas_no_reuse_overlap_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_no_reuse_overlap/plan_memory_no_reuse_overlap_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_no_reuse_overlap_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_no_reuse_overlap_kernel.dir/build.make:76: CMakeFiles/plan_memory_no_reuse_overlap_kernel.dir/plan_memory_no_reuse_overlap_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_no_reuse_overlap_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:33] ERROR: testcase failed (exit 2): plan_memory_no_reuse_overlap

plan_memory_reuse_sequential

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_reuse_sequential/plan_memory_reuse_sequential_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_reuse_sequential_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_reuse_sequential/plan_memory_reuse_sequential_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_reuse_sequential_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_reuse_sequential/plan_memory_reuse_sequential_kernel.cpp:771:3: error: no matching function for call to '__ptoas_reuse_sequential_impl'
  __ptoas_reuse_sequential_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_reuse_sequential/plan_memory_reuse_sequential_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_reuse_sequential_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_reuse_sequential_kernel.dir/build.make:76: CMakeFiles/plan_memory_reuse_sequential_kernel.dir/plan_memory_reuse_sequential_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_reuse_sequential_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:34] ERROR: testcase failed (exit 2): plan_memory_reuse_sequential

plan_memory_nested_loops

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_nested_loops/plan_memory_nested_loops_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_nested_loops_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_nested_loops/plan_memory_nested_loops_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_nested_loops_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_nested_loops/plan_memory_nested_loops_kernel.cpp:167:3: error: no matching function for call to '__ptoas_nested_loops_impl'
  __ptoas_nested_loops_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_nested_loops/plan_memory_nested_loops_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_nested_loops_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_nested_loops_kernel.dir/build.make:76: CMakeFiles/plan_memory_nested_loops_kernel.dir/plan_memory_nested_loops_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_nested_loops_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:35] ERROR: testcase failed (exit 2): plan_memory_nested_loops

reedhecre · 2026-06-23T06:26:32Z

A3 board test failure details: PR #824

plan_memory_fragmentation_two_holes

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_two_holes/plan_memory_fragmentation_two_holes_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_two_holes_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_two_holes/plan_memory_fragmentation_two_holes_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_two_holes_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_two_holes/plan_memory_fragmentation_two_holes_kernel.cpp:637:3: error: no matching function for call to '__ptoas_fragmentation_two_holes_impl'
  __ptoas_fragmentation_two_holes_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_two_holes/plan_memory_fragmentation_two_holes_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_two_holes_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_fragmentation_two_holes_kernel.dir/build.make:76: CMakeFiles/plan_memory_fragmentation_two_holes_kernel.dir/plan_memory_fragmentation_two_holes_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_fragmentation_two_holes_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:37] ERROR: testcase failed (exit 2): plan_memory_fragmentation_two_holes

rems

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rems/rems/rems_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_rems_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rems/rems/rems_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_rems_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rems/rems/rems_kernel.cpp:116:3: error: no matching function for call to '__ptoas_rems_kernel_2d_impl'
  __ptoas_rems_kernel_2d_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rems/rems/rems_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_rems_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/rems_kernel.dir/build.make:76: CMakeFiles/rems_kernel.dir/rems_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rems_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:38] ERROR: testcase failed (exit 2): rems

xor

stage=run info=exit=2

[ERROR] Mismatch: golden_v2.bin vs v2.bin, max diff=255.0 at idx=312 (golden=0, out=-255, dtype=int16)
[ERROR] compare failed
[2026-06-23 14:14:23] ERROR: testcase failed (exit 2): xor

partition_view_verify_rank_mismatch_valid

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_rank_mismatch_valid/partition_view_verify_rank_mismatch_valid_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_rank_mismatch_valid_impl(__gm__ float* v1) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_rank_mismatch_valid/partition_view_verify_rank_mismatch_valid_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_rank_mismatch_valid_impl(__gm__ float* v1) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_rank_mismatch_valid/partition_view_verify_rank_mismatch_valid_kernel.cpp:81:3: error: no matching function for call to '__ptoas_partition_view_verify_rank_mismatch_valid_impl'
  __ptoas_partition_view_verify_rank_mismatch_valid_impl(v1);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_rank_mismatch_valid/partition_view_verify_rank_mismatch_valid_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_rank_mismatch_valid_impl(__gm__ float* v1) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/partition_view_verify_rank_mismatch_valid_kernel.dir/build.make:76: CMakeFiles/partition_view_verify_rank_mismatch_valid_kernel.dir/partition_view_verify_rank_mismatch_valid_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/partition_view_verify_rank_mismatch_valid_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:16:26] ERROR: testcase failed (exit 2): partition_view_verify_rank_mismatch_valid

partition_view_verify_valid

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_valid/partition_view_verify_valid_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_valid_impl(__gm__ float* v1) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_valid/partition_view_verify_valid_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_valid_impl(__gm__ float* v1) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_valid/partition_view_verify_valid_kernel.cpp:81:3: error: no matching function for call to '__ptoas_partition_view_verify_valid_impl'
  __ptoas_partition_view_verify_valid_impl(v1);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_valid/partition_view_verify_valid_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_valid_impl(__gm__ float* v1) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/partition_view_verify_valid_kernel.dir/build.make:76: CMakeFiles/partition_view_verify_valid_kernel.dir/partition_view_verify_valid_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/partition_view_verify_valid_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:16:27] ERROR: testcase failed (exit 2): partition_view_verify_valid

rowexpandmul

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=14.702873229980469 at idx=206 (golden=8.388465881347656, out=-6.3144073486328125, dtype=float32)
[ERROR] compare failed
[2026-06-23 14:17:55] ERROR: testcase failed (exit 2): rowexpandmul

quant_asym

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TQuant.hpp:53:58: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
    TASSIGN_IMPL(src_s32, reinterpret_cast<uintptr_t>(tmp.data()));
                                                      ~~~^
                                                         ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:2290:5: note: in instantiation of function template specialization 'pto::TQUANT_IMPL<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TQUANT_IMPL<quant_type, TileDataOut, TileDataSrc, TileDataPara>(dst, src, scale, offset);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:117:3: note: in instantiation of function template specialization 'pto::TQUANT<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TQUANT<pto::QuantType::INT8_ASYM, Tile<TileType::Vec, uint8_t, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 1, BLayout::ColMajor, 32, 1, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>>(v29, v23, v25, v31);
  ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:98:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src0.data(), src1.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TQuant.hpp:42:5: note: in instantiation of function template specialization 'pto::TROWEXPANDMUL_IMPL<pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TROWEXPANDMUL_IMPL(src, src, scale, tmp);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:2290:5: note: in instantiation of function template specialization 'pto::TQUANT_IMPL<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TQUANT_IMPL<quant_type, TileDataOut, TileDataSrc, TileDataPara>(dst, src, scale, offset);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:117:3: note: in instantiation of function template specialization 'pto::TQUANT<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TQUANT<pto::QuantType::INT8_ASYM, Tile<TileType::Vec, uint8_t, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 1, BLayout::ColMajor, 32, 1, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>>(v29, v23, v25, v31);
  ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:97:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandMulOp<T>, TileDataDst, TileDataSrc0, TileDataSrc1, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandMulOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:103:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src1.data(), src0.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:102:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandMulOp<T>, TileDataDst, TileDataSrc1, TileDataSrc0, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandMulOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:137:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandAdd.hpp:98:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src0.data(), src1.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TQuant.hpp:45:9: note: in instantiation of function template specialization 'pto::TROWEXPANDADD_IMPL<pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
        TROWEXPANDADD_IMPL(src, src, *offset, tmp);
        ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:2290:5: note: in instantiation of function template specialization 'pto::TQUANT_IMPL<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TQUANT_IMPL<quant_type, TileDataOut, TileDataSrc, TileDataPara>(dst, src, scale, offset);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:117:3: note: in instantiation of function template specialization 'pto::TQUANT<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TQUANT<pto::QuantType::INT8_ASYM, Tile<TileType::Vec, uint8_t, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 1, BLayout::ColMajor, 32, 1, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>>(v29, v23, v25, v31);
  ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:137:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandAdd.hpp:97:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandAddOp<T>, TileDataDst, TileDataSrc0, TileDataSrc1, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandAddOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:137:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandAdd.hpp:103:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src1.data(), src0.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandAdd.hpp:102:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandAddOp<T>, TileDataDst, TileDataSrc1, TileDataSrc0, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandAddOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
9 errors generated.
gmake[2]: *** [CMakeFiles/quant_asym_kernel.dir/build.make:76: CMakeFiles/quant_asym_kernel.dir/quant_asym_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/quant_asym_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:18:05] ERROR: testcase failed (exit 2): quant_asym

quant

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TQuant.hpp:53:58: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
    TASSIGN_IMPL(src_s32, reinterpret_cast<uintptr_t>(tmp.data()));
                                                      ~~~^
                                                         ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:2290:5: note: in instantiation of function template specialization 'pto::TQUANT_IMPL<pto::QuantType::INT8_SYM, pto::Tile<pto::TileType::Vec, signed char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TQUANT_IMPL<quant_type, TileDataOut, TileDataSrc, TileDataPara>(dst, src, scale, offset);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant/quant_kernel.cpp:108:3: note: in instantiation of function template specialization 'pto::TQUANT<pto::QuantType::INT8_SYM, pto::Tile<pto::TileType::Vec, signed char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TQUANT<pto::QuantType::INT8_SYM, Tile<TileType::Vec, int8_t, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 1, BLayout::ColMajor, 32, 1, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>>(v22, v18, v20);
  ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant/quant_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:98:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src0.data(), src1.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TQuant.hpp:42:5: note: in instantiation of function template specialization 'pto::TROWEXPANDMUL_IMPL<pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TROWEXPANDMUL_IMPL(src, src, scale, tmp);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:2290:5: note: in instantiation of function template specialization 'pto::TQUANT_IMPL<pto::QuantType::INT8_SYM, pto::Tile<pto::TileType::Vec, signed char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TQUANT_IMPL<quant_type, TileDataOut, TileDataSrc, TileDataPara>(dst, src, scale, offset);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant/quant_kernel.cpp:108:3: note: in instantiation of function template specialization 'pto::TQUANT<pto::QuantType::INT8_SYM, pto::Tile<pto::TileType::Vec, signed char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TQUANT<pto::QuantType::INT8_SYM, Tile<TileType::Vec, int8_t, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 1, BLayout::ColMajor, 32, 1, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>>(v22, v18, v20);
  ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant/quant_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:97:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandMulOp<T>, TileDataDst, TileDataSrc0, TileDataSrc1, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandMulOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant/quant_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:103:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src1.data(), src0.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:102:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandMulOp<T>, TileDataDst, TileDataSrc1, TileDataSrc0, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandMulOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
5 errors generated.
gmake[2]: *** [CMakeFiles/quant_kernel.dir/build.make:76: CMakeFiles/quant_kernel.dir/quant_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/quant_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:18:07] ERROR: testcase failed (exit 2): quant
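
Note on the quant / quant_asym failures above: in every diagnostic the deduced TileDataTmp is a pointer type ('pto::Tile<...> *'), which is why 'tmp.data()' is rejected and why 'typename TileDataTmp::...' fails substitution in TRowExpandBin. This reads as though a pointer argument is reaching a parameter that the a2a3 implementation treats as a tile object. The sketch below reproduces that deduction in isolation. Tile, tmp_is_pointer and tmp_data are hypothetical stand-ins for illustration, not pto-isa APIs, and the pointer-tolerant accessor is only one possible way to handle it, not the project's fix.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for pto::Tile; only what the sketch needs.
struct Tile {
    using DType = float;
    float storage[4] = {0, 0, 0, 0};
    float *data() { return storage; }
};

// TileTmp is deduced from the argument. Passing a Tile* gives
// TileTmp = Tile*, so `typename TileTmp::DType` or `tmp.data()`
// would not compile inside a function written for tile objects.
template <typename TileTmp>
constexpr bool tmp_is_pointer(TileTmp &) {
    return std::is_pointer_v<TileTmp>;
}

// Accepts either a tile or a pointer to one and returns its data pointer.
template <typename TileTmp>
float *tmp_data(TileTmp &tmp) {
    if constexpr (std::is_pointer_v<TileTmp>) {
        return tmp->data();  // pointer case: dereference first
    } else {
        return tmp.data();   // object case: call the member directly
    }
}
```

With a Tile t and a Tile *p = &t, tmp_is_pointer(p) is true while tmp_is_pointer(t) is false, and tmp_data returns the same storage address for both.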

partition5d_dynamic

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d_dynamic/partition5d_dynamic_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2, int64_t v3, int64_t v4, int64_t v5, int64_t v6, int64_t v7, int64_t v8, int64_t v9, int64_t v10, int64_t v11, int64_t v12) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d_dynamic/partition5d_dynamic_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2, int64_t v3, int64_t v4, int64_t v5, int64_t v6, int64_t v7, int64_t v8, int64_t v9, int64_t v10, int64_t v11, int64_t v12) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d_dynamic/partition5d_dynamic_kernel.cpp:99:3: error: no matching function for call to '__ptoas_run_partition_impl'
  __ptoas_run_partition_impl(v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d_dynamic/partition5d_dynamic_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2, int64_t v3, int64_t v4, int64_t v5, int64_t v6, int64_t v7, int64_t v8, int64_t v9, int64_t v10, int64_t v11, int64_t v12) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/partition5d_dynamic_kernel.dir/build.make:76: CMakeFiles/partition5d_dynamic_kernel.dir/partition5d_dynamic_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/partition5d_dynamic_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:18:08] ERROR: testcase failed (exit 2): partition5d_dynamic

partition5d

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d/partition5d_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d/partition5d_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d/partition5d_kernel.cpp:99:3: error: no matching function for call to '__ptoas_run_partition_impl'
  __ptoas_run_partition_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d/partition5d_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/partition5d_kernel.dir/build.make:76: CMakeFiles/partition5d_kernel.dir/partition5d_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/partition5d_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:18:09] ERROR: testcase failed (exit 2): partition5d
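
Note on the partition5d failures above: the generated '_impl' helper carries 'extern "C" __global__' together with 'static ... inline'. The compiler reports three separate problems with that combination: 'static' cannot follow the 'extern' specifier, a __global__ function cannot be inlined, and one __global__ function cannot call another. A minimal host-side sketch of the usual split is shown below, assuming the helper is meant to be an ordinary internal function and only the entry point needs C linkage. The names run_partition_impl, run_partition and check_forwarding are hypothetical, and the CCE attributes (__global__, AICORE, __gm__) are left out so the sketch compiles with a plain C++ compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: internal linkage, inlinable, no extern "C" and
// no kernel attribute, so it can be called from the entry point.
static inline int64_t run_partition_impl(int64_t a, int64_t b) {
    return a + b;
}

// Hypothetical entry point: C linkage only, forwards to the helper.
// In the generated kernel this is where the kernel attribute would go.
extern "C" int64_t run_partition(int64_t a, int64_t b) {
    return run_partition_impl(a, b);
}

// Small check that the entry point forwards to the helper.
inline void check_forwarding() {
    assert(run_partition(2, 3) == 5);
}
```

Keeping the linkage and kernel specifiers on the entry point alone avoids all three diagnostics in this reduced form; whether the same split is what the code generator intends is not stated in the log.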

scatter

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=4.940644979476929 at idx=27 (golden=2.725048542022705, out=-2.2155964374542236, dtype=float32)
[ERROR] compare failed
[2026-06-23 14:18:32] ERROR: testcase failed (exit 2): scatter

sparse_attn_test_incore_7

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/sparse_attn_test_incore_7/sparse_attn_test_incore_7_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_sparse_attn_test_incore_7_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/sparse_attn_test_incore_7/sparse_attn_test_incore_7_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_sparse_attn_test_incore_7_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/sparse_attn_test_incore_7/sparse_attn_test_incore_7_kernel.cpp:156:3: error: no matching function for call to '__ptoas_sparse_attn_test_incore_7_impl'
  __ptoas_sparse_attn_test_incore_7_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/sparse_attn_test_incore_7/sparse_attn_test_incore_7_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_sparse_attn_test_incore_7_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/sparse_attn_test_incore_7_kernel.dir/build.make:76: CMakeFiles/sparse_attn_test_incore_7_kernel.dir/sparse_attn_test_incore_7_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/sparse_attn_test_incore_7_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:29] ERROR: testcase failed (exit 2): sparse_attn_test_incore_7

decode_hca_test_incore_54

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_hca_test_incore_54/decode_hca_test_incore_54_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_decode_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_hca_test_incore_54/decode_hca_test_incore_54_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_decode_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_hca_test_incore_54/decode_hca_test_incore_54_kernel.cpp:156:3: error: no matching function for call to '__ptoas_decode_hca_test_incore_54_impl'
  __ptoas_decode_hca_test_incore_54_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_hca_test_incore_54/decode_hca_test_incore_54_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_decode_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/decode_hca_test_incore_54_kernel.dir/build.make:76: CMakeFiles/decode_hca_test_incore_54_kernel.dir/decode_hca_test_incore_54_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/decode_hca_test_incore_54_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:30] ERROR: testcase failed (exit 2): decode_hca_test_incore_54

attention_swa_test_incore_40

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_swa_test_incore_40/attention_swa_test_incore_40_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_attention_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_swa_test_incore_40/attention_swa_test_incore_40_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_attention_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_swa_test_incore_40/attention_swa_test_incore_40_kernel.cpp:156:3: error: no matching function for call to '__ptoas_attention_swa_test_incore_40_impl'
  __ptoas_attention_swa_test_incore_40_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_swa_test_incore_40/attention_swa_test_incore_40_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_attention_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/attention_swa_test_incore_40_kernel.dir/build.make:76: CMakeFiles/attention_swa_test_incore_40_kernel.dir/attention_swa_test_incore_40_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/attention_swa_test_incore_40_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:31] ERROR: testcase failed (exit 2): attention_swa_test_incore_40

reedhecre · 2026-06-23T06:26:35Z

A3 board test failure details: PR #824

decode_swa_test_incore_40

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_swa_test_incore_40/decode_swa_test_incore_40_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_decode_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_swa_test_incore_40/decode_swa_test_incore_40_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_decode_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_swa_test_incore_40/decode_swa_test_incore_40_kernel.cpp:156:3: error: no matching function for call to '__ptoas_decode_swa_test_incore_40_impl'
  __ptoas_decode_swa_test_incore_40_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_swa_test_incore_40/decode_swa_test_incore_40_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_decode_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/decode_swa_test_incore_40_kernel.dir/build.make:76: CMakeFiles/decode_swa_test_incore_40_kernel.dir/decode_swa_test_incore_40_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/decode_swa_test_incore_40_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:33] ERROR: testcase failed (exit 2): decode_swa_test_incore_40

decode_csa_test_incore_81

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_csa_test_incore_81/decode_csa_test_incore_81_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_decode_csa_test_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_csa_test_incore_81/decode_csa_test_incore_81_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_decode_csa_test_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_csa_test_incore_81/decode_csa_test_incore_81_kernel.cpp:156:3: error: no matching function for call to '__ptoas_decode_csa_test_incore_81_impl'
  __ptoas_decode_csa_test_incore_81_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_csa_test_incore_81/decode_csa_test_incore_81_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_decode_csa_test_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/decode_csa_test_incore_81_kernel.dir/build.make:76: CMakeFiles/decode_csa_test_incore_81_kernel.dir/decode_csa_test_incore_81_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/decode_csa_test_incore_81_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:34] ERROR: testcase failed (exit 2): decode_csa_test_incore_81

attention_hca_test_incore_54

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_hca_test_incore_54/attention_hca_test_incore_54_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_attention_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_hca_test_incore_54/attention_hca_test_incore_54_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_attention_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_hca_test_incore_54/attention_hca_test_incore_54_kernel.cpp:156:3: error: no matching function for call to '__ptoas_attention_hca_test_incore_54_impl'
  __ptoas_attention_hca_test_incore_54_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_hca_test_incore_54/attention_hca_test_incore_54_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_attention_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/attention_hca_test_incore_54_kernel.dir/build.make:76: CMakeFiles/attention_hca_test_incore_54_kernel.dir/attention_hca_test_incore_54_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/attention_hca_test_incore_54_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:35] ERROR: testcase failed (exit 2): attention_hca_test_incore_54

attention_csa_test_refresh_incore_81

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_csa_test_refresh_incore_81/attention_csa_test_refresh_incore_81_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_attention_csa_test_refresh_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_csa_test_refresh_incore_81/attention_csa_test_refresh_incore_81_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_attention_csa_test_refresh_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_csa_test_refresh_incore_81/attention_csa_test_refresh_incore_81_kernel.cpp:156:3: error: no matching function for call to '__ptoas_attention_csa_test_refresh_incore_81_impl'
  __ptoas_attention_csa_test_refresh_incore_81_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_csa_test_refresh_incore_81/attention_csa_test_refresh_incore_81_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_attention_csa_test_refresh_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/attention_csa_test_refresh_incore_81_kernel.dir/build.make:76: CMakeFiles/attention_csa_test_refresh_incore_81_kernel.dir/attention_csa_test_refresh_incore_81_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/attention_csa_test_refresh_incore_81_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:37] ERROR: testcase failed (exit 2): attention_csa_test_refresh_incore_81

tensor_view_layout_dn

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Layout/tensor_view_layout_dn/tensor_view_layout_dn_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_run_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Layout/tensor_view_layout_dn/tensor_view_layout_dn_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_run_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Layout/tensor_view_layout_dn/tensor_view_layout_dn_kernel.cpp:97:3: error: no matching function for call to '__ptoas_run_impl'
  __ptoas_run_impl(v1, v2);
  ^~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Layout/tensor_view_layout_dn/tensor_view_layout_dn_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_run_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/tensor_view_layout_dn_kernel.dir/build.make:76: CMakeFiles/tensor_view_layout_dn_kernel.dir/tensor_view_layout_dn_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/tensor_view_layout_dn_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:38] ERROR: testcase failed (exit 2): tensor_view_layout_dn

rowexpandsub

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=5.565198540687561 at idx=202 (golden=1.555970311164856, out=-4.009228229522705, dtype=float32)
[ERROR] compare failed
[2026-06-23 14:20:06] ERROR: testcase failed (exit 2): rowexpandsub

rope_kv_cache

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rope_kv_cache/rope_kv_cache_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_rope_kv_cache_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, __gm__ float* v8, __gm__ float* v9, __gm__ float* v10, int64_t v11, int64_t v12) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rope_kv_cache/rope_kv_cache_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_rope_kv_cache_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, __gm__ float* v8, __gm__ float* v9, __gm__ float* v10, int64_t v11, int64_t v12) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rope_kv_cache/rope_kv_cache_kernel.cpp:336:3: error: no matching function for call to '__ptoas_rope_kv_cache_impl'
  __ptoas_rope_kv_cache_impl(v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rope_kv_cache/rope_kv_cache_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_rope_kv_cache_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, __gm__ float* v8, __gm__ float* v9, __gm__ float* v10, int64_t v11, int64_t v12) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/rope_kv_cache_kernel.dir/build.make:76: CMakeFiles/rope_kv_cache_kernel.dir/rope_kv_cache_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rope_kv_cache_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:07] ERROR: testcase failed (exit 2): rope_kv_cache

qwen3_decode_incore_4

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/qwen3_decode_incore_4_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_4_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/qwen3_decode_incore_4_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_4_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/qwen3_decode_incore_4_kernel.cpp:234:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_4_impl'
  __ptoas_qwen3_decode_incore_4_impl(v1, v2, v3, v4, v5, v6, v7);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/qwen3_decode_incore_4_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_4_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_4_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_4_kernel.dir/qwen3_decode_incore_4_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_4_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:09] ERROR: testcase failed (exit 2): qwen3_decode_incore_4

post_rmsnorm

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/post_rmsnorm/post_rmsnorm_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_post_rmsnorm_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/post_rmsnorm/post_rmsnorm_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_post_rmsnorm_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/post_rmsnorm/post_rmsnorm_kernel.cpp:302:3: error: no matching function for call to '__ptoas_post_rmsnorm_impl'
  __ptoas_post_rmsnorm_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/post_rmsnorm/post_rmsnorm_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_post_rmsnorm_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/post_rmsnorm_kernel.dir/build.make:76: CMakeFiles/post_rmsnorm_kernel.dir/post_rmsnorm_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/post_rmsnorm_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:10] ERROR: testcase failed (exit 2): post_rmsnorm

qwen3_decode_incore_1

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/qwen3_decode_incore_1_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_1_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/qwen3_decode_incore_1_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_1_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/qwen3_decode_incore_1_kernel.cpp:258:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_1_impl'
  __ptoas_qwen3_decode_incore_1_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/qwen3_decode_incore_1_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_1_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_1_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_1_kernel.dir/qwen3_decode_incore_1_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_1_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:25] ERROR: testcase failed (exit 2): qwen3_decode_incore_1

qwen3_decode_incore_10

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/qwen3_decode_incore_10_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_10_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/qwen3_decode_incore_10_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_10_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/qwen3_decode_incore_10_kernel.cpp:341:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_10_impl'
  __ptoas_qwen3_decode_incore_10_impl(v1, v2, v3, v4, v5, v6);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/qwen3_decode_incore_10_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_10_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_10_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_10_kernel.dir/qwen3_decode_incore_10_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_10_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:26] ERROR: testcase failed (exit 2): qwen3_decode_incore_10

qwen3_decode_incore_11

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/qwen3_decode_incore_11_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_11_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/qwen3_decode_incore_11_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_11_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/qwen3_decode_incore_11_kernel.cpp:341:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_11_impl'
  __ptoas_qwen3_decode_incore_11_impl(v1, v2, v3, v4, v5, v6);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/qwen3_decode_incore_11_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_11_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_11_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_11_kernel.dir/qwen3_decode_incore_11_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_11_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:28] ERROR: testcase failed (exit 2): qwen3_decode_incore_11

rmsnorm

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rmsnorm/rmsnorm_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_rmsnorm_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rmsnorm/rmsnorm_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_rmsnorm_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rmsnorm/rmsnorm_kernel.cpp:508:3: error: no matching function for call to '__ptoas_rmsnorm_impl'
  __ptoas_rmsnorm_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rmsnorm/rmsnorm_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_rmsnorm_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/rmsnorm_kernel.dir/build.make:76: CMakeFiles/rmsnorm_kernel.dir/rmsnorm_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rmsnorm_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:29] ERROR: testcase failed (exit 2): rmsnorm

qwen3_decode_incore_6

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/qwen3_decode_incore_6_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_6_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/qwen3_decode_incore_6_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_6_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/qwen3_decode_incore_6_kernel.cpp:241:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_6_impl'
  __ptoas_qwen3_decode_incore_6_impl(v1, v2, v3, v4, v5, v6, v7);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/qwen3_decode_incore_6_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_6_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_6_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_6_kernel.dir/qwen3_decode_incore_6_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_6_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:30] ERROR: testcase failed (exit 2): qwen3_decode_incore_6

qwen3_decode_incore_2

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/qwen3_decode_incore_2_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_2_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, int32_t v6, int32_t v7) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/qwen3_decode_incore_2_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_2_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, int32_t v6, int32_t v7) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/qwen3_decode_incore_2_kernel.cpp:381:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_2_impl'
  __ptoas_qwen3_decode_incore_2_impl(v1, v2, v3, v4, v5, v6, v7);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/qwen3_decode_incore_2_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_2_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, int32_t v6, int32_t v7) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_2_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_2_kernel.dir/qwen3_decode_incore_2_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_2_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:32] ERROR: testcase failed (exit 2): qwen3_decode_incore_2

qwen3_decode_incore_7

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/qwen3_decode_incore_7_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_7_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ bfloat16_t* v4, int64_t v5, int32_t v6, int32_t v7) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/qwen3_decode_incore_7_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_7_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ bfloat16_t* v4, int64_t v5, int32_t v6, int32_t v7) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/qwen3_decode_incore_7_kernel.cpp:497:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_7_impl'
  __ptoas_qwen3_decode_incore_7_impl(v1, v2, v3, v4, v5, v6, v7);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/qwen3_decode_incore_7_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_7_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ bfloat16_t* v4, int64_t v5, int32_t v6, int32_t v7) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_7_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_7_kernel.dir/qwen3_decode_incore_7_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_7_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:33] ERROR: testcase failed (exit 2): qwen3_decode_incore_7

qwen3_decode_incore_5

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_incore_5_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_5_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, int64_t v5, int32_t v6, int32_t v7, int32_t v8) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_incore_5_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_5_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, int64_t v5, int32_t v6, int32_t v7, int32_t v8) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_incore_5_kernel.cpp:277:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_5_impl'
  __ptoas_qwen3_decode_incore_5_impl(v1, v2, v3, v4, v5, v6, v7, v8);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_incore_5_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_5_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, int64_t v5, int32_t v6, int32_t v7, int32_t v8) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_5_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_5_kernel.dir/qwen3_decode_incore_5_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_5_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:34] ERROR: testcase failed (exit 2): qwen3_decode_incore_5

qwen3_decode_incore_12

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/qwen3_decode_incore_12_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_12_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int32_t v5, int32_t v6) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/qwen3_decode_incore_12_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_12_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int32_t v5, int32_t v6) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/qwen3_decode_incore_12_kernel.cpp:158:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_12_impl'
  __ptoas_qwen3_decode_incore_12_impl(v1, v2, v3, v4, v5, v6);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/qwen3_decode_incore_12_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_12_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int32_t v5, int32_t v6) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_12_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_12_kernel.dir/qwen3_decode_incore_12_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_12_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:36] ERROR: testcase failed (exit 2): qwen3_decode_incore_12

sels

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=66.99866366386414 at idx=92 (golden=-2.9986636638641357, out=64.0, dtype=float32)
[ERROR] compare failed
[2026-06-23 14:20:40] ERROR: testcase failed (exit 2): sels

tprefetch_async_binding

stage=run info=exit=1

[SDMA] aclrtSynchronizeStream (aicpu) failed
[WARN] SdmaWorkspaceManager::Init failed - TPREFETCH_ASYNC will fall back to no-op prefetch
[ERROR] aclrtSynchronizeStream(stream) failed: 507018 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:132)
[ERROR] RecentErrMsg: E39999: Inner Error!
E39999[PID: 1369173] 2026-06-23-14:21:29.830.521 (E39999):  The error from device(chipId:0, dieId:0), serial number is 72, an exception occurred during AICPU execution, stream_id:45, task_id:0, errcode:0, msg:aicpu execute failed.[FUNC:ProcessStarsAicpuErrorInfo][FILE:device_error_proc.cc][LINE:1644]
        TraceBack (most recent call last):
       Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       Aicpu kernel execute failed, device_id=0, stream_id=45, task_id=0, soName=libcpu_kernels.so, funcName=RunCpuKernel, kernelName=ShmemSdmaStarsQuery, errorCode=0x2a.[FUNC:PrintAicpuErrorInfo][FILE:davinci_kernel_task.cc][LINE:1435]
       rtStreamSynchronize execution failed, reason=aicpu exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
       Failed to submit kernel task, retCode=0x715002a.[FUNC:LaunchKernelSubmit][FILE:context.cc][LINE:1223]
       kernel launch submit failed.[FUNC:LaunchKernel][FILE:context.cc][LINE:1349]
       rtKernelLaunch execution failed, reason=aicpu exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
[2026-06-23 14:24:07] ERROR: testcase failed (exit 1): tprefetch_async_binding

test_barrier_sync

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_barrier_sync/test_barrier_sync_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_test_barrier_sync_py_impl() {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_barrier_sync/test_barrier_sync_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_test_barrier_sync_py_impl() {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_barrier_sync/test_barrier_sync_kernel.cpp:84:3: error: no matching function for call to '__ptoas_test_barrier_sync_py_impl'
  __ptoas_test_barrier_sync_py_impl();
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_barrier_sync/test_barrier_sync_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_test_barrier_sync_py_impl() {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/test_barrier_sync_kernel.dir/build.make:76: CMakeFiles/test_barrier_sync_kernel.dir/test_barrier_sync_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/test_barrier_sync_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:24:31] ERROR: testcase failed (exit 2): test_barrier_sync

matmul

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/matmul/matmul_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_RunTMATMULSplitK_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, bool v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/matmul/matmul_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_RunTMATMULSplitK_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, bool v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/matmul/matmul_kernel.cpp:175:3: error: no matching function for call to '__ptoas_RunTMATMULSplitK_impl'
  __ptoas_RunTMATMULSplitK_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/matmul/matmul_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_RunTMATMULSplitK_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, bool v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/matmul_kernel.dir/build.make:76: CMakeFiles/matmul_kernel.dir/matmul_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/matmul_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:07] ERROR: testcase failed (exit 2): matmul

add_double_dynamic

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/add_double_dynamic/add_double_dynamic_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_vec_add_1d_dynamic_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/add_double_dynamic/add_double_dynamic_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_vec_add_1d_dynamic_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/add_double_dynamic/add_double_dynamic_kernel.cpp:183:3: error: no matching function for call to '__ptoas_vec_add_1d_dynamic_impl'
  __ptoas_vec_add_1d_dynamic_impl(v1, v2, v3, v4);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/add_double_dynamic/add_double_dynamic_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_vec_add_1d_dynamic_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/add_double_dynamic_kernel.dir/build.make:76: CMakeFiles/add_double_dynamic_kernel.dir/add_double_dynamic_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/add_double_dynamic_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:08] ERROR: testcase failed (exit 2): add_double_dynamic

nested_loop_confliect

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/nested_loop_confliect/nested_loop_confliect_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_nested_loop_sync_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/nested_loop_confliect/nested_loop_confliect_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_nested_loop_sync_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/nested_loop_confliect/nested_loop_confliect_kernel.cpp:111:3: error: no matching function for call to '__ptoas_nested_loop_sync_impl'
  __ptoas_nested_loop_sync_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/nested_loop_confliect/nested_loop_confliect_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_nested_loop_sync_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/nested_loop_confliect_kernel.dir/build.make:76: CMakeFiles/nested_loop_confliect_kernel.dir/nested_loop_confliect_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/nested_loop_confliect_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:10] ERROR: testcase failed (exit 2): nested_loop_confliect

rar_optimization_test

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/rar_optimization_test/rar_optimization_test_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_rar_hazard_check_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/rar_optimization_test/rar_optimization_test_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_rar_hazard_check_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/rar_optimization_test/rar_optimization_test_kernel.cpp:100:3: error: no matching function for call to '__ptoas_rar_hazard_check_impl'
  __ptoas_rar_hazard_check_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/rar_optimization_test/rar_optimization_test_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_rar_hazard_check_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/rar_optimization_test_kernel.dir/build.make:76: CMakeFiles/rar_optimization_test_kernel.dir/rar_optimization_test_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rar_optimization_test_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:11] ERROR: testcase failed (exit 2): rar_optimization_test

test_dynamic_valid_shape

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_dynamic_valid_shape/test_dynamic_valid_shape_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_test_dynamic_valid_shape_impl(__gm__ float* v1, __gm__ float* v2, int32_t v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_dynamic_valid_shape/test_dynamic_valid_shape_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_test_dynamic_valid_shape_impl(__gm__ float* v1, __gm__ float* v2, int32_t v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_dynamic_valid_shape/test_dynamic_valid_shape_kernel.cpp:114:3: error: no matching function for call to '__ptoas_test_dynamic_valid_shape_impl'
  __ptoas_test_dynamic_valid_shape_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_dynamic_valid_shape/test_dynamic_valid_shape_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_test_dynamic_valid_shape_impl(__gm__ float* v1, __gm__ float* v2, int32_t v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/test_dynamic_valid_shape_kernel.dir/build.make:76: CMakeFiles/test_dynamic_valid_shape_kernel.dir/test_dynamic_valid_shape_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/test_dynamic_valid_shape_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:20] ERROR: testcase failed (exit 2): test_dynamic_valid_shape

test_auto_sync_tail_hint

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_auto_sync_tail_hint/test_auto_sync_tail_hint_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_test_auto_sync_tail_hint_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_auto_sync_tail_hint/test_auto_sync_tail_hint_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_test_auto_sync_tail_hint_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_auto_sync_tail_hint/test_auto_sync_tail_hint_kernel.cpp:100:3: error: no matching function for call to '__ptoas_test_auto_sync_tail_hint_impl'
  __ptoas_test_auto_sync_tail_hint_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_auto_sync_tail_hint/test_auto_sync_tail_hint_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_test_auto_sync_tail_hint_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/test_auto_sync_tail_hint_kernel.dir/build.make:76: CMakeFiles/test_auto_sync_tail_hint_kernel.dir/test_auto_sync_tail_hint_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/test_auto_sync_tail_hint_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:30] ERROR: testcase failed (exit 2): test_auto_sync_tail_hint

reedhecre · 2026-06-23T06:26:39Z

A3 board test failure details: PR #824

compensation_test

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/compensation_test/compensation_test_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_compensation_check_impl(__gm__ float* v1, bool v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/compensation_test/compensation_test_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_compensation_check_impl(__gm__ float* v1, bool v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/compensation_test/compensation_test_kernel.cpp:97:3: error: no matching function for call to '__ptoas_compensation_check_impl'
  __ptoas_compensation_check_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/compensation_test/compensation_test_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_compensation_check_impl(__gm__ float* v1, bool v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/compensation_test_kernel.dir/build.make:76: CMakeFiles/compensation_test_kernel.dir/compensation_test_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/compensation_test_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:52] ERROR: testcase failed (exit 2): compensation_test

rem

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rem/rem/rem_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_rem_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rem/rem/rem_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_rem_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rem/rem/rem_kernel.cpp:123:3: error: no matching function for call to '__ptoas_rem_kernel_2d_impl'
  __ptoas_rem_kernel_2d_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rem/rem/rem_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_rem_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/rem_kernel.dir/build.make:76: CMakeFiles/rem_kernel.dir/rem_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rem_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:26:07] ERROR: testcase failed (exit 2): rem

partmin

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=nan at idx=112 (golden=-0.0, out=nan, dtype=float16)
[ERROR] compare failed
[2026-06-23 14:26:12] ERROR: testcase failed (exit 2): partmin

gemini-code-assist Bot reviewed Jun 16, 2026

FangRui0 force-pushed the refactor_tmp branch 4 times, most recently from b63bf80 to e9b88fa Compare June 18, 2026 06:47

FangRui0 added 3 commits June 22, 2026 14:12

Align tmp contracts for P1/P2 tile ops

fbcf49a

Fix auto row-reduction tmp layout

1814e43

Fix row reduction auto tmp metadata

ec88d24

FangRui0 force-pushed the refactor_tmp branch from adeae8f to 527d8ac Compare June 22, 2026 06:12

fix: delete tmp constraits of trowxxx

0f93917

Signed-off-by: FangRui <fangrui_95@163.com>

FangRui0 force-pushed the refactor_tmp branch from 527d8ac to 0f93917 Compare June 22, 2026 07:36

zhangstevenunity requested changes Jun 22, 2026

zhangstevenunity reviewed Jun 22, 2026

zhangstevenunity merged commit 74354bf into hw-native-sys:main Jun 23, 2026
10 checks passed

