Align tmp contracts for tile ops #824

Merged
zhangstevenunity merged 4 commits into hw-native-sys:main from FangRui0:refactor_tmp on Jun 23, 2026

@FangRui0

Contributor

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates several PTO IR operations to relax constraints on the tmp (temporary workspace) operand, particularly on the A5 architecture where it is treated as a placeholder that does not need to match the shape or layout of the source/destination tiles. This prevents false vector overflow diagnostics and updates memory effects. The review feedback points out a mismatch in the expected error message in a new test file and recommends completely removing the unused verifyTRowReductionNoTmpCommon helper function in PTO.cpp to avoid dead code.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

}
}

// CHECK: error: 'pto.trowmax' op expects tmp to use the row_major blayout

high

The expected error message in the CHECK directive does not match the actual error message emitted by the verifier in lib/PTO/IR/PTO.cpp. The verifier emits "expects tmp to use row-major layout", but the test expects "expects tmp to use the row_major blayout". This mismatch will cause the lit test to fail.

// CHECK: error: 'pto.trowmax' op expects tmp to use row-major layout
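Why one reworded phrase breaks the lit test: a `CHECK:` line passes only if its literal text occurs in the tool output. The sketch below is a deliberately simplified model of that rule (the helper name `check_matches` is hypothetical, and it ignores FileCheck's directive ordering, `{{regex}}` blocks, and whitespace canonicalization). The diagnostic string is the verifier message quoted above, without a location prefix.

```python
def check_matches(pattern: str, output: str) -> bool:
    """Simplified CHECK rule: the literal pattern must occur within one output line.

    Real FileCheck also enforces ordering between directives, supports {{regex}}
    blocks, and canonicalizes horizontal whitespace; none of that is modeled here.
    """
    return any(pattern in line for line in output.splitlines())


# Diagnostic text as quoted in the review (location prefix omitted).
emitted = "error: 'pto.trowmax' op expects tmp to use row-major layout"

original_check = "error: 'pto.trowmax' op expects tmp to use the row_major blayout"
suggested_check = "error: 'pto.trowmax' op expects tmp to use row-major layout"

print(check_matches(original_check, emitted))   # False
print(check_matches(suggested_check, emitted))  # True
```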

Contributor Author

done

Comment thread lib/PTO/IR/PTO.cpp
Comment on lines +1875 to +1877
[[maybe_unused]] static LogicalResult
verifyTRowReductionNoTmpCommon(Operation *op, Type srcTy, Type dstTy,
StringRef elemTypeError) {

medium

The helper function verifyTRowReductionNoTmpCommon is now marked [[maybe_unused]] because all of its callers (TRowMaxOp and TRowSumOp) have been migrated to use verifyTRowReductionWithTmpCommon. Since this is a static helper function in this translation unit and is no longer used anywhere, it should be completely removed from the codebase to avoid dead code.

@reedhecre

reedhecre commented Jun 16, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: Align tmp contracts for tile ops #824
  • Author: FangRui0
  • Base/Head: main / refactor_tmp
  • Head SHA: 0f93917e1fa7
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-06-22T08:47:41Z
  • Previous Head SHA: adeae8fba5a4
  • Status: completed

Summary

PR #824 introduces a backend contract mismatch for A5 trowexpand* tmp forms, and one PTODSL reference example no longer matches the kernel it claims to reconstruct.

Findings

  1. P2 A5 `trowexpand*` tmp forms are now verifier-legal, but the VPTO backend still only provides 3-operand kernels lib/PTO/IR/PTO.cpp:10193

verifyTRowExpandReduceLikeOp now accepts A5 ops with a %tmp operand, so forms like pto.trowexpandmax/add/div/mul/sub/expdif/min pass verification. However the registered A5 TileLang kernels for this family still take only (src0, src1, dst) and have no 4-operand variant (for example lib/TileOps/trowexpandmax_template.py and lib/TileOps/trowexpandadd_template.py). That means the new IR is only accepted by --emit-pto-ir; compiling the same op with --pto-backend=vpto has no matching kernel to lower to and should fail.
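The failure mode described here is an arity mismatch during template selection. A minimal Python model of it, assuming the rule stated later in the review (expand_helper skips any descriptor where `len(parameters) != len(operand_specs)`), looks like this. `KernelDescriptor`, `select_kernel`, and the operand order are hypothetical; only the error text mirrors the quoted ExpandTileOp message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelDescriptor:
    """Hypothetical stand-in for a registered TileLang template descriptor."""
    op_name: str
    parameters: tuple


def select_kernel(op_name, operand_specs, registry):
    """Pick the first descriptor for op_name whose arity matches the op.

    Models the rule described for expand_helper: any descriptor with
    len(parameters) != len(operand_specs) is skipped.
    """
    for desc in registry:
        if desc.op_name == op_name and len(desc.parameters) == len(operand_specs):
            return desc
    raise LookupError(f"failed to instantiate tilelang template for {op_name}")


# Only the 3-parameter A5 template is registered, as in trowexpandmax_template.py.
registry = [KernelDescriptor("pto.trowexpandmax", ("src0", "src1", "dst"))]

three_operand_form = ["src0", "src1", "dst"]
four_operand_form = ["src0", "src1", "tmp", "dst"]  # operand order is illustrative

print(select_kernel("pto.trowexpandmax", three_operand_form, registry).parameters)
try:
    select_kernel("pto.trowexpandmax", four_operand_form, registry)
except LookupError as err:
    print(err)  # failed to instantiate tilelang template for pto.trowexpandmax
```

Registering a second descriptor that takes (and ignores) a `tmp` parameter would let the 4-operand form resolve, which is the follow-up suggested later in the thread.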

  2. P3 `softmax_lowlevel.py` no longer reconstructs the referenced softmax kernel ptodsl/examples/softmax_lowlevel.py:61

This example still says it reconstructs test/tilelang_st/npu/a5/src/st/testcase/softmax/softmax.pto, but tile_col was changed to blayout=row_major. The referenced kernel allocates the corresponding oldmax/oldsum/newmax/newsum/expmax tiles as blayout=col_major (softmax.pto lines 116-133). The example therefore emits different tile types than the file it claims to mirror, so it is no longer a faithful low-level reference.

@FangRui0 FangRui0 force-pushed the refactor_tmp branch 4 times, most recently from b63bf80 to e9b88fa on June 18, 2026 06:47
Signed-off-by: FangRui <fangrui_95@163.com>
@FangRui0

Contributor Author

/run a3

@reedhecre

Received /run a3; the A3 board tester will process this request.

The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.

@reedhecre

A3 board test failed

  • Trigger mode: manual
  • Source commit: cfe94f5ba586
  • Result summary: OK 0 / FAIL 0 / SKIP 0
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260622_162013_manual_pr824.log
  • Manual command: /run a3
  • Triggered by: FangRui0
  • Triggering comment: Align tmp contracts for tile ops #824 (comment)
  • Failed stage: sample-build-and-test / exit=1

Log tail

al.py) SKIP requires --pto-arch=a5
Sync(test_intercore_sync_a5_ptoisa_vec.py) SKIP requires --pto-arch=a5
Sync(test_intercore_sync_a5.py) SKIP requires --pto-arch=a5
Sync(test_mem_inject_sync_basic.py) OK   generated: test_mem_inject_sync_basic-pto.cpp
Sync(test_set_wait_unified_api.py) OK   generated: test_set_wait_unified_api-pto.cpp
Sync(test_tmov_col_major_16x1_align_a5.pto) SKIP requires --pto-arch=a5
Sync(test_tmov_col_major_16x1_align_a5.py) SKIP requires --pto-arch=a5
Sync(test_tmov_row_major_1x16_control_a5.pto) SKIP requires --pto-arch=a5
Sync(test_tmov_row_major_1x16_control_a5.py) SKIP requires --pto-arch=a5
Sync(tmatmulk_autosync.py) OK   generated: tmatmulk_autosync-pto.cpp
TileSetGetValue(tile_getval_mat_invalid.py) XFAIL ptobc encode failed as expected
TileSetGetValue(tileSetGetValue.py) OK   generated: tileSetGetValue-pto.cpp
TInsert(tinsert_fp.py) OK   generated: tinsert_fp-pto.cpp
TInsert(tinsert.py) OK   generated: tinsert-pto.cpp
Tpows(tpows.py) OK   generated: tpows-pto.cpp
Tpow(tpow.py) OK   generated: tpow-pto.cpp
TPrefetchAsync(tprefetch_async_binding.py) OK   generated: tprefetch_async_binding-pto.cpp
TPrefetch(tprefetch.py) OK   generated: tprefetch-pto.cpp
Trans(trans.py) OK   generated: trans-pto.cpp
Trap(trap.py) OK   generated: trap-pto.cpp
TTri(ttri.py) OK   generated: ttri-pto.cpp
VectorAddition(vadd_pto_ir.py) OK   generated: vadd_pto_ir-pto.cpp
VectorAddition(vadd_validshape_hyper.py) OK   generated: vadd_validshape_hyper-pto.cpp
VectorAddition(vectorAddition.py) OK   generated: vectorAddition-pto.cpp
Xors(xors.py) OK   generated: xors-pto.cpp
Xor(xor.py)  OK   generated: xor-pto.cpp
-----------------------------
OK=235  FAIL=2  SKIP=39
=============================
===== END STAGE sample-build-and-test rc=1 @ 2026-06-22 16:30:03 =====

@FangRui0

Contributor Author

/run a3

@zhangstevenunity zhangstevenunity left a comment

Collaborator

Review: Align tmp contracts for tile ops

The A5 "tmp is an unused placeholder" modeling (skip the scratch WRITE in getEffects to dodge false vec-overflow) is sound for the ops whose A5 kernel genuinely ignores tmp: tsel/tsels/txor/txors TileLang templates never write tmp, and the new A5 txors verifier checks are correctly A2/A3-gated. But two changes look like correctness regressions, and one needs confirmation.

Requesting changes

1 (blocking). A5 trowexpand* + tmp is now verifier-legal but cannot be lowered on EITHER A5 backend.
This PR deletes the hasTmp && A5 -> "expects A5 form to omit tmp" guard in verifyTRowExpandReduceLikeOp. That guard's own comment said it exists so a 0x0 dst cannot "let an A5 tmp form slip through and lower to the A2/A3 4-operand TROWEXPAND* call." Removing it re-enables exactly that:

  • --pto-backend=vpto: the A5 TileLang templates for this family are 3-param (src0, src1, dst) (lib/TileOps/trowexpandmax_template.py). ExpandTileOp builds one operand-spec per operand (buildSpecKey over all operands) and expand_helper rejects any descriptor where len(parameters) != len(operand_specs) (expand_helper.py:335), so a 4-operand op finds no kernel and ExpandTileOp hard-fails: "failed to instantiate tilelang template for pto.trowexpandmax" (ExpandTileOp.cpp:1204). This is the Codex bot's P2.
  • default EmitC: PTORowExpand*ToEmitC forwards tmp as a 4th arg unconditionally, emitting TROWEXPANDMAX(dst, src0, src1, tmp). The A5 board kernels are 3-operand (test/.../a5/.../trowexpandmax/main.cpp declares LaunchTROWEXPANDMAX_*(src0, src1, dst, stream); the A5 ST .pto uses ins(%src0, %src1)), matching the pre-PR "A5 supports the 3-operand form only" note this PR removed. So the 4-arg call has no A5 ISA overload.

The new tests do not catch this: issue708_* and tmp_contract_a5_non_same_shape use --emit-pto-ir (stops before lowering), and a5_unused_tmp_vec_overflow only FileChecks emitted text, never compiling for the board. Please either keep rejecting tmp on A5 for trowexpand*, or give the vpto templates a tmp param (like trowmax) AND confirm a 4-operand A5 EmitC/ISA path exists, backed by a real board test.

2 (blocking). The WithTmp relaxation silently loosens A2/A3 trowmin/trowprod, where tmp is real scratch.
verifyTRowReductionWithTmpCommon drops verifyTileBufSameElemType + verifyTileBufSameValidShape (and downgrades verifyVecTileCommon -> verifyVecTileStorage) for ALL arches. On master, trowmin/trowprod already used this helper (only trowmax/trowsum are newly migrated from the NoTmp helper). On A2/A3 these forward tmp to TROWMIN/TROWPROD(dst, src, tmp) as genuine scratch, and the deleted test trowprod_tmp_mismatch_invalid.py asserted that a mismatched tmp must fail. Now an undersized / wrong-dtype tmp passes verification on A2/A3 and reaches the intrinsic -> possible scratch overflow / type reinterpret. The PR's intent is A5-placeholder semantics, so please A5-gate the relaxation (keep the strict checks on A2/A3), or justify why A2/A3 reductions tolerate a non-matching tmp.

Please confirm

3. Row-reduction getEffects drop the A5 tmp WRITE with no justification comment. For trowmax/trowmin/trowsum/trowprod, EmitC still forwards tmp to TROWMAX(dst, src, tmp) on A5. Dropping the WRITE is correct only if the A5 ISA truly does not write tmp for these (the vpto template ignoring tmp is encouraging but does not cover the EmitC path). If any A5 reduction writes tmp as scratch, InsertSync now misses a WAR/WAW on it. tsel/txor got an explanatory comment; please add the same here and confirm against the A5 ISA.
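To see why a dropped WRITE can hide a hazard, here is a toy Python model (not InsertSync's actual algorithm). It assumes the post-PR effect shape for trowmax-style ops: read src, write dst, and write tmp only off A5. `Arch`, `row_reduction_effects`, and `hazards` are hypothetical names. Two independent reductions that share one tmp tile are ordered by a WAW on A2/A3 but not at all on A5, which is only safe if the A5 kernel truly never writes tmp.

```python
from enum import Enum


class Arch(Enum):
    A2A3 = "a2a3"
    A5 = "a5"


def row_reduction_effects(src, tmp, dst, arch):
    """Toy model of trowmax-style getEffects after this PR.

    On A5 the tmp WRITE is omitted because the A5 kernel is assumed to accept
    tmp but never touch it; on A2/A3 tmp is real scratch and stays a WRITE.
    """
    effects = [("read", src)]
    if arch is not Arch.A5:
        effects.append(("write", tmp))
    effects.append(("write", dst))
    return effects


HAZARD_KIND = {("write", "read"): "RAW", ("read", "write"): "WAR", ("write", "write"): "WAW"}


def hazards(first, second):
    """Buffers on which `second` must wait for `first`, with the hazard kind."""
    found = set()
    for kind1, buf1 in first:
        for kind2, buf2 in second:
            kind = HAZARD_KIND.get((kind1, kind2))
            if buf1 == buf2 and kind:
                found.add((buf1, kind))
    return found


for arch in Arch:
    # Two independent reductions that happen to share the same tmp tile "t".
    op1 = row_reduction_effects("x", "t", "y1", arch)
    op2 = row_reduction_effects("z", "t", "y2", arch)
    print(arch.value, hazards(op1, op2))
# a2a3 {('t', 'WAW')}
# a5 set()
```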

Nits

  • ptodsl/examples/softmax_lowlevel.py still says it reconstructs softmax/softmax.pto, but flips tile_col to row_major while that kernel keeps blayout=col_major for the 8x1 reduction tiles (Codex P3).
  • PTO.cpp ~10313: the comment "(A5 tmp-form invariant is checked earlier ...)" is now stale -- the invariant it refers to was deleted in this PR.
  • verifyTRowReductionNoTmpCommon now has zero callers; it is marked [[maybe_unused]] rather than removed (gemini flagged this) -- please delete it.

Comment thread include/PTO/IR/PTOOps.td
let summary = "TROWEXPANDMAX: Row-wise broadcast max with per-row scalar vector.";
let description = [{
pto-isa has overloads with/without tmp on A2/A3; A5 supports the 3-operand form only.
pto-isa has overloads with/without tmp; on A5 the tmp operand is accepted as a placeholder and does not add extra shape constraints.

Collaborator

A5 trowexpand* with tmp is now verifier-legal but un-lowerable on both A5 backends.

vpto: the A5 TileLang template is 3-param (src0, src1, dst) (lib/TileOps/trowexpandmax_template.py). ExpandTileOp emits one operand-spec per operand and expand_helper drops any kernel where len(parameters) != len(operand_specs) (expand_helper.py:335), so a 4-operand op hard-fails ExpandTileOp: "failed to instantiate tilelang template for pto.trowexpandmax" (ExpandTileOp.cpp:1204).

EmitC (default): PTORowExpandMaxToEmitC forwards tmp as a 4th arg -> TROWEXPANDMAX(dst, src0, src1, tmp), but the A5 board kernel is 3-operand (a5/.../trowexpandmax/main.cpp: LaunchTROWEXPANDMAX_*(src0, src1, dst, stream)).

This is the bug the deleted hasTmp && A5 guard in verifyTRowExpandReduceLikeOp prevented -- its comment said the form would otherwise "lower to the A2/A3 4-operand TROWEXPAND* call." The new tests only use --emit-pto-ir or FileCheck text, so they never exercise the lowering. Suggest keeping the A5 reject, or adding a tmp-taking vpto template plus a real 4-operand A5 path with a board test.

Comment thread lib/PTO/IR/PTO.cpp
StringRef elemTypeError) {
if (failed(verifyRowReductionSrcLayout(op, srcTy, "src")) ||
failed(verifyVecTileCommon(op, tmpTy, "tmp")) ||
failed(verifyVecTileStorage(op, tmpTy, "tmp")) ||

Collaborator

Dropping verifyTileBufSameElemType + verifyTileBufSameValidShape here loosens tmp checks for ALL arches, but trowmin/trowprod already used this helper on master and forward tmp to TROWMIN/TROWPROD(dst, src, tmp) as real scratch on A2/A3. The deleted trowprod_tmp_mismatch_invalid.py asserted that a mismatched tmp must fail. Now an undersized / wrong-dtype tmp passes verification on A2/A3 and reaches the intrinsic (scratch overflow / type reinterpret). Please A5-gate this relaxation so A2/A3 keep the same-elem / same-valid-shape checks, or justify why A2/A3 reductions tolerate a non-matching tmp.

Comment thread lib/PTO/IR/PTO.cpp
SmallVectorImpl<SideEffects::EffectInstance<MemoryEffects::Effect>> &effects) {
PTO_ADD_READ(getSrcMutable());
PTO_ADD_WRITE(getTmpMutable());
if (getTargetArch(getOperation()) != PTOArch::A5)

Collaborator

This drops the A5 tmp WRITE for trowmax (same change for trowmin/trowsum/trowprod), but EmitC still forwards tmp to TROWMAX(dst, src, tmp) on A5. This is safe only if the A5 ISA does not write tmp for these reductions. If it does, InsertSync now misses a WAR/WAW on tmp. tsel/txor got an explanatory comment for the identical change; please add the same here and confirm against the A5 ISA (the vpto template ignoring tmp is supportive but does not cover the EmitC path).

_col_cfg = pto.TileBufConfigAttr.get(
pto.BLayoutAttr.get(pto.BLayout.ColMajor),
_reduce_cfg = pto.TileBufConfigAttr.get(
pto.BLayoutAttr.get(pto.BLayout.RowMajor),

Collaborator

This flips tile_col to row_major, but softmax_lowlevel.py (line 13) still says it reconstructs test/tilelang_st/npu/a5/src/st/testcase/softmax/softmax.pto, where the 8x1 reduction tiles keep blayout=col_major. The example no longer matches the referenced kernel (Codex P3). Either revert this layout or update the reference/claim.

@reedhecre

Received /run a3; the A3 board tester will process this request.

The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.

@zhangstevenunity zhangstevenunity left a comment

Collaborator

Correction after reading the latest pto-isa (npu/a5 + npu/a2a3)

I checked the actual A5/A2A3 intrinsic implementations and need to retract two of my three blocking points -- the A5 placeholder design in this PR is more correct than my first pass implied. Apologies for the noise.

RETRACT my point 1, EmitC half (no board break). pto-isa include/pto/npu/a5/TRowExpand{Max,Add,Mul,Sub,Div,Min,Expdif}.hpp each provide a 4-arg TROWEXPAND*_IMPL(dst, src0, src1, tmp) overload, explicitly commented "4-arg overload for cross-architecture portability with A2/A3 ... the tmp tile is accepted and ignored." So the default EmitC path compiles and runs correctly on A5 -- there is no missing 4-operand ISA overload. My EmitC board-break claim was wrong.

RETRACT my point 3 (row-reduction A5 sync is fine). npu/a5/TRowReduce.hpp (TROWMAX/TROWSUM/TROWMIN_IMPL) and npu/a5/TRowProd.hpp take tmp and never touch it -- the reduction is pure register accumulation (vregdst). So dropping the A5 tmp WRITE in getEffects is correct and the false-vec-overflow fix is sound; no missing WAR/WAW.

DOWNGRADE my point 1, vpto half (now minor / non-blocking). Only the opt-in --pto-backend=vpto path is affected: the TileLang templates lib/TileOps/trowexpand*_template.py are still 3-param (src0, src1, dst), so ExpandTileOp hard-fails (loudly) on a 4-operand op (expand_helper.py:335 -> ExpandTileOp.cpp:1204). No current frontend emits trowexpand*+tmp, so this is latent. Suggest giving those templates a tmp param like trowmax_template.py already has (mirroring the ISA's 4-arg portability overload) -- a consistency follow-up, not a blocker.

STILL STANDS: my point 2 (A2/A3 trowmin/trowprod), now confirmed against the ISA. npu/a2a3/TRowMax.hpp and TRowProd.hpp use tmp as real read/write scratch: vector_dup(tmp, ...) then vmax/vmul(tmp, tmp, src + block*elemsPerBlock), with elemsPerBlock = BLOCK_BYTE_SIZE / sizeof(T) (8 for i32, 16 for i16), and the tile pointer is cast to T*. So A2/A3 needs tmp >= one block AND tmp dtype == src dtype. The relaxed verifyTRowReductionWithTmpCommon drops BOTH the same-elem-type check and any size check for all arches, so a sub-block / wrong-dtype tmp now passes verification on A2/A3 -> silent out-of-bounds scratch write. This is reachable in exactly the cross-arch "placeholder" scenario this PR promotes: a [1,8] tmp is fine for i32 (block=8) but overflows an i16 reduction (block=16). The old same-valid-shape check was over-strict (it demanded a full-src tmp, more than the one-block minimum) but it was safe.
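The [1,8] example follows directly from elemsPerBlock = BLOCK_BYTE_SIZE / sizeof(T). The quoted values (8 for i32, 16 for i16) both imply a 32-byte block; that constant is inferred here, not copied from pto-isa. The sketch assumes tmp has the same dtype as src and counts capacity as rows * cols of the tmp tile. Function names are hypothetical.

```python
# Inferred, not taken from pto-isa: the quoted elemsPerBlock values
# (8 for i32, 16 for i16) both imply a 32-byte block.
BLOCK_BYTE_SIZE = 32
DTYPE_BYTES = {"i16": 2, "i32": 4}


def elems_per_block(dtype):
    """elemsPerBlock = BLOCK_BYTE_SIZE / sizeof(T)."""
    return BLOCK_BYTE_SIZE // DTYPE_BYTES[dtype]


def tmp_covers_one_block(tmp_shape, dtype):
    """True if a tmp tile of this shape (same dtype as src) holds one full block."""
    rows, cols = tmp_shape
    return rows * cols >= elems_per_block(dtype)


for dtype in ("i32", "i16"):
    print(dtype, elems_per_block(dtype), tmp_covers_one_block((1, 8), dtype))
# i32 8 True
# i16 16 False
```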

Recommend A5-gating the relaxation -- keep same-elem-type + a >= one-block size check on A2/A3 -- instead of dropping the checks for all arches. If trowmin/trowprod callers are guaranteed to size tmp >= one block with matching dtype, point 2 is non-blocking too.
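The recommended split can be sketched as a plain Python model of the verifier decision; the real check would live in verifyTRowReductionWithTmpCommon in PTO.cpp. `TileType`, `verify_row_reduction_tmp`, the `"a5"`/`"a2a3"` arch strings, and the error messages are all hypothetical. The 32-byte block is inferred from the quoted elemsPerBlock values, and counting rows * cols of the tmp tile is an assumption about which shape the check should use.

```python
from dataclasses import dataclass

# Inferred from the quoted elemsPerBlock values (8 x i32 = 16 x i16 = 32 bytes).
BLOCK_BYTE_SIZE = 32
DTYPE_BYTES = {"i16": 2, "i32": 4}


@dataclass(frozen=True)
class TileType:
    """Hypothetical, simplified tile type: element dtype plus static shape."""
    dtype: str
    rows: int
    cols: int


def verify_row_reduction_tmp(src, tmp, arch):
    """Return None if tmp is acceptable for a trowmin/trowprod-style op, else an error.

    A5: tmp is a placeholder the kernel never touches, so it is not constrained.
    A2/A3: tmp is real scratch reinterpreted as src's element type and filled one
    block at a time, so require a matching dtype and at least one block of capacity.
    """
    if arch == "a5":
        return None
    if tmp.dtype != src.dtype:
        return "expects tmp to have the same element type as src"
    needed = BLOCK_BYTE_SIZE // DTYPE_BYTES[src.dtype]
    if tmp.rows * tmp.cols < needed:
        return f"expects tmp to hold at least {needed} elements (one block)"
    return None


src_i16 = TileType("i16", 8, 16)
tmp_1x8 = TileType("i16", 1, 8)
print(verify_row_reduction_tmp(src_i16, tmp_1x8, "a5"))    # None
print(verify_row_reduction_tmp(src_i16, tmp_1x8, "a2a3"))  # expects tmp to hold at least 16 elements (one block)
```

Unlike the old same-valid-shape check, this only demands the one-block minimum, so it stays safe on A2/A3 without requiring a full-src-sized tmp.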

Net: the A5 placeholder modeling is sound and ISA-backed. The only residual correctness concern is the A2/A3 scratch checks in point 2.

@zhangstevenunity zhangstevenunity merged commit 74354bf into hw-native-sys:main Jun 23, 2026
10 checks passed
@reedhecre

A3 board test failed

  • Trigger mode: merged
  • Source commit: 74354bf93cad
  • Result summary: OK 152 / FAIL 69 / SKIP 1
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260623_135338_merged_pr824.log
  • Failed stage: board-validation / exit=1

Failed test cases

  • orchestration_example_kernel_add (run, exit=2)
  • vector_example_dag_kernel_add_scalar (run, exit=2)
  • paged_attention_example_kernel_pv_matmul (run, exit=2)
  • paged_attention_example_kernel_init_inplace (run, exit=2)
  • vector_example_dag_kernel_add (run, exit=2)
  • paged_attention_example_kernel_online_update (run, exit=2)
  • paged_attention_example_kernel_softmax_prepare (run, exit=2)
  • orchestration_example_kernel_add_scalar (run, exit=2)
  • paged_attention_example_kernel_qk_matmul (run, exit=2)
  • orchestration_example_kernel_mul (run, exit=2)
  • vector_example_dag_kernel_mul (run, exit=2)
  • rowexpanddiv (run, exit=2)
  • prelu (run, exit=2)
  • plan_memory_bind_tile_alias_liveness (run, exit=2)
  • plan_memory_peak_exact_capacity (run, exit=2)
  • plan_memory_loop_no_reuse_outer_live (run, exit=2)
  • plan_memory_if_yield (run, exit=2)
  • plan_memory_loop_in_if (run, exit=2)
  • plan_memory_peak_8_overlapping (run, exit=2)
  • plan_memory_if_in_loop (run, exit=2)
  • plan_memory_fragmentation_hole_fit (run, exit=2)
  • plan_memory_for_iter_args_yield (run, exit=2)
  • plan_memory_no_reuse_overlap (run, exit=2)
  • plan_memory_reuse_sequential (run, exit=2)
  • plan_memory_nested_loops (run, exit=2)
  • plan_memory_fragmentation_two_holes (run, exit=2)
  • rems (run, exit=2)
  • xor (run, exit=2)
  • partition_view_verify_rank_mismatch_valid (run, exit=2)
  • partition_view_verify_valid (run, exit=2)
  • rowexpandmul (run, exit=2)
  • quant_asym (run, exit=2)
  • quant (run, exit=2)
  • partition5d_dynamic (run, exit=2)
  • partition5d (run, exit=2)
  • scatter (run, exit=2)
  • sparse_attn_test_incore_7 (run, exit=2)
  • decode_hca_test_incore_54 (run, exit=2)
  • attention_swa_test_incore_40 (run, exit=2)
  • decode_swa_test_incore_40 (run, exit=2)
  • decode_csa_test_incore_81 (run, exit=2)
  • attention_hca_test_incore_54 (run, exit=2)
  • attention_csa_test_refresh_incore_81 (run, exit=2)
  • tensor_view_layout_dn (run, exit=2)
  • rowexpandsub (run, exit=2)
  • rope_kv_cache (run, exit=2)
  • qwen3_decode_incore_4 (run, exit=2)
  • post_rmsnorm (run, exit=2)
  • qwen3_decode_incore_1 (run, exit=2)
  • qwen3_decode_incore_10 (run, exit=2)
  • qwen3_decode_incore_11 (run, exit=2)
  • rmsnorm (run, exit=2)
  • qwen3_decode_incore_6 (run, exit=2)
  • qwen3_decode_incore_2 (run, exit=2)
  • qwen3_decode_incore_7 (run, exit=2)
  • qwen3_decode_incore_5 (run, exit=2)
  • qwen3_decode_incore_12 (run, exit=2)
  • sels (run, exit=2)
  • tprefetch_async_binding (run, exit=1)
  • test_barrier_sync (run, exit=2)
  • matmul (run, exit=2)
  • add_double_dynamic (run, exit=2)
  • nested_loop_confliect (run, exit=2)
  • rar_optimization_test (run, exit=2)
  • test_dynamic_valid_shape (run, exit=2)
  • test_auto_sync_tail_hint (run, exit=2)
  • compensation_test (run, exit=2)
  • rem (run, exit=2)
  • partmin (run, exit=2)

@reedhecre

A3 board test failure details: PR #824

orchestration_example_kernel_add

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add/orchestration_example_kernel_add_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add/orchestration_example_kernel_add_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add/orchestration_example_kernel_add_kernel.cpp:113:3: error: no matching function for call to '__ptoas_kernel_add_impl'
  __ptoas_kernel_add_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add/orchestration_example_kernel_add_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_add_kernel.dir/build.make:76: CMakeFiles/orchestration_example_kernel_add_kernel.dir/orchestration_example_kernel_add_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/orchestration_example_kernel_add_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:37] ERROR: testcase failed (exit 2): orchestration_example_kernel_add
vector_example_dag_kernel_add_scalar

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add_scalar/vector_example_dag_kernel_add_scalar_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add_scalar/vector_example_dag_kernel_add_scalar_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add_scalar/vector_example_dag_kernel_add_scalar_kernel.cpp:105:3: error: no matching function for call to '__ptoas_kernel_add_scalar_impl'
  __ptoas_kernel_add_scalar_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add_scalar/vector_example_dag_kernel_add_scalar_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_add_scalar_kernel.dir/build.make:76: CMakeFiles/vector_example_dag_kernel_add_scalar_kernel.dir/vector_example_dag_kernel_add_scalar_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/vector_example_dag_kernel_add_scalar_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:38] ERROR: testcase failed (exit 2): vector_example_dag_kernel_add_scalar
paged_attention_example_kernel_pv_matmul

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_pv_matmul/paged_attention_example_kernel_pv_matmul_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_pv_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_pv_matmul/paged_attention_example_kernel_pv_matmul_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_pv_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_pv_matmul/paged_attention_example_kernel_pv_matmul_kernel.cpp:124:3: error: no matching function for call to '__ptoas_kernel_pv_matmul_impl'
  __ptoas_kernel_pv_matmul_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_pv_matmul/paged_attention_example_kernel_pv_matmul_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_pv_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_pv_matmul_kernel.dir/build.make:76: CMakeFiles/paged_attention_example_kernel_pv_matmul_kernel.dir/paged_attention_example_kernel_pv_matmul_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/paged_attention_example_kernel_pv_matmul_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:40] ERROR: testcase failed (exit 2): paged_attention_example_kernel_pv_matmul
paged_attention_example_kernel_init_inplace

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_init_inplace/paged_attention_example_kernel_init_inplace_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_init_inplace_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_init_inplace/paged_attention_example_kernel_init_inplace_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_init_inplace_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_init_inplace/paged_attention_example_kernel_init_inplace_kernel.cpp:81:3: error: no matching function for call to '__ptoas_kernel_init_inplace_impl'
  __ptoas_kernel_init_inplace_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_init_inplace/paged_attention_example_kernel_init_inplace_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_init_inplace_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_init_inplace_kernel.dir/build.make:76: CMakeFiles/paged_attention_example_kernel_init_inplace_kernel.dir/paged_attention_example_kernel_init_inplace_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/paged_attention_example_kernel_init_inplace_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:41] ERROR: testcase failed (exit 2): paged_attention_example_kernel_init_inplace
vector_example_dag_kernel_add

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add/vector_example_dag_kernel_add_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add/vector_example_dag_kernel_add_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add/vector_example_dag_kernel_add_kernel.cpp:113:3: error: no matching function for call to '__ptoas_kernel_add_impl'
  __ptoas_kernel_add_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_add/vector_example_dag_kernel_add_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_add_kernel.dir/build.make:76: CMakeFiles/vector_example_dag_kernel_add_kernel.dir/vector_example_dag_kernel_add_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/vector_example_dag_kernel_add_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:42] ERROR: testcase failed (exit 2): vector_example_dag_kernel_add
paged_attention_example_kernel_online_update

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_online_update/paged_attention_example_kernel_online_update_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_online_update_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, bool v8, bool v9) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_online_update/paged_attention_example_kernel_online_update_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_online_update_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, bool v8, bool v9) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_online_update/paged_attention_example_kernel_online_update_kernel.cpp:251:3: error: no matching function for call to '__ptoas_kernel_online_update_impl'
  __ptoas_kernel_online_update_impl(v1, v2, v3, v4, v5, v6, v7, v8, v9);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_online_update/paged_attention_example_kernel_online_update_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_online_update_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, bool v8, bool v9) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_online_update_kernel.dir/build.make:76: CMakeFiles/paged_attention_example_kernel_online_update_kernel.dir/paged_attention_example_kernel_online_update_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/paged_attention_example_kernel_online_update_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:44] ERROR: testcase failed (exit 2): paged_attention_example_kernel_online_update
paged_attention_example_kernel_softmax_prepare

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_softmax_prepare/paged_attention_example_kernel_softmax_prepare_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_softmax_prepare_impl(__gm__ float* v1, float v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_softmax_prepare/paged_attention_example_kernel_softmax_prepare_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_softmax_prepare_impl(__gm__ float* v1, float v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_softmax_prepare/paged_attention_example_kernel_softmax_prepare_kernel.cpp:146:3: error: no matching function for call to '__ptoas_kernel_softmax_prepare_impl'
  __ptoas_kernel_softmax_prepare_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_softmax_prepare/paged_attention_example_kernel_softmax_prepare_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_softmax_prepare_impl(__gm__ float* v1, float v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_softmax_prepare_kernel.dir/build.make:76: CMakeFiles/paged_attention_example_kernel_softmax_prepare_kernel.dir/paged_attention_example_kernel_softmax_prepare_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/paged_attention_example_kernel_softmax_prepare_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:45] ERROR: testcase failed (exit 2): paged_attention_example_kernel_softmax_prepare
orchestration_example_kernel_add_scalar

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add_scalar/orchestration_example_kernel_add_scalar_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add_scalar/orchestration_example_kernel_add_scalar_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add_scalar/orchestration_example_kernel_add_scalar_kernel.cpp:105:3: error: no matching function for call to '__ptoas_kernel_add_scalar_impl'
  __ptoas_kernel_add_scalar_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_add_scalar/orchestration_example_kernel_add_scalar_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_add_scalar_impl(__gm__ float* v1, float v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_add_scalar_kernel.dir/build.make:76: CMakeFiles/orchestration_example_kernel_add_scalar_kernel.dir/orchestration_example_kernel_add_scalar_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/orchestration_example_kernel_add_scalar_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:47] ERROR: testcase failed (exit 2): orchestration_example_kernel_add_scalar
paged_attention_example_kernel_qk_matmul

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_qk_matmul/paged_attention_example_kernel_qk_matmul_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_qk_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_qk_matmul/paged_attention_example_kernel_qk_matmul_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_qk_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_qk_matmul/paged_attention_example_kernel_qk_matmul_kernel.cpp:124:3: error: no matching function for call to '__ptoas_kernel_qk_matmul_impl'
  __ptoas_kernel_qk_matmul_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/paged_attention_example_kernel_qk_matmul/paged_attention_example_kernel_qk_matmul_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_qk_matmul_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_qk_matmul_kernel.dir/build.make:76: CMakeFiles/paged_attention_example_kernel_qk_matmul_kernel.dir/paged_attention_example_kernel_qk_matmul_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/paged_attention_example_kernel_qk_matmul_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:48] ERROR: testcase failed (exit 2): paged_attention_example_kernel_qk_matmul
orchestration_example_kernel_mul

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_mul/orchestration_example_kernel_mul_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_mul/orchestration_example_kernel_mul_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_mul/orchestration_example_kernel_mul_kernel.cpp:113:3: error: no matching function for call to '__ptoas_kernel_mul_impl'
  __ptoas_kernel_mul_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/orchestration_example_kernel_mul/orchestration_example_kernel_mul_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_mul_kernel.dir/build.make:76: CMakeFiles/orchestration_example_kernel_mul_kernel.dir/orchestration_example_kernel_mul_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/orchestration_example_kernel_mul_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:49] ERROR: testcase failed (exit 2): orchestration_example_kernel_mul
vector_example_dag_kernel_mul

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_mul/vector_example_dag_kernel_mul_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_mul/vector_example_dag_kernel_mul_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_mul/vector_example_dag_kernel_mul_kernel.cpp:113:3: error: no matching function for call to '__ptoas_kernel_mul_impl'
  __ptoas_kernel_mul_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PyPTOIRParser/vector_example_dag_kernel_mul/vector_example_dag_kernel_mul_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_kernel_mul_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_mul_kernel.dir/build.make:76: CMakeFiles/vector_example_dag_kernel_mul_kernel.dir/vector_example_dag_kernel_mul_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/vector_example_dag_kernel_mul_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:07:51] ERROR: testcase failed (exit 2): vector_example_dag_kernel_mul
rowexpanddiv

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=18.47865390777588 at idx=963 (golden=10.328862190246582, out=-8.149791717529297, dtype=float32)
[ERROR] compare failed
[2026-06-23 14:12:07] ERROR: testcase failed (exit 2): rowexpanddiv
prelu

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Prelu/prelu/prelu_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_prelu_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Prelu/prelu/prelu_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_prelu_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Prelu/prelu/prelu_kernel.cpp:123:3: error: no matching function for call to '__ptoas_prelu_kernel_2d_impl'
  __ptoas_prelu_kernel_2d_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Prelu/prelu/prelu_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_prelu_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/prelu_kernel.dir/build.make:76: CMakeFiles/prelu_kernel.dir/prelu_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/prelu_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:12:14] ERROR: testcase failed (exit 2): prelu
plan_memory_bind_tile_alias_liveness

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_bind_tile_alias_liveness/plan_memory_bind_tile_alias_liveness_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_bind_tile_alias_liveness_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_bind_tile_alias_liveness/plan_memory_bind_tile_alias_liveness_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_bind_tile_alias_liveness_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_bind_tile_alias_liveness/plan_memory_bind_tile_alias_liveness_kernel.cpp:127:3: error: no matching function for call to '__ptoas_bind_tile_alias_liveness_impl'
  __ptoas_bind_tile_alias_liveness_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_bind_tile_alias_liveness/plan_memory_bind_tile_alias_liveness_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_bind_tile_alias_liveness_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_bind_tile_alias_liveness_kernel.dir/build.make:76: CMakeFiles/plan_memory_bind_tile_alias_liveness_kernel.dir/plan_memory_bind_tile_alias_liveness_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_bind_tile_alias_liveness_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:21] ERROR: testcase failed (exit 2): plan_memory_bind_tile_alias_liveness
plan_memory_peak_exact_capacity

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_exact_capacity/plan_memory_peak_exact_capacity_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_peak_exact_capacity_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_exact_capacity/plan_memory_peak_exact_capacity_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_peak_exact_capacity_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_exact_capacity/plan_memory_peak_exact_capacity_kernel.cpp:617:3: error: no matching function for call to '__ptoas_peak_exact_capacity_impl'
  __ptoas_peak_exact_capacity_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_exact_capacity/plan_memory_peak_exact_capacity_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_peak_exact_capacity_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_peak_exact_capacity_kernel.dir/build.make:76: CMakeFiles/plan_memory_peak_exact_capacity_kernel.dir/plan_memory_peak_exact_capacity_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_peak_exact_capacity_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:22] ERROR: testcase failed (exit 2): plan_memory_peak_exact_capacity
plan_memory_loop_no_reuse_outer_live

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_no_reuse_outer_live/plan_memory_loop_no_reuse_outer_live_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_loop_outer_live_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_no_reuse_outer_live/plan_memory_loop_no_reuse_outer_live_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_loop_outer_live_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_no_reuse_outer_live/plan_memory_loop_no_reuse_outer_live_kernel.cpp:135:3: error: no matching function for call to '__ptoas_loop_outer_live_impl'
  __ptoas_loop_outer_live_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_no_reuse_outer_live/plan_memory_loop_no_reuse_outer_live_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_loop_outer_live_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_loop_no_reuse_outer_live_kernel.dir/build.make:76: CMakeFiles/plan_memory_loop_no_reuse_outer_live_kernel.dir/plan_memory_loop_no_reuse_outer_live_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_loop_no_reuse_outer_live_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:23] ERROR: testcase failed (exit 2): plan_memory_loop_no_reuse_outer_live
plan_memory_if_yield

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_yield/plan_memory_if_yield_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_if_yield_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_yield/plan_memory_if_yield_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_if_yield_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_yield/plan_memory_if_yield_kernel.cpp:130:3: error: no matching function for call to '__ptoas_if_yield_impl'
  __ptoas_if_yield_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_yield/plan_memory_if_yield_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_if_yield_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_if_yield_kernel.dir/build.make:76: CMakeFiles/plan_memory_if_yield_kernel.dir/plan_memory_if_yield_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_if_yield_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:25] ERROR: testcase failed (exit 2): plan_memory_if_yield
plan_memory_loop_in_if

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_in_if/plan_memory_loop_in_if_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_loop_in_if_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_in_if/plan_memory_loop_in_if_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_loop_in_if_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_in_if/plan_memory_loop_in_if_kernel.cpp:138:3: error: no matching function for call to '__ptoas_loop_in_if_impl'
  __ptoas_loop_in_if_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_loop_in_if/plan_memory_loop_in_if_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_loop_in_if_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_loop_in_if_kernel.dir/build.make:76: CMakeFiles/plan_memory_loop_in_if_kernel.dir/plan_memory_loop_in_if_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_loop_in_if_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:26] ERROR: testcase failed (exit 2): plan_memory_loop_in_if
plan_memory_peak_8_overlapping

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_8_overlapping/plan_memory_peak_8_overlapping_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_peak_8_overlapping_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_8_overlapping/plan_memory_peak_8_overlapping_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_peak_8_overlapping_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_8_overlapping/plan_memory_peak_8_overlapping_kernel.cpp:265:3: error: no matching function for call to '__ptoas_peak_8_overlapping_impl'
  __ptoas_peak_8_overlapping_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_peak_8_overlapping/plan_memory_peak_8_overlapping_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_peak_8_overlapping_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_peak_8_overlapping_kernel.dir/build.make:76: CMakeFiles/plan_memory_peak_8_overlapping_kernel.dir/plan_memory_peak_8_overlapping_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_peak_8_overlapping_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:27] ERROR: testcase failed (exit 2): plan_memory_peak_8_overlapping
plan_memory_if_in_loop

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_in_loop/plan_memory_if_in_loop_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_if_in_loop_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_in_loop/plan_memory_if_in_loop_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_if_in_loop_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_in_loop/plan_memory_if_in_loop_kernel.cpp:143:3: error: no matching function for call to '__ptoas_if_in_loop_impl'
  __ptoas_if_in_loop_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_if_in_loop/plan_memory_if_in_loop_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_if_in_loop_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_if_in_loop_kernel.dir/build.make:76: CMakeFiles/plan_memory_if_in_loop_kernel.dir/plan_memory_if_in_loop_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_if_in_loop_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:29] ERROR: testcase failed (exit 2): plan_memory_if_in_loop
plan_memory_fragmentation_hole_fit

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_hole_fit/plan_memory_fragmentation_hole_fit_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_hole_fit_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_hole_fit/plan_memory_fragmentation_hole_fit_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_hole_fit_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_hole_fit/plan_memory_fragmentation_hole_fit_kernel.cpp:610:3: error: no matching function for call to '__ptoas_fragmentation_hole_fit_impl'
  __ptoas_fragmentation_hole_fit_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_hole_fit/plan_memory_fragmentation_hole_fit_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_hole_fit_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_fragmentation_hole_fit_kernel.dir/build.make:76: CMakeFiles/plan_memory_fragmentation_hole_fit_kernel.dir/plan_memory_fragmentation_hole_fit_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_fragmentation_hole_fit_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:30] ERROR: testcase failed (exit 2): plan_memory_fragmentation_hole_fit
plan_memory_for_iter_args_yield

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_for_iter_args_yield/plan_memory_for_iter_args_yield_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_for_iter_args_yield_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_for_iter_args_yield/plan_memory_for_iter_args_yield_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_for_iter_args_yield_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_for_iter_args_yield/plan_memory_for_iter_args_yield_kernel.cpp:140:3: error: no matching function for call to '__ptoas_for_iter_args_yield_impl'
  __ptoas_for_iter_args_yield_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_for_iter_args_yield/plan_memory_for_iter_args_yield_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_for_iter_args_yield_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_for_iter_args_yield_kernel.dir/build.make:76: CMakeFiles/plan_memory_for_iter_args_yield_kernel.dir/plan_memory_for_iter_args_yield_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_for_iter_args_yield_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:31] ERROR: testcase failed (exit 2): plan_memory_for_iter_args_yield
plan_memory_no_reuse_overlap

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_no_reuse_overlap/plan_memory_no_reuse_overlap_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_no_reuse_overlap_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_no_reuse_overlap/plan_memory_no_reuse_overlap_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_no_reuse_overlap_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_no_reuse_overlap/plan_memory_no_reuse_overlap_kernel.cpp:127:3: error: no matching function for call to '__ptoas_no_reuse_overlap_impl'
  __ptoas_no_reuse_overlap_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_no_reuse_overlap/plan_memory_no_reuse_overlap_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_no_reuse_overlap_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_no_reuse_overlap_kernel.dir/build.make:76: CMakeFiles/plan_memory_no_reuse_overlap_kernel.dir/plan_memory_no_reuse_overlap_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_no_reuse_overlap_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:33] ERROR: testcase failed (exit 2): plan_memory_no_reuse_overlap
plan_memory_reuse_sequential

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_reuse_sequential/plan_memory_reuse_sequential_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_reuse_sequential_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_reuse_sequential/plan_memory_reuse_sequential_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_reuse_sequential_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_reuse_sequential/plan_memory_reuse_sequential_kernel.cpp:771:3: error: no matching function for call to '__ptoas_reuse_sequential_impl'
  __ptoas_reuse_sequential_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_reuse_sequential/plan_memory_reuse_sequential_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_reuse_sequential_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_reuse_sequential_kernel.dir/build.make:76: CMakeFiles/plan_memory_reuse_sequential_kernel.dir/plan_memory_reuse_sequential_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_reuse_sequential_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:34] ERROR: testcase failed (exit 2): plan_memory_reuse_sequential
plan_memory_nested_loops

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_nested_loops/plan_memory_nested_loops_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_nested_loops_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_nested_loops/plan_memory_nested_loops_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_nested_loops_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_nested_loops/plan_memory_nested_loops_kernel.cpp:167:3: error: no matching function for call to '__ptoas_nested_loops_impl'
  __ptoas_nested_loops_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_nested_loops/plan_memory_nested_loops_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_nested_loops_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_nested_loops_kernel.dir/build.make:76: CMakeFiles/plan_memory_nested_loops_kernel.dir/plan_memory_nested_loops_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_nested_loops_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:35] ERROR: testcase failed (exit 2): plan_memory_nested_loops
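All of the compile failures above follow one pattern. The emitted helper is declared `extern "C" __global__ static AICORE inline`, and clang rejects it twice: `static` cannot be combined with the `extern` linkage specifier, and a `__global__` kernel cannot be `inline`. The wrapper then fails with "call to __global__ function from __global__ function". The sketch below shows one plausible shape for the generated code. It assumes the helper only needs internal linkage and is not itself meant to be a kernel. The Ascend qualifiers are replaced with empty host-side stand-ins (`KERNEL_GLOBAL`, `AICORE`, `GM`), and the names `loop_in_if_impl`/`loop_in_if_kernel`, the extra `n` parameter, and the copy body are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side stand-ins (assumption): on the CANN/bisheng toolchain these would be
// __global__ (__attribute__((cce_kernel))), the AICORE qualifier, and __gm__.
// They expand to nothing here so the sketch builds with any C++ compiler.
#define KERNEL_GLOBAL
#define AICORE
#define GM

using half_bits = std::uint16_t;  // placeholder for the device `half` type

// Device helper: internal linkage and inlinable, but NOT a kernel, so
// `static` and `inline` are legal on it. The copy body is hypothetical.
static AICORE inline void loop_in_if_impl(GM half_bits *v1, GM half_bits *v2,
                                          std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    v2[i] = v1[i];
  }
}

// Kernel entry: C linkage plus the kernel attribute, and neither `static`
// nor `inline`. Because the callee is no longer a kernel, the
// kernel-calls-kernel diagnostic cannot arise either.
extern "C" KERNEL_GLOBAL AICORE void loop_in_if_kernel(GM half_bits *v1,
                                                      GM half_bits *v2,
                                                      std::size_t n) {
  loop_in_if_impl(v1, v2, n);  // ordinary call into a non-kernel helper
}
```

This design keeps `extern "C"` and the kernel attribute only on the single entry point. Any `static inline` qualifiers belong on helpers that the entry calls.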

@reedhecre

A3 board test failure details: PR #824

plan_memory_fragmentation_two_holes

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_two_holes/plan_memory_fragmentation_two_holes_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_two_holes_impl(__gm__ half* v1, __gm__ half* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_two_holes/plan_memory_fragmentation_two_holes_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_two_holes_impl(__gm__ half* v1, __gm__ half* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_two_holes/plan_memory_fragmentation_two_holes_kernel.cpp:637:3: error: no matching function for call to '__ptoas_fragmentation_two_holes_impl'
  __ptoas_fragmentation_two_holes_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/planmemory/plan_memory_fragmentation_two_holes/plan_memory_fragmentation_two_holes_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_fragmentation_two_holes_impl(__gm__ half* v1, __gm__ half* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/plan_memory_fragmentation_two_holes_kernel.dir/build.make:76: CMakeFiles/plan_memory_fragmentation_two_holes_kernel.dir/plan_memory_fragmentation_two_holes_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/plan_memory_fragmentation_two_holes_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:37] ERROR: testcase failed (exit 2): plan_memory_fragmentation_two_holes
rems

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rems/rems/rems_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_rems_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rems/rems/rems_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_rems_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rems/rems/rems_kernel.cpp:116:3: error: no matching function for call to '__ptoas_rems_kernel_2d_impl'
  __ptoas_rems_kernel_2d_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rems/rems/rems_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_rems_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/rems_kernel.dir/build.make:76: CMakeFiles/rems_kernel.dir/rems_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rems_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:13:38] ERROR: testcase failed (exit 2): rems
xor

stage=run info=exit=2

[ERROR] Mismatch: golden_v2.bin vs v2.bin, max diff=255.0 at idx=312 (golden=0, out=-255, dtype=int16)
[ERROR] compare failed
[2026-06-23 14:14:23] ERROR: testcase failed (exit 2): xor
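The `xor` case builds but fails numerically. The compare script itself is not shown here. As a hedged host-side reimplementation of the metric the log reports (largest absolute difference, its flat index, and both values), a hypothetical `max_abs_diff` helper could look like this:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Summary of the worst element, mirroring the fields printed in the log line.
struct DiffReport {
  double max_diff;   // largest |golden - out|
  std::size_t idx;   // flat element index where it occurs
  double golden;     // golden value at idx
  double out;        // device value at idx
};

// Hypothetical helper: scans two equally typed buffers and reports the
// element with the largest absolute difference. Values are widened to
// double first so int16 differences such as 0 - (-255) cannot overflow.
template <typename T>
DiffReport max_abs_diff(const std::vector<T> &golden, const std::vector<T> &out) {
  assert(golden.size() == out.size());
  DiffReport report{0.0, 0, 0.0, 0.0};
  for (std::size_t i = 0; i < golden.size(); ++i) {
    const double g = static_cast<double>(golden[i]);
    const double o = static_cast<double>(out[i]);
    const double d = std::fabs(g - o);
    if (d > report.max_diff) {
      report = DiffReport{d, i, g, o};
    }
  }
  return report;
}
```

For a bitwise op like XOR, the reported `out=-255` is easier to read as the int16 bit pattern `0xFF01`.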
partition_view_verify_rank_mismatch_valid

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_rank_mismatch_valid/partition_view_verify_rank_mismatch_valid_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_rank_mismatch_valid_impl(__gm__ float* v1) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_rank_mismatch_valid/partition_view_verify_rank_mismatch_valid_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_rank_mismatch_valid_impl(__gm__ float* v1) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_rank_mismatch_valid/partition_view_verify_rank_mismatch_valid_kernel.cpp:81:3: error: no matching function for call to '__ptoas_partition_view_verify_rank_mismatch_valid_impl'
  __ptoas_partition_view_verify_rank_mismatch_valid_impl(v1);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_rank_mismatch_valid/partition_view_verify_rank_mismatch_valid_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_rank_mismatch_valid_impl(__gm__ float* v1) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/partition_view_verify_rank_mismatch_valid_kernel.dir/build.make:76: CMakeFiles/partition_view_verify_rank_mismatch_valid_kernel.dir/partition_view_verify_rank_mismatch_valid_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/partition_view_verify_rank_mismatch_valid_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:16:26] ERROR: testcase failed (exit 2): partition_view_verify_rank_mismatch_valid
partition_view_verify_valid

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_valid/partition_view_verify_valid_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_valid_impl(__gm__ float* v1) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_valid/partition_view_verify_valid_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_valid_impl(__gm__ float* v1) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_valid/partition_view_verify_valid_kernel.cpp:81:3: error: no matching function for call to '__ptoas_partition_view_verify_valid_impl'
  __ptoas_partition_view_verify_valid_impl(v1);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/PartitionView/partition_view_verify_valid/partition_view_verify_valid_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_partition_view_verify_valid_impl(__gm__ float* v1) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/partition_view_verify_valid_kernel.dir/build.make:76: CMakeFiles/partition_view_verify_valid_kernel.dir/partition_view_verify_valid_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/partition_view_verify_valid_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:16:27] ERROR: testcase failed (exit 2): partition_view_verify_valid
rowexpandmul

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=14.702873229980469 at idx=206 (golden=8.388465881347656, out=-6.3144073486328125, dtype=float32)
[ERROR] compare failed
[2026-06-23 14:17:55] ERROR: testcase failed (exit 2): rowexpandmul
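The `rowexpandmul` mismatch is also numeric rather than a build error. To cross-check the golden file on the host, the following is a minimal reference under assumed semantics: each element of a row-major `src0` tile is multiplied by that row's scalar from a rows x 1 `src1` tile. These semantics are inferred from the op name and from the `TROWEXPANDMUL_IMPL(src, src, scale, tmp)` call with a 32x1 scale tile in the `quant_asym` log below. The function name `row_expand_mul_ref` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed semantics (not confirmed by pto-isa docs here):
//   dst[r][c] = src0[r][c] * scale[r], with src0 stored row-major.
std::vector<float> row_expand_mul_ref(const std::vector<float> &src0,
                                      const std::vector<float> &scale,
                                      std::size_t rows, std::size_t cols) {
  assert(src0.size() == rows * cols);
  assert(scale.size() == rows);
  std::vector<float> dst(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      // Broadcast the per-row scalar across every column of row r.
      dst[r * cols + c] = src0[r * cols + c] * scale[r];
    }
  }
  return dst;
}
```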
quant_asym

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TQuant.hpp:53:58: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
    TASSIGN_IMPL(src_s32, reinterpret_cast<uintptr_t>(tmp.data()));
                                                      ~~~^
                                                         ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:2290:5: note: in instantiation of function template specialization 'pto::TQUANT_IMPL<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TQUANT_IMPL<quant_type, TileDataOut, TileDataSrc, TileDataPara>(dst, src, scale, offset);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:117:3: note: in instantiation of function template specialization 'pto::TQUANT<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TQUANT<pto::QuantType::INT8_ASYM, Tile<TileType::Vec, uint8_t, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 1, BLayout::ColMajor, 32, 1, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>>(v29, v23, v25, v31);
  ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:98:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src0.data(), src1.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TQuant.hpp:42:5: note: in instantiation of function template specialization 'pto::TROWEXPANDMUL_IMPL<pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TROWEXPANDMUL_IMPL(src, src, scale, tmp);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:2290:5: note: in instantiation of function template specialization 'pto::TQUANT_IMPL<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TQUANT_IMPL<quant_type, TileDataOut, TileDataSrc, TileDataPara>(dst, src, scale, offset);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:117:3: note: in instantiation of function template specialization 'pto::TQUANT<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TQUANT<pto::QuantType::INT8_ASYM, Tile<TileType::Vec, uint8_t, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 1, BLayout::ColMajor, 32, 1, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>>(v29, v23, v25, v31);
  ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:97:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandMulOp<T>, TileDataDst, TileDataSrc0, TileDataSrc1, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandMulOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:103:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src1.data(), src0.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:102:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandMulOp<T>, TileDataDst, TileDataSrc1, TileDataSrc0, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandMulOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:137:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandAdd.hpp:98:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src0.data(), src1.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TQuant.hpp:45:9: note: in instantiation of function template specialization 'pto::TROWEXPANDADD_IMPL<pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
        TROWEXPANDADD_IMPL(src, src, *offset, tmp);
        ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:2290:5: note: in instantiation of function template specialization 'pto::TQUANT_IMPL<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TQUANT_IMPL<quant_type, TileDataOut, TileDataSrc, TileDataPara>(dst, src, scale, offset);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:117:3: note: in instantiation of function template specialization 'pto::TQUANT<pto::QuantType::INT8_ASYM, pto::Tile<pto::TileType::Vec, unsigned char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TQUANT<pto::QuantType::INT8_ASYM, Tile<TileType::Vec, uint8_t, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 1, BLayout::ColMajor, 32, 1, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>>(v29, v23, v25, v31);
  ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:137:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandAdd.hpp:97:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandAddOp<T>, TileDataDst, TileDataSrc0, TileDataSrc1, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandAddOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant_asym/quant_asym_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:137:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandAdd.hpp:103:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src1.data(), src0.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandAdd.hpp:102:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandAddOp<T>, TileDataDst, TileDataSrc1, TileDataSrc0, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandAddOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
9 errors generated.
gmake[2]: *** [CMakeFiles/quant_asym_kernel.dir/build.make:76: CMakeFiles/quant_asym_kernel.dir/quant_asym_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/quant_asym_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:18:05] ERROR: testcase failed (exit 2): quant_asym
quant

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TQuant.hpp:53:58: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
    TASSIGN_IMPL(src_s32, reinterpret_cast<uintptr_t>(tmp.data()));
                                                      ~~~^
                                                         ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:2290:5: note: in instantiation of function template specialization 'pto::TQUANT_IMPL<pto::QuantType::INT8_SYM, pto::Tile<pto::TileType::Vec, signed char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TQUANT_IMPL<quant_type, TileDataOut, TileDataSrc, TileDataPara>(dst, src, scale, offset);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant/quant_kernel.cpp:108:3: note: in instantiation of function template specialization 'pto::TQUANT<pto::QuantType::INT8_SYM, pto::Tile<pto::TileType::Vec, signed char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TQUANT<pto::QuantType::INT8_SYM, Tile<TileType::Vec, int8_t, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 1, BLayout::ColMajor, 32, 1, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>>(v22, v18, v20);
  ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant/quant_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:98:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src0.data(), src1.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TQuant.hpp:42:5: note: in instantiation of function template specialization 'pto::TROWEXPANDMUL_IMPL<pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TROWEXPANDMUL_IMPL(src, src, scale, tmp);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:2290:5: note: in instantiation of function template specialization 'pto::TQUANT_IMPL<pto::QuantType::INT8_SYM, pto::Tile<pto::TileType::Vec, signed char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *>' requested here
    TQUANT_IMPL<quant_type, TileDataOut, TileDataSrc, TileDataPara>(dst, src, scale, offset);
    ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant/quant_kernel.cpp:108:3: note: in instantiation of function template specialization 'pto::TQUANT<pto::QuantType::INT8_SYM, pto::Tile<pto::TileType::Vec, signed char, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>>' requested here
  TQUANT<pto::QuantType::INT8_SYM, Tile<TileType::Vec, int8_t, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 32, BLayout::RowMajor, 32, 32, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>, Tile<TileType::Vec, float, 32, 1, BLayout::ColMajor, 32, 1, SLayout::NoneBox, 512, PadValue::Null, CompactMode::Null>>(v22, v18, v20);
  ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant/quant_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:97:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandMulOp<T>, TileDataDst, TileDataSrc0, TileDataSrc1, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandMulOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Quant/quant/quant_kernel.cpp:32:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/pto-inst.hpp:30:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr.hpp:18:
In file included from /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/common/pto_instr_impl.hpp:141:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:103:54: error: member reference type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' is a pointer; did you mean to use '->'?
            dst.data(), src1.data(), src0.data(), tmp.data(), validRow, validCol);
                                                  ~~~^
                                                     ->
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandMul.hpp:102:9: error: no matching function for call to 'TRowExpandBin'
        TRowExpandBin<RowExpandMulOp<T>, TileDataDst, TileDataSrc1, TileDataSrc0, TileDataTmp>(
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:211:26: note: candidate template ignored: substitution failure [with Op = pto::RowExpandMulOp<float>, TileDataDst = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc0 = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataSrc1 = pto::Tile<pto::TileType::Vec, float, 32, 32, pto::BLayout::RowMajor, 32, 32, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null>, TileDataTmp = pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *]: type 'pto::Tile<pto::TileType::Vec, float, 32, 1, pto::BLayout::ColMajor, 32, 1, pto::SLayout::NoneBox, 512, pto::PadValue::Null, pto::CompactMode::Null> *' cannot be used prior to '::' because it has no members
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/payload/pto-isa/include/pto/npu/a2a3/TRowExpandBinOp.hpp:188:26: note: candidate function template not viable: requires 5 arguments, but 6 were provided
__tf__ PTO_INTERNAL void TRowExpandBin(typename TileDataDst::TileDType __out__ dst,
                         ^
5 errors generated.
gmake[2]: *** [CMakeFiles/quant_kernel.dir/build.make:76: CMakeFiles/quant_kernel.dir/quant_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/quant_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:18:07] ERROR: testcase failed (exit 2): quant
partition5d_dynamic

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d_dynamic/partition5d_dynamic_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2, int64_t v3, int64_t v4, int64_t v5, int64_t v6, int64_t v7, int64_t v8, int64_t v9, int64_t v10, int64_t v11, int64_t v12) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d_dynamic/partition5d_dynamic_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2, int64_t v3, int64_t v4, int64_t v5, int64_t v6, int64_t v7, int64_t v8, int64_t v9, int64_t v10, int64_t v11, int64_t v12) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d_dynamic/partition5d_dynamic_kernel.cpp:99:3: error: no matching function for call to '__ptoas_run_partition_impl'
  __ptoas_run_partition_impl(v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d_dynamic/partition5d_dynamic_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2, int64_t v3, int64_t v4, int64_t v5, int64_t v6, int64_t v7, int64_t v8, int64_t v9, int64_t v10, int64_t v11, int64_t v12) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/partition5d_dynamic_kernel.dir/build.make:76: CMakeFiles/partition5d_dynamic_kernel.dir/partition5d_dynamic_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/partition5d_dynamic_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:18:08] ERROR: testcase failed (exit 2): partition5d_dynamic
partition5d

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d/partition5d_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d/partition5d_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d/partition5d_kernel.cpp:99:3: error: no matching function for call to '__ptoas_run_partition_impl'
  __ptoas_run_partition_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Partition5D/partition5d/partition5d_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_run_partition_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/partition5d_kernel.dir/build.make:76: CMakeFiles/partition5d_kernel.dir/partition5d_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/partition5d_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:18:09] ERROR: testcase failed (exit 2): partition5d
scatter

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=4.940644979476929 at idx=27 (golden=2.725048542022705, out=-2.2155964374542236, dtype=float32)
[ERROR] compare failed
[2026-06-23 14:18:32] ERROR: testcase failed (exit 2): scatter
sparse_attn_test_incore_7

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/sparse_attn_test_incore_7/sparse_attn_test_incore_7_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_sparse_attn_test_incore_7_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/sparse_attn_test_incore_7/sparse_attn_test_incore_7_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_sparse_attn_test_incore_7_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/sparse_attn_test_incore_7/sparse_attn_test_incore_7_kernel.cpp:156:3: error: no matching function for call to '__ptoas_sparse_attn_test_incore_7_impl'
  __ptoas_sparse_attn_test_incore_7_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/sparse_attn_test_incore_7/sparse_attn_test_incore_7_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_sparse_attn_test_incore_7_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/sparse_attn_test_incore_7_kernel.dir/build.make:76: CMakeFiles/sparse_attn_test_incore_7_kernel.dir/sparse_attn_test_incore_7_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/sparse_attn_test_incore_7_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:29] ERROR: testcase failed (exit 2): sparse_attn_test_incore_7
decode_hca_test_incore_54

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_hca_test_incore_54/decode_hca_test_incore_54_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_decode_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_hca_test_incore_54/decode_hca_test_incore_54_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_decode_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_hca_test_incore_54/decode_hca_test_incore_54_kernel.cpp:156:3: error: no matching function for call to '__ptoas_decode_hca_test_incore_54_impl'
  __ptoas_decode_hca_test_incore_54_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_hca_test_incore_54/decode_hca_test_incore_54_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_decode_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/decode_hca_test_incore_54_kernel.dir/build.make:76: CMakeFiles/decode_hca_test_incore_54_kernel.dir/decode_hca_test_incore_54_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/decode_hca_test_incore_54_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:30] ERROR: testcase failed (exit 2): decode_hca_test_incore_54
attention_swa_test_incore_40

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_swa_test_incore_40/attention_swa_test_incore_40_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_attention_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_swa_test_incore_40/attention_swa_test_incore_40_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_attention_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_swa_test_incore_40/attention_swa_test_incore_40_kernel.cpp:156:3: error: no matching function for call to '__ptoas_attention_swa_test_incore_40_impl'
  __ptoas_attention_swa_test_incore_40_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_swa_test_incore_40/attention_swa_test_incore_40_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_attention_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/attention_swa_test_incore_40_kernel.dir/build.make:76: CMakeFiles/attention_swa_test_incore_40_kernel.dir/attention_swa_test_incore_40_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/attention_swa_test_incore_40_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:31] ERROR: testcase failed (exit 2): attention_swa_test_incore_40
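Every compile failure above reports the same three bisheng diagnostics on the generated `__ptoas_*_impl` helper:

- `static` cannot be combined with `extern "C"`.
- A `__global__` function (`__attribute__((cce_kernel))`) cannot be `inline`.
- One `__global__` function cannot call another.

The sketch below shows one declaration shape that avoids all three. The helper is a plain `static AICORE inline` function, and only the kernel entry carries `extern "C" __global__`. It assumes `AICORE`, `__gm__` and `__global__` come from the CANN/PTO headers on the device toolchain; the fallback macros exist only so the sketch builds with an ordinary C++ compiler. The entry name `run` and the one-line body are hypothetical placeholders, not the code ptoas actually emits. The sketch says nothing about the golden-compare mismatches (for example `scatter`).

```cpp
// Fallbacks so this sketch compiles on a host compiler. On the Ascend
// toolchain these come from the CANN/PTO headers (assumption).
#ifndef AICORE
#define AICORE
#endif
#ifndef __gm__
#define __gm__
#endif
#ifndef __global__
#define __global__
#endif

// Helper: ordinary static inline device function. No extern "C" (so `static`
// is legal), no __global__ (so it may be inlined and called from a kernel).
static AICORE inline void __ptoas_run_impl(__gm__ float* v1, __gm__ float* v2) {
  v2[0] = v1[0];  // hypothetical stand-in for the generated tile code
}

// Kernel entry (hypothetical name): the only function marked extern "C" __global__.
extern "C" __global__ AICORE void run(__gm__ float* v1, __gm__ float* v2) {
  __ptoas_run_impl(v1, v2);
}
```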

@reedhecre


A3 board test failure details: PR #824

decode_swa_test_incore_40

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_swa_test_incore_40/decode_swa_test_incore_40_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_decode_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_swa_test_incore_40/decode_swa_test_incore_40_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_decode_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_swa_test_incore_40/decode_swa_test_incore_40_kernel.cpp:156:3: error: no matching function for call to '__ptoas_decode_swa_test_incore_40_impl'
  __ptoas_decode_swa_test_incore_40_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_swa_test_incore_40/decode_swa_test_incore_40_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_decode_swa_test_incore_40_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/decode_swa_test_incore_40_kernel.dir/build.make:76: CMakeFiles/decode_swa_test_incore_40_kernel.dir/decode_swa_test_incore_40_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/decode_swa_test_incore_40_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:33] ERROR: testcase failed (exit 2): decode_swa_test_incore_40
decode_csa_test_incore_81

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_csa_test_incore_81/decode_csa_test_incore_81_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_decode_csa_test_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_csa_test_incore_81/decode_csa_test_incore_81_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_decode_csa_test_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_csa_test_incore_81/decode_csa_test_incore_81_kernel.cpp:156:3: error: no matching function for call to '__ptoas_decode_csa_test_incore_81_impl'
  __ptoas_decode_csa_test_incore_81_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/decode_csa_test_incore_81/decode_csa_test_incore_81_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_decode_csa_test_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/decode_csa_test_incore_81_kernel.dir/build.make:76: CMakeFiles/decode_csa_test_incore_81_kernel.dir/decode_csa_test_incore_81_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/decode_csa_test_incore_81_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:34] ERROR: testcase failed (exit 2): decode_csa_test_incore_81
attention_hca_test_incore_54

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_hca_test_incore_54/attention_hca_test_incore_54_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_attention_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_hca_test_incore_54/attention_hca_test_incore_54_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_attention_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_hca_test_incore_54/attention_hca_test_incore_54_kernel.cpp:156:3: error: no matching function for call to '__ptoas_attention_hca_test_incore_54_impl'
  __ptoas_attention_hca_test_incore_54_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_hca_test_incore_54/attention_hca_test_incore_54_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_attention_hca_test_incore_54_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/attention_hca_test_incore_54_kernel.dir/build.make:76: CMakeFiles/attention_hca_test_incore_54_kernel.dir/attention_hca_test_incore_54_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/attention_hca_test_incore_54_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:35] ERROR: testcase failed (exit 2): attention_hca_test_incore_54
attention_csa_test_refresh_incore_81

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_csa_test_refresh_incore_81/attention_csa_test_refresh_incore_81_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_attention_csa_test_refresh_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_csa_test_refresh_incore_81/attention_csa_test_refresh_incore_81_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_attention_csa_test_refresh_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_csa_test_refresh_incore_81/attention_csa_test_refresh_incore_81_kernel.cpp:156:3: error: no matching function for call to '__ptoas_attention_csa_test_refresh_incore_81_impl'
  __ptoas_attention_csa_test_refresh_incore_81_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/DeepseekV4DecodeA3/attention_csa_test_refresh_incore_81/attention_csa_test_refresh_incore_81_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_attention_csa_test_refresh_incore_81_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/attention_csa_test_refresh_incore_81_kernel.dir/build.make:76: CMakeFiles/attention_csa_test_refresh_incore_81_kernel.dir/attention_csa_test_refresh_incore_81_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/attention_csa_test_refresh_incore_81_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:37] ERROR: testcase failed (exit 2): attention_csa_test_refresh_incore_81
tensor_view_layout_dn

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Layout/tensor_view_layout_dn/tensor_view_layout_dn_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_run_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Layout/tensor_view_layout_dn/tensor_view_layout_dn_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_run_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Layout/tensor_view_layout_dn/tensor_view_layout_dn_kernel.cpp:97:3: error: no matching function for call to '__ptoas_run_impl'
  __ptoas_run_impl(v1, v2);
  ^~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Layout/tensor_view_layout_dn/tensor_view_layout_dn_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_run_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/tensor_view_layout_dn_kernel.dir/build.make:76: CMakeFiles/tensor_view_layout_dn_kernel.dir/tensor_view_layout_dn_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/tensor_view_layout_dn_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:19:38] ERROR: testcase failed (exit 2): tensor_view_layout_dn
rowexpandsub

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=5.565198540687561 at idx=202 (golden=1.555970311164856, out=-4.009228229522705, dtype=float32)
[ERROR] compare failed
[2026-06-23 14:20:06] ERROR: testcase failed (exit 2): rowexpandsub
rope_kv_cache

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rope_kv_cache/rope_kv_cache_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_rope_kv_cache_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, __gm__ float* v8, __gm__ float* v9, __gm__ float* v10, int64_t v11, int64_t v12) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rope_kv_cache/rope_kv_cache_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_rope_kv_cache_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, __gm__ float* v8, __gm__ float* v9, __gm__ float* v10, int64_t v11, int64_t v12) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rope_kv_cache/rope_kv_cache_kernel.cpp:336:3: error: no matching function for call to '__ptoas_rope_kv_cache_impl'
  __ptoas_rope_kv_cache_impl(v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rope_kv_cache/rope_kv_cache_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_rope_kv_cache_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, __gm__ float* v6, __gm__ float* v7, __gm__ float* v8, __gm__ float* v9, __gm__ float* v10, int64_t v11, int64_t v12) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/rope_kv_cache_kernel.dir/build.make:76: CMakeFiles/rope_kv_cache_kernel.dir/rope_kv_cache_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rope_kv_cache_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:07] ERROR: testcase failed (exit 2): rope_kv_cache
qwen3_decode_incore_4

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/qwen3_decode_incore_4_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_4_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/qwen3_decode_incore_4_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_4_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/qwen3_decode_incore_4_kernel.cpp:234:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_4_impl'
  __ptoas_qwen3_decode_incore_4_impl(v1, v2, v3, v4, v5, v6, v7);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/qwen3_decode_incore_4_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_4_impl(__gm__ bfloat16_t* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_4_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_4_kernel.dir/qwen3_decode_incore_4_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_4_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:09] ERROR: testcase failed (exit 2): qwen3_decode_incore_4
post_rmsnorm

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/post_rmsnorm/post_rmsnorm_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_post_rmsnorm_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/post_rmsnorm/post_rmsnorm_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_post_rmsnorm_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/post_rmsnorm/post_rmsnorm_kernel.cpp:302:3: error: no matching function for call to '__ptoas_post_rmsnorm_impl'
  __ptoas_post_rmsnorm_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/post_rmsnorm/post_rmsnorm_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_post_rmsnorm_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/post_rmsnorm_kernel.dir/build.make:76: CMakeFiles/post_rmsnorm_kernel.dir/post_rmsnorm_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/post_rmsnorm_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:10] ERROR: testcase failed (exit 2): post_rmsnorm
qwen3_decode_incore_1

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/qwen3_decode_incore_1_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_1_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/qwen3_decode_incore_1_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_1_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/qwen3_decode_incore_1_kernel.cpp:258:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_1_impl'
  __ptoas_qwen3_decode_incore_1_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/qwen3_decode_incore_1_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_1_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int32_t v4, int32_t v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_1_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_1_kernel.dir/qwen3_decode_incore_1_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_1_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:25] ERROR: testcase failed (exit 2): qwen3_decode_incore_1
qwen3_decode_incore_10

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/qwen3_decode_incore_10_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_10_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/qwen3_decode_incore_10_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_10_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/qwen3_decode_incore_10_kernel.cpp:341:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_10_impl'
  __ptoas_qwen3_decode_incore_10_impl(v1, v2, v3, v4, v5, v6);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/qwen3_decode_incore_10_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_10_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_10_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_10_kernel.dir/qwen3_decode_incore_10_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_10_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:26] ERROR: testcase failed (exit 2): qwen3_decode_incore_10
qwen3_decode_incore_11

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/qwen3_decode_incore_11_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_11_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/qwen3_decode_incore_11_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_11_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/qwen3_decode_incore_11_kernel.cpp:341:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_11_impl'
  __ptoas_qwen3_decode_incore_11_impl(v1, v2, v3, v4, v5, v6);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/qwen3_decode_incore_11_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_11_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3, int64_t v4, int32_t v5, int32_t v6) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_11_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_11_kernel.dir/qwen3_decode_incore_11_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_11_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:28] ERROR: testcase failed (exit 2): qwen3_decode_incore_11
rmsnorm

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rmsnorm/rmsnorm_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_rmsnorm_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rmsnorm/rmsnorm_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_rmsnorm_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rmsnorm/rmsnorm_kernel.cpp:508:3: error: no matching function for call to '__ptoas_rmsnorm_impl'
  __ptoas_rmsnorm_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/rmsnorm/rmsnorm_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_rmsnorm_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/rmsnorm_kernel.dir/build.make:76: CMakeFiles/rmsnorm_kernel.dir/rmsnorm_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rmsnorm_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:29] ERROR: testcase failed (exit 2): rmsnorm
qwen3_decode_incore_6

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/qwen3_decode_incore_6_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_6_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/qwen3_decode_incore_6_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_6_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/qwen3_decode_incore_6_kernel.cpp:241:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_6_impl'
  __ptoas_qwen3_decode_incore_6_impl(v1, v2, v3, v4, v5, v6, v7);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/qwen3_decode_incore_6_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_6_impl(__gm__ float* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, int64_t v4, int64_t v5, int32_t v6, int32_t v7) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_6_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_6_kernel.dir/qwen3_decode_incore_6_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_6_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:30] ERROR: testcase failed (exit 2): qwen3_decode_incore_6
qwen3_decode_incore_2

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/qwen3_decode_incore_2_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_2_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, int32_t v6, int32_t v7) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/qwen3_decode_incore_2_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_2_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, int32_t v6, int32_t v7) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/qwen3_decode_incore_2_kernel.cpp:381:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_2_impl'
  __ptoas_qwen3_decode_incore_2_impl(v1, v2, v3, v4, v5, v6, v7);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/qwen3_decode_incore_2_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_2_impl(__gm__ bfloat16_t* v1, __gm__ bfloat16_t* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, __gm__ float* v5, int32_t v6, int32_t v7) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_2_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_2_kernel.dir/qwen3_decode_incore_2_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_2_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:32] ERROR: testcase failed (exit 2): qwen3_decode_incore_2
qwen3_decode_incore_7

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/qwen3_decode_incore_7_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_7_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ bfloat16_t* v4, int64_t v5, int32_t v6, int32_t v7) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/qwen3_decode_incore_7_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_7_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ bfloat16_t* v4, int64_t v5, int32_t v6, int32_t v7) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/qwen3_decode_incore_7_kernel.cpp:497:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_7_impl'
  __ptoas_qwen3_decode_incore_7_impl(v1, v2, v3, v4, v5, v6, v7);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/qwen3_decode_incore_7_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_7_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ bfloat16_t* v4, int64_t v5, int32_t v6, int32_t v7) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_7_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_7_kernel.dir/qwen3_decode_incore_7_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_7_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:33] ERROR: testcase failed (exit 2): qwen3_decode_incore_7
qwen3_decode_incore_5

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_incore_5_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_5_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, int64_t v5, int32_t v6, int32_t v7, int32_t v8) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_incore_5_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_5_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, int64_t v5, int32_t v6, int32_t v7, int32_t v8) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_incore_5_kernel.cpp:277:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_5_impl'
  __ptoas_qwen3_decode_incore_5_impl(v1, v2, v3, v4, v5, v6, v7, v8);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_incore_5_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_5_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, __gm__ float* v4, int64_t v5, int32_t v6, int32_t v7, int32_t v8) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_5_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_5_kernel.dir/qwen3_decode_incore_5_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_5_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:34] ERROR: testcase failed (exit 2): qwen3_decode_incore_5
qwen3_decode_incore_12

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/qwen3_decode_incore_12_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_12_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int32_t v5, int32_t v6) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/qwen3_decode_incore_12_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_12_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int32_t v5, int32_t v6) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/qwen3_decode_incore_12_kernel.cpp:158:3: error: no matching function for call to '__ptoas_qwen3_decode_incore_12_impl'
  __ptoas_qwen3_decode_incore_12_impl(v1, v2, v3, v4, v5, v6);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/qwen3_decode_incore_12_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_qwen3_decode_incore_12_impl(__gm__ float* v1, __gm__ float* v2, __gm__ bfloat16_t* v3, int64_t v4, int32_t v5, int32_t v6) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_12_kernel.dir/build.make:76: CMakeFiles/qwen3_decode_incore_12_kernel.dir/qwen3_decode_incore_12_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/qwen3_decode_incore_12_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:20:36] ERROR: testcase failed (exit 2): qwen3_decode_incore_12
sels

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=66.99866366386414 at idx=92 (golden=-2.9986636638641357, out=64.0, dtype=float32)
[ERROR] compare failed
[2026-06-23 14:20:40] ERROR: testcase failed (exit 2): sels
tprefetch_async_binding

stage=run info=exit=1

[SDMA] aclrtSynchronizeStream (aicpu) failed
[WARN] SdmaWorkspaceManager::Init failed - TPREFETCH_ASYNC will fall back to no-op prefetch
[ERROR] aclrtSynchronizeStream(stream) failed: 507018 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:132)
[ERROR] RecentErrMsg: E39999: Inner Error!
E39999[PID: 1369173] 2026-06-23-14:21:29.830.521 (E39999):  The error from device(chipId:0, dieId:0), serial number is 72, an exception occurred during AICPU execution, stream_id:45, task_id:0, errcode:0, msg:aicpu execute failed.[FUNC:ProcessStarsAicpuErrorInfo][FILE:device_error_proc.cc][LINE:1644]
        TraceBack (most recent call last):
       Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       Aicpu kernel execute failed, device_id=0, stream_id=45, task_id=0, soName=libcpu_kernels.so, funcName=RunCpuKernel, kernelName=ShmemSdmaStarsQuery, errorCode=0x2a.[FUNC:PrintAicpuErrorInfo][FILE:davinci_kernel_task.cc][LINE:1435]
       rtStreamSynchronize execution failed, reason=aicpu exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
       Failed to submit kernel task, retCode=0x715002a.[FUNC:LaunchKernelSubmit][FILE:context.cc][LINE:1223]
       kernel launch submit failed.[FUNC:LaunchKernel][FILE:context.cc][LINE:1349]
       rtKernelLaunch execution failed, reason=aicpu exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
[2026-06-23 14:24:07] ERROR: testcase failed (exit 1): tprefetch_async_binding
test_barrier_sync

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_barrier_sync/test_barrier_sync_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_test_barrier_sync_py_impl() {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_barrier_sync/test_barrier_sync_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_test_barrier_sync_py_impl() {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_barrier_sync/test_barrier_sync_kernel.cpp:84:3: error: no matching function for call to '__ptoas_test_barrier_sync_py_impl'
  __ptoas_test_barrier_sync_py_impl();
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_barrier_sync/test_barrier_sync_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_test_barrier_sync_py_impl() {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/test_barrier_sync_kernel.dir/build.make:76: CMakeFiles/test_barrier_sync_kernel.dir/test_barrier_sync_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/test_barrier_sync_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:24:31] ERROR: testcase failed (exit 2): test_barrier_sync
matmul

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/matmul/matmul_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_RunTMATMULSplitK_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, bool v5) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/matmul/matmul_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_RunTMATMULSplitK_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, bool v5) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/matmul/matmul_kernel.cpp:175:3: error: no matching function for call to '__ptoas_RunTMATMULSplitK_impl'
  __ptoas_RunTMATMULSplitK_impl(v1, v2, v3, v4, v5);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/matmul/matmul_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_RunTMATMULSplitK_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, __gm__ float* v4, bool v5) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/matmul_kernel.dir/build.make:76: CMakeFiles/matmul_kernel.dir/matmul_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/matmul_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:07] ERROR: testcase failed (exit 2): matmul
add_double_dynamic

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/add_double_dynamic/add_double_dynamic_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_vec_add_1d_dynamic_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/add_double_dynamic/add_double_dynamic_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_vec_add_1d_dynamic_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/add_double_dynamic/add_double_dynamic_kernel.cpp:183:3: error: no matching function for call to '__ptoas_vec_add_1d_dynamic_impl'
  __ptoas_vec_add_1d_dynamic_impl(v1, v2, v3, v4);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/add_double_dynamic/add_double_dynamic_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_vec_add_1d_dynamic_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3, int32_t v4) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/add_double_dynamic_kernel.dir/build.make:76: CMakeFiles/add_double_dynamic_kernel.dir/add_double_dynamic_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/add_double_dynamic_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:08] ERROR: testcase failed (exit 2): add_double_dynamic
nested_loop_confliect

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/nested_loop_confliect/nested_loop_confliect_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_nested_loop_sync_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/nested_loop_confliect/nested_loop_confliect_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_nested_loop_sync_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/nested_loop_confliect/nested_loop_confliect_kernel.cpp:111:3: error: no matching function for call to '__ptoas_nested_loop_sync_impl'
  __ptoas_nested_loop_sync_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/nested_loop_confliect/nested_loop_confliect_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_nested_loop_sync_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/nested_loop_confliect_kernel.dir/build.make:76: CMakeFiles/nested_loop_confliect_kernel.dir/nested_loop_confliect_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/nested_loop_confliect_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:10] ERROR: testcase failed (exit 2): nested_loop_confliect
rar_optimization_test

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/rar_optimization_test/rar_optimization_test_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_rar_hazard_check_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/rar_optimization_test/rar_optimization_test_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_rar_hazard_check_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/rar_optimization_test/rar_optimization_test_kernel.cpp:100:3: error: no matching function for call to '__ptoas_rar_hazard_check_impl'
  __ptoas_rar_hazard_check_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/rar_optimization_test/rar_optimization_test_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_rar_hazard_check_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/rar_optimization_test_kernel.dir/build.make:76: CMakeFiles/rar_optimization_test_kernel.dir/rar_optimization_test_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rar_optimization_test_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:11] ERROR: testcase failed (exit 2): rar_optimization_test
test_dynamic_valid_shape

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_dynamic_valid_shape/test_dynamic_valid_shape_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_test_dynamic_valid_shape_impl(__gm__ float* v1, __gm__ float* v2, int32_t v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_dynamic_valid_shape/test_dynamic_valid_shape_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_test_dynamic_valid_shape_impl(__gm__ float* v1, __gm__ float* v2, int32_t v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_dynamic_valid_shape/test_dynamic_valid_shape_kernel.cpp:114:3: error: no matching function for call to '__ptoas_test_dynamic_valid_shape_impl'
  __ptoas_test_dynamic_valid_shape_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_dynamic_valid_shape/test_dynamic_valid_shape_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_test_dynamic_valid_shape_impl(__gm__ float* v1, __gm__ float* v2, int32_t v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/test_dynamic_valid_shape_kernel.dir/build.make:76: CMakeFiles/test_dynamic_valid_shape_kernel.dir/test_dynamic_valid_shape_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/test_dynamic_valid_shape_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:20] ERROR: testcase failed (exit 2): test_dynamic_valid_shape
test_auto_sync_tail_hint

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_auto_sync_tail_hint/test_auto_sync_tail_hint_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_test_auto_sync_tail_hint_impl(__gm__ float* v1, __gm__ float* v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_auto_sync_tail_hint/test_auto_sync_tail_hint_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_test_auto_sync_tail_hint_impl(__gm__ float* v1, __gm__ float* v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_auto_sync_tail_hint/test_auto_sync_tail_hint_kernel.cpp:100:3: error: no matching function for call to '__ptoas_test_auto_sync_tail_hint_impl'
  __ptoas_test_auto_sync_tail_hint_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/test_auto_sync_tail_hint/test_auto_sync_tail_hint_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_test_auto_sync_tail_hint_impl(__gm__ float* v1, __gm__ float* v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/test_auto_sync_tail_hint_kernel.dir/build.make:76: CMakeFiles/test_auto_sync_tail_hint_kernel.dir/test_auto_sync_tail_hint_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/test_auto_sync_tail_hint_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:30] ERROR: testcase failed (exit 2): test_auto_sync_tail_hint
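Both failures above stop at the same three bisheng diagnostics, and all of them point at one generated declaration, `extern "C" __global__ static AICORE inline void __ptoas_*_impl(...)`:

- `static` cannot follow the `extern "C"` linkage specification.
- A `__global__` function cannot be `inline`.
- The wrapper's call into the helper is a `__global__`-to-`__global__` call, which is not viable.

The sketch below shows one declaration split that would avoid all three. It assumes the helper only needs to be an ordinary device-side function: `extern "C" __global__` stays on the entry point alone, and the helper becomes a plain `static AICORE inline` function. The names `example_impl` and `example_kernel` are hypothetical. The empty `__global__`, `AICORE` and `__gm__` stubs exist only so the sketch builds with a host C++ compiler. This is not the actual ptoas codegen fix.

```cpp
#include <cstdint>

// Host-only stubs (assumption): on the CCE toolchain these qualifiers come
// from the compiler and its headers. Here they expand to nothing so the
// sketch compiles with an ordinary host C++ compiler.
#ifndef __global__
#define __global__
#endif
#ifndef AICORE
#define AICORE
#endif
#ifndef __gm__
#define __gm__
#endif

// Hypothetical helper: an ordinary device-side function, not a kernel.
// Without extern "C" and __global__, 'static' and 'inline' are both legal.
static AICORE inline void example_impl(__gm__ float* src, __gm__ float* dst,
                                       int32_t n) {
  for (int32_t i = 0; i < n; ++i) {
    dst[i] = src[i];  // placeholder body: element-wise copy
  }
}

// Hypothetical kernel entry: the only function with C linkage and
// __global__. It is neither static nor inline, and it calls the helper
// with a plain function call instead of calling another __global__ function.
extern "C" __global__ AICORE void example_kernel(__gm__ float* src,
                                                 __gm__ float* dst,
                                                 int32_t n) {
  example_impl(src, dst, n);
}
```

When compiling with bisheng, delete the three stub blocks and let the toolchain headers supply the real qualifiers.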

@reedhecre

A3 board test failure details: PR #824

compensation_test

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/compensation_test/compensation_test_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_compensation_check_impl(__gm__ float* v1, bool v2) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/compensation_test/compensation_test_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_compensation_check_impl(__gm__ float* v1, bool v2) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/compensation_test/compensation_test_kernel.cpp:97:3: error: no matching function for call to '__ptoas_compensation_check_impl'
  __ptoas_compensation_check_impl(v1, v2);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Sync/compensation_test/compensation_test_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_compensation_check_impl(__gm__ float* v1, bool v2) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/compensation_test_kernel.dir/build.make:76: CMakeFiles/compensation_test_kernel.dir/compensation_test_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/compensation_test_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:25:52] ERROR: testcase failed (exit 2): compensation_test
rem

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rem/rem/rem_kernel.cpp:75:23: error: cannot combine with previous 'extern' declaration specifier
extern "C" __global__ static AICORE inline void __ptoas_rem_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                      ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rem/rem/rem_kernel.cpp:75:12: error: __global__ function can not be inlined
extern "C" __global__ static AICORE inline void __ptoas_rem_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
           ^
/usr/local/Ascend/cann-9.0.0/tools/bisheng_compiler/lib/clang/15.0.5/include/__clang_cce_defines.h:31:20: note: expanded from macro '__global__'
#define __global__ __attribute__((cce_kernel))
                   ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rem/rem/rem_kernel.cpp:123:3: error: no matching function for call to '__ptoas_rem_kernel_2d_impl'
  __ptoas_rem_kernel_2d_impl(v1, v2, v3);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260623_135338_merged_pr824/npu_validation/Rem/rem/rem_kernel.cpp:75:49: note: candidate function not viable: call to __global__ [aicore] function from __global__ [aicore] function
extern "C" __global__ static AICORE inline void __ptoas_rem_kernel_2d_impl(__gm__ float* v1, __gm__ float* v2, __gm__ float* v3) {
                                                ^
3 errors generated.
gmake[2]: *** [CMakeFiles/rem_kernel.dir/build.make:76: CMakeFiles/rem_kernel.dir/rem_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/rem_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-23 14:26:07] ERROR: testcase failed (exit 2): rem
partmin

stage=run info=exit=2

[ERROR] Mismatch: golden_v3.bin vs v3.bin, max diff=nan at idx=112 (golden=-0.0, out=nan, dtype=float16)
[ERROR] compare failed
[2026-06-23 14:26:12] ERROR: testcase failed (exit 2): partmin
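The partmin failure is numerical, not a build error. At index 112 the kernel wrote NaN where the golden value is -0.0. The golden value itself is unremarkable, since under IEEE 754 `-0.0 == 0.0` is true. The reported `max diff=nan` follows from the fact that any arithmetic involving NaN produces NaN.

The sketch below is a hypothetical comparator, not the board harness's actual compare step. It shows why NaN needs explicit handling: `diff > tol` is false when `diff` is NaN, so a naive tolerance check would let that element pass. The sketch uses `float` instead of float16 because standard C++ has no portable half type; this is an assumption made for the sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical comparator (not the harness's logic). Returns the index of the
// first failing element, or -1 if every element passes. An element fails if
// exactly one side is NaN, if an infinity is not matched exactly, or if
// |golden - out| > atol + rtol * |golden| for finite values.
long first_mismatch(const std::vector<float>& golden,
                    const std::vector<float>& out, float atol, float rtol) {
  const std::size_t n = golden.size() < out.size() ? golden.size() : out.size();
  for (std::size_t i = 0; i < n; ++i) {
    const float g = golden[i];
    const float o = out[i];
    if (std::isnan(g) || std::isnan(o)) {
      // diff would be NaN here and 'diff > tol' would be false, so decide
      // explicitly: NaN on both sides matches, NaN on one side fails.
      if (std::isnan(g) != std::isnan(o)) return static_cast<long>(i);
      continue;
    }
    if (std::isinf(g) || std::isinf(o)) {
      // inf - inf is NaN, so require infinities to match exactly.
      if (g != o) return static_cast<long>(i);
      continue;
    }
    if (std::fabs(g - o) > atol + rtol * std::fabs(g)) {
      return static_cast<long>(i);
    }
  }
  // A length mismatch fails at the first index without a counterpart.
  if (golden.size() != out.size()) return static_cast<long>(n);
  return -1;
}
```

With this check, an output of NaN against a golden -0.0 is reported at its own index. An output of 0.0 against the same golden value passes.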
