feat(ptodsl): add flash-attention demos and mixed-backend subkernel pipeline support by Zhendong404 · Pull Request #816 · hw-native-sys/PTOAS

Zhendong404 · 2026-06-14T16:42:06Z

Summary

This PR brings in a PTODSL/ptoas update centered on flash-attention demos, mixed-kernel or mixed-backend compilation, and subkernel/backend helper lowering fixes.

Compared with hw-native-sys/PTOAS:main, this branch is ahead by 12 commits and includes both frontend PTODSL improvements and the supporting PTOAS lowering/runtime/test updates.

Main Changes

Add PTODSL flash-attention demos and example code

Add ptodsl/examples/flash_attention/flash_attention_cv_split.py
Add ptodsl/examples/flash_attention/flash_attention_vf_fusion.py
Add supporting example files under ptodsl/examples/flash_attention/
Add FA-related PTODSL example entry files such as ptodsl/examples/fa_dn_ptodsl.py

Support mixed-kernel and mixed-backend hybrid compilation in PTODSL

Extend PTODSL tracing / session / subkernel compilation flow to support mixed child modules
Add backend child compile-unit assembly and helper materialization support in PTOAS
Add new normalization/materialization passes for uncovered tile sections and tile handles
Update driver-side mixed-backend child module compilation and diagnostics

Support subkernel-related sync and helper lowering

Add insert-sync support for subkernels
Inline backend helpers after shared mainline materialization so that the tile_buf ABI is preserved
Improve symbol/logical-name handling for peer/helper lookup across backend-partitioned modules

Add A5 gm_tensor slot support for PTODSL pipe flow

Frontend/runtime/lowering changes for A5 pipe slot handling
Add corresponding lit coverage for frontend ordering and split/nosplit cases

Fix EmitC / VPTO / graph-sync related issues

Sink PTOAS__TILE_DATA reads after TASSIGN
Rewrite malformed nested emitc.verbatim trailing semicolons in emitted C++ as a workaround
Let graph sync solver recognize pto.section.* regions
Fix vmulscvt emission in the CANN900 LLVM emitter
Support AST rewrite for non-entry PTODSL kernels

Refresh PTODSL documentation

Update ptodsl/docs/user_guide/01-introduction.md
Update ptodsl/docs/user_guide/03-kernel-entry-and-subkernels.md
Refresh quick start / walkthrough / additional examples to match the new PTODSL surface

Add and update test coverage

Add and update PTODSL Python tests
Add and update lit tests for mixed backend, helper ABI, peer references, uncovered tile section normalization, A5 pipe slot handling, and EmitC fixes
Update test/lit/lit.cfg.py

Scope

Diff summary:

120 files changed
12652 insertions
968 deletions

Validation

Confirmed branch fa-demo-restack is already pushed to fork Zhendong404/PTOAS
Confirmed compare target is clean against hw-native-sys/PTOAS:main
Confirmed there is no existing open PR for Zhendong404:fa-demo-restack

gemini-code-assist

Code Review

This pull request introduces backend-partitioned container support, a new pass to normalize uncovered tile sections, and helper inlining for PTODSL subkernel calls. It also updates the EmitC lowering pipeline to handle pointer-like tile buffer addresses and adds several new examples and tests. The reviewer feedback focuses on improving robustness and correctness: refactoring the uncovered segment collection to prevent over-aggressive splitting, adding defensive checks for memory space attributes in EmitC lowering to avoid potential crashes, ensuring deterministic symbol resolution during peer lookups, and using a more robust check for static dimensions.


gemini-code-assist · 2026-06-14T16:45:21Z

+static void collectUncoveredTopLevelSegments(
+    func::FuncOp funcOp, SmallVectorImpl<UncoveredTopLevelSegment> &segments) {
+  if (!funcOp || funcOp.isDeclaration() || !funcOp.getBody().hasOneBlock())
+    return;
+
+  Block &entryBlock = funcOp.getBody().front();
+  UncoveredTopLevelSegment current;
+
+  auto flushCurrent = [&]() {
+    if (!current.firstOp)
+      return;
+    segments.push_back(current);
+    current = {};
+  };
+
+  for (Operation &op : entryBlock.getOperations()) {
+    if (isa<func::ReturnOp>(op)) {
+      flushCurrent();
+      continue;
+    }
+
+    if (isExplicitSection(&op)) {
+      flushCurrent();
+      continue;
+    }
+
+    UncoveredTopLevelSegment opSummary = summarizeTopLevelOperation(&op);
+    if (!opSummary.containsTileOp) {
+      flushCurrent();
+      continue;
+    }
+
+    if (!current.firstOp) {
+      current = std::move(opSummary);
+      continue;
+    }
+
+    std::optional<InferredSectionKind> currentKind = inferSegmentKind(current);
+    std::optional<InferredSectionKind> opKind = inferSegmentKind(opSummary);
+    bool mustSplit = current.containsNestedExplicitSection ||
+                     opSummary.containsNestedExplicitSection || !currentKind ||
+                     !opKind || *currentKind != *opKind;
+    if (mustSplit) {
+      flushCurrent();
+      current = std::move(opSummary);
+      continue;
+    }
+
+    mergeSegmentSummary(current, opSummary);
+  }
+
+  flushCurrent();
+}


The current implementation of collectUncoveredTopLevelSegments flushes and splits the active segment as soon as it encounters any top-level operation that does not contain a tile op (e.g., a scalar calculation or a subkernel call). This splits adjacent tile operations of the same kind unnecessarily, creating multiple small sections and potentially duplicating neutral top-level operations (like subkernel calls) into both the Cube and Vector child modules during splitting.

Instead, we should use a cohesive partitioning algorithm that accumulates neutral operations into the current segment and only flushes/splits when there is an actual conflict between the Vector and Cube kinds, or when it reaches an explicit section or a return.

static void collectUncoveredTopLevelSegments(
    func::FuncOp funcOp, SmallVectorImpl<UncoveredTopLevelSegment> &segments) {
  if (!funcOp || funcOp.isDeclaration() || !funcOp.getBody().hasOneBlock())
    return;

  Block &entryBlock = funcOp.getBody().front();
  UncoveredTopLevelSegment current;

  auto flushCurrent = [&]() {
    if (!current.firstOp)
      return;
    if (current.containsTileOp)
      segments.push_back(current);
    current = {};
  };

  for (Operation &op : entryBlock.getOperations()) {
    if (isa<func::ReturnOp>(op) || isExplicitSection(&op)) {
      flushCurrent();
      continue;
    }

    UncoveredTopLevelSegment opSummary = summarizeTopLevelOperation(&op);
    if (opSummary.containsNestedExplicitSection) {
      flushCurrent();
      segments.push_back(opSummary);
      continue;
    }

    if (!current.firstOp) {
      current = std::move(opSummary);
      continue;
    }

    std::optional<InferredSectionKind> currentKind = inferSegmentKind(current);
    std::optional<InferredSectionKind> opKind = inferSegmentKind(opSummary);
    if (currentKind && opKind && *currentKind != *opKind) {
      flushCurrent();
      current = std::move(opSummary);
    } else {
      mergeSegmentSummary(current, opSummary);
    }
  }

  flushCurrent();
}
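To make the difference concrete, the grouping rule described in this suggestion can be modeled outside MLIR. This is an illustrative sketch, not the PTOAS pass: each top-level op is reduced to a hypothetical OpSummary whose kind is Vector, Cube, or empty for neutral ops (scalar math, subkernel calls), and isBoundary stands in for explicit sections and returns. Nested explicit sections are omitted for brevity.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the pass's per-op summary.
enum class Kind { Vector, Cube };

struct OpSummary {
  std::string name;
  std::optional<Kind> kind;  // std::nullopt => neutral op (no tile op)
  bool isBoundary = false;   // explicit section or return
};

struct Segment {
  std::vector<std::string> ops;
  std::optional<Kind> kind;  // kind of the tile ops seen so far
};

// Group ops into segments: neutral ops ride along with the current
// segment, and a split happens only on a Vector/Cube conflict or a boundary.
std::vector<Segment> partition(const std::vector<OpSummary> &ops) {
  std::vector<Segment> segments;
  Segment current;
  auto flush = [&]() {
    if (current.kind)  // drop segments that contain no tile op
      segments.push_back(current);
    current = Segment{};
  };
  for (const OpSummary &op : ops) {
    if (op.isBoundary) {
      flush();
      continue;
    }
    if (op.kind && current.kind && *op.kind != *current.kind)
      flush();
    current.ops.push_back(op.name);
    if (op.kind)
      current.kind = op.kind;
  }
  flush();
  return segments;
}
```

With this rule, Vector a, a neutral op, Vector b, then Cube c yields two segments, [a, neutral, b] and [c], instead of a split at the neutral op; a run made only of neutral ops produces no segment at all.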

gemini-code-assist · 2026-06-14T16:45:21Z

+      std::string qualifier =
+          addrSpaceQualifier(type.getMemorySpace().getAddressSpace());


Calling type.getMemorySpace().getAddressSpace() directly without checking if the memory space attribute is present and is of type AddressSpaceAttr can lead to a crash or undefined behavior. It is safer to use a defensive check similar to other parts of this file.

Suggested change

-      std::string qualifier =
-          addrSpaceQualifier(type.getMemorySpace().getAddressSpace());
+      std::string qualifier = "__gm__";
+      if (auto ms = type.getMemorySpace()) {
+        if (auto ptoAttr = dyn_cast<pto::AddressSpaceAttr>(ms))
+          qualifier = addrSpaceQualifier(ptoAttr.getAddressSpace());
+      }
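The effect of this guard can be checked with a small standalone model. Everything in it is a stand-in: Attribute, AddressSpaceAttr, the two-entry AddressSpace enum, and the GM/UB-to-qualifier mapping are assumptions for illustration, not PTOAS's definitions. Plain C++ dynamic_cast is used because, like LLVM's dyn_cast_or_null, it yields null both for a missing attribute and for one of an unexpected type.

```cpp
#include <string>

// Hypothetical stand-ins for MLIR attributes: a polymorphic base and one
// concrete address-space attribute.
struct Attribute {
  virtual ~Attribute() = default;
};

enum class AddressSpace { GM, UB };

struct AddressSpaceAttr : Attribute {
  explicit AddressSpaceAttr(AddressSpace as) : space(as) {}
  AddressSpace space;
};

struct OtherAttr : Attribute {};

// Assumed mapping for illustration only; the real addrSpaceQualifier may differ.
std::string addrSpaceQualifier(AddressSpace as) {
  switch (as) {
  case AddressSpace::GM:
    return "__gm__";
  case AddressSpace::UB:
    return "__ubuf__";
  }
  return "__gm__";
}

// Mirrors the suggested change: default to "__gm__" and query the address
// space only when the attribute is present and has the expected type.
std::string qualifierFor(const Attribute *memorySpace) {
  std::string qualifier = "__gm__";
  if (auto *asAttr = dynamic_cast<const AddressSpaceAttr *>(memorySpace))
    qualifier = addrSpaceQualifier(asAttr->space);
  return qualifier;
}
```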

gemini-code-assist · 2026-06-14T16:45:21Z

+  if (auto ptrTy = dyn_cast<pto::PtrType>(originalCalleeArgTy)) {
+    elemTy = ptrTy.getElementType();
+    as = ptrTy.getMemorySpace().getAddressSpace();


Defensively check ptrTy.getMemorySpace() using dyn_cast_or_null<pto::AddressSpaceAttr> before calling getAddressSpace() to prevent potential crashes if the memory space attribute is null or of an unexpected type.

if (auto ptrTy = dyn_cast<pto::PtrType>(originalCalleeArgTy)) {
  elemTy = ptrTy.getElementType();
  if (auto asAttr =
          dyn_cast_or_null<pto::AddressSpaceAttr>(ptrTy.getMemorySpace()))
    as = asAttr.getAddressSpace();
}

gemini-code-assist · 2026-06-14T16:45:21Z

+  SmallVector<func::FuncOp> fallbackMatches;
+  outerModule.walk([&](func::FuncOp funcOp) {
+    auto visibility = funcOp->getAttrOfType<StringAttr>("sym_visibility");
+    if (visibility && visibility.getValue() == "private")
+      return WalkResult::advance();
+
+    StringRef symbolName = funcOp.getSymName();
+    if (symbolName == target) {
+      fallbackMatches.clear();
+      fallbackMatches.push_back(funcOp);
+      return WalkResult::interrupt();
+    }
+    if (funcOp->hasAttr(kPTODSLLogicalNameAttrName) &&
+        getPTODSLLogicalNameOrSymbolName(funcOp) == target)
+      fallbackMatches.push_back(funcOp);
+    return WalkResult::advance();
+  });


Interrupting the walk immediately upon finding an exact symbol name match can lead to non-deterministic or incorrect symbol resolution if multiple sibling modules contain public functions with the same name. It is safer to collect all matches (both exact and logical) without interrupting, and then verify that exactly one unique match exists.

SmallVector<func::FuncOp> fallbackMatches;
outerModule.walk([&](func::FuncOp funcOp) {
  auto visibility = funcOp->getAttrOfType<StringAttr>("sym_visibility");
  if (visibility && visibility.getValue() == "private")
    return WalkResult::advance();

  StringRef symbolName = funcOp.getSymName();
  if (symbolName == target ||
      (funcOp->hasAttr(kPTODSLLogicalNameAttrName) &&
       getPTODSLLogicalNameOrSymbolName(funcOp) == target)) {
    fallbackMatches.push_back(funcOp);
  }
  return WalkResult::advance();
});
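This suggestion only removes the early interrupt; the uniqueness check the comment recommends still has to follow the walk. A standalone sketch of both steps, using a hypothetical flattened FuncInfo record in place of func::FuncOp, could look like this:

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical flattened view of the funcs found while walking the outer
// module and its backend child modules.
struct FuncInfo {
  std::string module;                      // owning child module
  std::string symbolName;
  std::optional<std::string> logicalName;  // PTODSL logical-name attribute
  bool isPrivate = false;
};

// Collect every exact or logical-name match without stopping early, then
// accept the result only if it is unique.
std::optional<FuncInfo> resolvePeer(const std::vector<FuncInfo> &funcs,
                                    const std::string &target,
                                    std::string &error) {
  std::vector<const FuncInfo *> matches;
  for (const FuncInfo &f : funcs) {
    if (f.isPrivate)
      continue;
    bool logicalMatch = f.logicalName && *f.logicalName == target;
    if (f.symbolName == target || logicalMatch)
      matches.push_back(&f);
  }
  if (matches.size() == 1)
    return *matches.front();
  error = matches.empty() ? "no public function matches '" + target + "'"
                          : "ambiguous reference to '" + target + "'";
  return std::nullopt;
}
```

This treats exact and logical-name matches uniformly, as the comment proposes. An alternative policy would prefer a unique exact symbol match over logical-name matches; either way, the result no longer depends on walk order.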

gemini-code-assist · 2026-06-14T16:45:21Z

+  bool isStatic = llvm::all_of(shape, [](int64_t dim) {
+    return dim != ShapedType::kDynamic;
+  });


Using dim != ShapedType::kDynamic to check for static dimensions is less robust than checking dim >= 0, as any negative dimension size in MLIR represents a dynamic or invalid dimension. Checking dim >= 0 prevents potential issues with other negative dimension representations.

Suggested change

-  bool isStatic = llvm::all_of(shape, [](int64_t dim) {
-    return dim != ShapedType::kDynamic;
-  });
+  bool isStatic = llvm::all_of(shape, [](int64_t dim) {
+    return dim >= 0;
+  });
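The two predicates only disagree on negative extents other than the sentinel. In the sketch below, kDynamic is assumed to mirror MLIR's ShapedType::kDynamic (the minimum int64_t value); a stray -1, for example, passes the sentinel check but fails the range check.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Assumption: mirrors MLIR's convention, where the dynamic-size sentinel is
// the minimum int64_t and static extents are non-negative.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Check used in the PR: only the sentinel counts as dynamic.
bool isStaticBySentinel(const std::vector<int64_t> &shape) {
  return std::all_of(shape.begin(), shape.end(),
                     [](int64_t dim) { return dim != kDynamic; });
}

// Reviewer's suggestion: any negative extent is treated as non-static.
bool isStaticByRange(const std::vector<int64_t> &shape) {
  return std::all_of(shape.begin(), shape.end(),
                     [](int64_t dim) { return dim >= 0; });
}
```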

reedhecre · 2026-06-14T16:45:26Z

Codex Review

This comment is updated automatically by the review bot.

PR: #816 feat(ptodsl): add flash-attention demos and mixed-backend subkernel pipeline support
Author: Zhendong404
Base/Head: main / fa-demo-restack
Head SHA: 68069a05bd56
Trigger: new commits pushed to the PR
Generated At: 2026-06-17T16:56:41Z
Previous Head SHA: 356a9436064b
Status: failed at codex-review (exit=1)

Summary

Review failed at stage codex-review: exit=1

Findings

No structured findings were generated because the review process failed early.

Log Tail

 test/lit/vpto/issue220_vrelu_i32_vpto_llvm.pto     |   10 +-
 .../vpto/non_entry_helper_not_kernel_annotated.pto |   32 +
 ...alize_uncovered_raw_vpto_vector_module_kind.pto |   33 +
 .../normalize_uncovered_tile_sections_mixed.pto    |   51 +
 ...ed_tile_sections_reject_residual_nested_mix.pto |   39 +
 ...vered_tile_sections_skip_kernel_kind_module.pto |   48 +
 .../normalize_uncovered_tile_sections_vector.pto   |   45 +
 test/lit/vpto/ptodsl_subkernel_backend_inline.pto  |   55 +
 test/lit/vpto/section_sugar_helper_funcs.pto       |   34 +
 ...elang_cross_file_inline_proc_backend_inline.pto |   36 +-
 .../vpto/tilelang_inline_proc_backend_inline.pto   |   35 +-
 .../lit/vpto/tilelang_soft_vmod_backend_inline.pto |  126 --
 test/lit/vpto/vmulscvt_vpto_llvm.pto               |    6 +-
 ...to_infer_module_kernel_kind_without_section.pto |   27 +
 .../lit/vpto/vpto_mainline_inline_proc_cleanup.pto |   34 +-
 .../vpto/vreg_low_precision_memory_vpto_llvm.pto   |   47 +-
 test/samples/TPushTPop/ptodsl/local_c2v/kernel.py  |   24 +-
 tools/ptoas/VPTOHostStubEmission.cpp               |    2 +-
 tools/ptoas/driver.cpp                             |  445 +++++-
 tools/ptoas/ptoas.cpp                              |  129 +-
 tools/ptoas/ptoas.h                                |    4 +
 115 files changed, 13144 insertions(+), 986 deletions(-)
===== END STAGE clone rc=0 @ 2026-06-18 00:56:31 =====

===== STAGE codex-review @ 2026-06-18 00:56:31 =====
set -euo pipefail
cd '/tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/repo'
'codex' exec -C '/tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/repo' -s read-only -c 'model_provider="codereview"' -c 'model="gpt-5.4"' -c 'model_reasoning_effort="xhigh"' --output-schema '/tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/review_schema.json' -o '/tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/codex_last_message.json' --color never - < '/tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/review_prompt.txt'
OpenAI Codex v0.115.0 (research preview)
--------
workdir: /tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/repo
model: gpt-5.4
provider: codereview
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none
session id: 019ed683-563b-74a3-8537-e70ef1444c19
--------
user
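You are now reviewing a GitHub PR.

Repository: hw-native-sys/PTOAS
PR: #816 feat(ptodsl): add flash-attention demos and mixed-backend subkernel pipeline support
Author: Zhendong404
base branch: origin/main
head branch: HEAD (the PR head is currently checked out)

Requirements:
1. Review only this PR's changes relative to origin/main; consult context files when necessary.
2. Focus on real correctness / regression / contract mismatch / CI / runtime / compatibility issues.
3. Do not raise purely stylistic suggestions or low-value speculation.
4. Report strictly by priority:
   - P1: highly likely to cause incorrect results, build/run failures, severe regressions, or release blockers
   - P2: significant defects, behavioral regressions, missing validation/tests, major compatibility issues
   - P3: minor but clearly fixable issues
5. If there are no issues, write in the summary: "No issues found in PR #816", and return findings=[].
6. If there are issues, summarize them concisely in the summary; every entry in findings must give:
   - severity
   - title
   - body (explain why it is a problem, as specifically as possible)
   - file (a relative path where possible)
   - line (an integer if it can be determined, otherwise null)

Suggested first steps:
- git status --short
- git diff --stat origin/main...HEAD
- git diff --unified=80 origin/main...HEAD

The final output must strictly match the JSON schema.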

mcp startup: no servers
Reconnecting... 1/5 (unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"The group that owns this API key has been disabled"}, url: https://codex.0u0o.com/responses, cf-ray: a0d39692098b4705-SJC, request id: 0feb4dd8-bafa-46a3-9191-34f75882836f)
Reconnecting... 2/5 (unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"The group that owns this API key has been disabled"}, url: https://codex.0u0o.com/responses, cf-ray: a0d39695caa7174b-SJC, request id: bcab9db2-28bd-4326-9d9b-eefe53b9df0d)
Reconnecting... 3/5 (unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"The group that owns this API key has been disabled"}, url: https://codex.0u0o.com/responses, cf-ray: a0d3969a5e592b74-LAX, request id: 2a56ab74-5e53-4d8b-b1e6-c0975387e0ce)
Reconnecting... 4/5 (unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"The group that owns this API key has been disabled"}, url: https://codex.0u0o.com/responses, cf-ray: a0d396a23b977db7-LAX, request id: e4ad58cf-74b3-4f23-907f-aabad21b3281)
Reconnecting... 5/5 (unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"The group that owns this API key has been disabled"}, url: https://codex.0u0o.com/responses, cf-ray: a0d396ae4d4dd7a4-LAX, request id: a20fa458-6b0e-4882-92b9-32c10fc966ae)
ERROR: unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"The group that owns this API key has been disabled"}, url: https://codex.0u0o.com/responses, cf-ray: a0d396c699b42b10-SJC, request id: 2d08f801-339c-4d6a-bdab-8b193680982f
Warning: no last agent message; wrote empty content to /tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/codex_last_message.json
===== END STAGE codex-review rc=1 @ 2026-06-18 00:56:41 =====

Zhendong404 · 2026-06-15T12:55:18Z

/run a3

Zhendong404 · 2026-06-15T12:55:25Z

/run a5

reedhecre · 2026-06-15T12:55:43Z

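Received /run a5; the A5 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a5
Current status: the board tester is idle, so this request will start in the current polling round.

The page refreshes automatically and shows the current stage, the queue, and the latest results.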

reedhecre · 2026-06-15T13:02:42Z

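Received /run a3; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle, so this request will start in the current polling round.

The page refreshes automatically and shows the current stage, the queue, and the latest results.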

reedhecre · 2026-06-15T13:13:48Z

A3 board test failed

Trigger: manual
Source commit: 6fddfda4f624
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260615_210251_manual_pr816.log
Manual command: /run a3
Triggered by: Zhendong404
Triggering comment: feat(ptodsl): add flash-attention demos and mixed-backend subkernel pipeline support #816 (comment)
Failed stage: sample-build-and-test / exit=1

Log tail

al.py) SKIP requires --pto-arch=a5
Sync(test_intercore_sync_a5_ptoisa_vec.py) SKIP requires --pto-arch=a5
Sync(test_intercore_sync_a5.py) SKIP requires --pto-arch=a5
Sync(test_mem_inject_sync_basic.py) OK   generated: test_mem_inject_sync_basic-pto.cpp
Sync(test_set_wait_unified_api.py) OK   generated: test_set_wait_unified_api-pto.cpp
Sync(test_tmov_col_major_16x1_align_a5.pto) SKIP requires --pto-arch=a5
Sync(test_tmov_col_major_16x1_align_a5.py) SKIP requires --pto-arch=a5
Sync(test_tmov_row_major_1x16_control_a5.pto) SKIP requires --pto-arch=a5
Sync(test_tmov_row_major_1x16_control_a5.py) SKIP requires --pto-arch=a5
Sync(tmatmulk_autosync.py) OK   generated: tmatmulk_autosync-pto.cpp
TileSetGetValue(tile_getval_mat_invalid.py) XFAIL ptobc encode failed as expected
TileSetGetValue(tileSetGetValue.py) OK   generated: tileSetGetValue-pto.cpp
TInsert(tinsert_fp.py) OK   generated: tinsert_fp-pto.cpp
TInsert(tinsert.py) OK   generated: tinsert-pto.cpp
Tpows(tpows.py) OK   generated: tpows-pto.cpp
Tpow(tpow.py) OK   generated: tpow-pto.cpp
TPrefetchAsync(tprefetch_async_binding.py) OK   generated: tprefetch_async_binding-pto.cpp
TPrefetch(tprefetch.py) OK   generated: tprefetch-pto.cpp
Trans(trans.py) OK   generated: trans-pto.cpp
Trap(trap.py) OK   generated: trap-pto.cpp
TTri(ttri.py) OK   generated: ttri-pto.cpp
VectorAddition(vadd_pto_ir.py) OK   generated: vadd_pto_ir-pto.cpp
VectorAddition(vadd_validshape_hyper.py) OK   generated: vadd_validshape_hyper-pto.cpp
VectorAddition(vectorAddition.py) OK   generated: vectorAddition-pto.cpp
Xors(xors.py) OK   generated: xors-pto.cpp
Xor(xor.py)  OK   generated: xor-pto.cpp
-----------------------------
OK=233  FAIL=2  SKIP=39
=============================
===== END STAGE sample-build-and-test rc=1 @ 2026-06-15 21:13:53 =====

reedhecre · 2026-06-15T13:46:56Z

A5 board test failed

Trigger: manual
Source commit: 6fddfda4f624
Result summary: OK 157 / FAIL 66 / SKIP 1
Log: /root/ptoas-board-monitor-a5/logs/20260615_205606_manual_pr816.log
Manual command: /run a5
Triggered by: Zhendong404
Triggering comment: feat(ptodsl): add flash-attention demos and mixed-backend subkernel pipeline support #816 (comment)
Failed stage: board-validation / exit=1

Failed test cases

test_tmov_row_major_1x16_control_a5 (run, exit=2)
test_tmov_col_major_16x1_align_a5 (run, exit=2)
test_dynamic_valid_shape (run, exit=2)
test_barrier_sync (run, exit=2)
test_auto_sync_tail_hint (run, exit=2)
rmsnorm_incore_0 (run, exit=2)
rar_optimization_test (run, exit=2)
nested_loop_confliect (run, exit=2)
matmul (run, exit=2)
decode_projection_incore_0 (run, exit=2)
compensation_test (run, exit=2)
add_double_dynamic (run, exit=2)
rems (run, exit=2)
rem (run, exit=2)
rope_kv_cache (run, exit=2)
rmsnorm (run, exit=2)
qwen3_decode_incore_7 (run, exit=2)
qwen3_decode_incore_6 (run, exit=2)
qwen3_decode_incore_5 (run, exit=2)
qwen3_decode_incore_4 (run, exit=2)
qwen3_decode_incore_2 (run, exit=2)
qwen3_decode_incore_1 (run, exit=2)
qwen3_decode_incore_12 (run, exit=2)
qwen3_decode_incore_11 (run, exit=2)
qwen3_decode_incore_10 (run, exit=2)
post_rmsnorm (run, exit=2)
vector_example_dag_kernel_mul (run, exit=2)
vector_example_dag_kernel_add_scalar (run, exit=2)
vector_example_dag_kernel_add (run, exit=2)
paged_attention_example_kernel_softmax_prepare (run, exit=2)
paged_attention_example_kernel_qk_matmul (run, exit=2)
paged_attention_example_kernel_pv_matmul (run, exit=2)
paged_attention_example_kernel_online_update (run, exit=2)
paged_attention_example_kernel_init_inplace (run, exit=2)
orchestration_example_kernel_mul (run, exit=2)
orchestration_example_kernel_add_scalar (run, exit=2)
orchestration_example_kernel_add (run, exit=2)
prelu (run, exit=2)
plan_memory_reuse_sequential (run, exit=2)
plan_memory_peak_exact_capacity (run, exit=2)
plan_memory_peak_8_overlapping (run, exit=2)
plan_memory_no_reuse_overlap (run, exit=2)
plan_memory_nested_loops (run, exit=2)
plan_memory_loop_no_reuse_outer_live (run, exit=2)
plan_memory_loop_in_if (run, exit=2)
plan_memory_if_yield (run, exit=2)
plan_memory_if_in_loop (run, exit=2)
plan_memory_fragmentation_two_holes (run, exit=2)
plan_memory_fragmentation_hole_fit (run, exit=2)
plan_memory_for_iter_args_yield (run, exit=2)
plan_memory_bind_tile_alias_liveness (run, exit=2)
partition_view_verify_valid (run, exit=2)
partition_view_verify_rank_mismatch_valid (run, exit=2)
partition5d_dynamic_a5 (run, exit=2)
partition5d_a5 (run, exit=2)
tensor_view_layout_dn (run, exit=2)
sparse_attn_test_incore_7 (run, exit=2)
decode_swa_test_incore_40 (run, exit=2)
decode_hca_test_incore_54 (run, exit=2)
decode_csa_test_incore_81 (run, exit=2)
attention_swa_test_incore_40 (run, exit=2)
attention_hca_test_incore_54 (run, exit=2)
attention_csa_test_refresh_incore_81 (run, exit=2)
tbroadcast_root_binding (run, exit=139)
cmps (run, exit=2)
cmp (run, exit=2)

reedhecre · 2026-06-15T13:46:58Z

A5 board test failure details: PR #816

test_tmov_row_major_1x16_control_a5

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtest_tmov_row_major_1x16_control_a5_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/test_tmov_row_major_1x16_control_a5.dir/build.make:98: test_tmov_row_major_1x16_control_a5] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/test_tmov_row_major_1x16_control_a5.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:07:37] ERROR: testcase failed (exit 2): test_tmov_row_major_1x16_control_a5

test_tmov_col_major_16x1_align_a5

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtest_tmov_col_major_16x1_align_a5_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/test_tmov_col_major_16x1_align_a5.dir/build.make:98: test_tmov_col_major_16x1_align_a5] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/test_tmov_col_major_16x1_align_a5.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:07:40] ERROR: testcase failed (exit 2): test_tmov_col_major_16x1_align_a5

test_dynamic_valid_shape

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtest_dynamic_valid_shape_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/test_dynamic_valid_shape.dir/build.make:98: test_dynamic_valid_shape] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/test_dynamic_valid_shape.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:11:25] ERROR: testcase failed (exit 2): test_dynamic_valid_shape

test_barrier_sync

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtest_barrier_sync_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/test_barrier_sync.dir/build.make:98: test_barrier_sync] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/test_barrier_sync.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:11:27] ERROR: testcase failed (exit 2): test_barrier_sync

test_auto_sync_tail_hint

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtest_auto_sync_tail_hint_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/test_auto_sync_tail_hint.dir/build.make:98: test_auto_sync_tail_hint] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/test_auto_sync_tail_hint.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:11:30] ERROR: testcase failed (exit 2): test_auto_sync_tail_hint

rmsnorm_incore_0

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librmsnorm_incore_0_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rmsnorm_incore_0.dir/build.make:98: rmsnorm_incore_0] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rmsnorm_incore_0.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:25] ERROR: testcase failed (exit 2): rmsnorm_incore_0

rar_optimization_test

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librar_optimization_test_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rar_optimization_test.dir/build.make:98: rar_optimization_test] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rar_optimization_test.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:28] ERROR: testcase failed (exit 2): rar_optimization_test

nested_loop_confliect

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libnested_loop_confliect_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/nested_loop_confliect.dir/build.make:98: nested_loop_confliect] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/nested_loop_confliect.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:31] ERROR: testcase failed (exit 2): nested_loop_confliect

matmul

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libmatmul_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/matmul.dir/build.make:98: matmul] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/matmul.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:33] ERROR: testcase failed (exit 2): matmul

decode_projection_incore_0

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libdecode_projection_incore_0_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/decode_projection_incore_0.dir/build.make:98: decode_projection_incore_0] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/decode_projection_incore_0.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:36] ERROR: testcase failed (exit 2): decode_projection_incore_0

compensation_test

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libcompensation_test_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/compensation_test.dir/build.make:98: compensation_test] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/compensation_test.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:38] ERROR: testcase failed (exit 2): compensation_test

add_double_dynamic

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libadd_double_dynamic_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/add_double_dynamic.dir/build.make:98: add_double_dynamic] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/add_double_dynamic.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:41] ERROR: testcase failed (exit 2): add_double_dynamic

rems

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librems_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rems.dir/build.make:98: rems] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rems.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:21:57] ERROR: testcase failed (exit 2): rems

rem

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librem_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rem.dir/build.make:98: rem] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rem.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:21:59] ERROR: testcase failed (exit 2): rem

rope_kv_cache

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librope_kv_cache_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rope_kv_cache.dir/build.make:98: rope_kv_cache] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rope_kv_cache.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:29] ERROR: testcase failed (exit 2): rope_kv_cache

rmsnorm

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librmsnorm_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rmsnorm.dir/build.make:98: rmsnorm] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rmsnorm.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:32] ERROR: testcase failed (exit 2): rmsnorm

qwen3_decode_incore_7

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_7_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_7.dir/build.make:98: qwen3_decode_incore_7] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_7.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:35] ERROR: testcase failed (exit 2): qwen3_decode_incore_7

qwen3_decode_incore_6

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_6_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_6.dir/build.make:98: qwen3_decode_incore_6] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_6.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:38] ERROR: testcase failed (exit 2): qwen3_decode_incore_6

qwen3_decode_incore_5

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_5_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_5.dir/build.make:98: qwen3_decode_incore_5] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_5.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:40] ERROR: testcase failed (exit 2): qwen3_decode_incore_5

qwen3_decode_incore_4

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_4_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_4.dir/build.make:98: qwen3_decode_incore_4] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_4.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:43] ERROR: testcase failed (exit 2): qwen3_decode_incore_4

qwen3_decode_incore_2

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_2_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_2.dir/build.make:98: qwen3_decode_incore_2] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_2.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:46] ERROR: testcase failed (exit 2): qwen3_decode_incore_2

qwen3_decode_incore_1

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_1_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_1.dir/build.make:98: qwen3_decode_incore_1] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_1.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:48] ERROR: testcase failed (exit 2): qwen3_decode_incore_1

qwen3_decode_incore_12

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_12_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_12.dir/build.make:98: qwen3_decode_incore_12] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_12.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:51] ERROR: testcase failed (exit 2): qwen3_decode_incore_12

qwen3_decode_incore_11

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_11_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_11.dir/build.make:98: qwen3_decode_incore_11] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_11.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:53] ERROR: testcase failed (exit 2): qwen3_decode_incore_11

qwen3_decode_incore_10

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_10_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_10.dir/build.make:98: qwen3_decode_incore_10] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_10.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:56] ERROR: testcase failed (exit 2): qwen3_decode_incore_10

post_rmsnorm

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpost_rmsnorm_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/post_rmsnorm.dir/build.make:98: post_rmsnorm] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/post_rmsnorm.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:59] ERROR: testcase failed (exit 2): post_rmsnorm

vector_example_dag_kernel_mul

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libvector_example_dag_kernel_mul_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_mul.dir/build.make:98: vector_example_dag_kernel_mul] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/vector_example_dag_kernel_mul.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:23:57] ERROR: testcase failed (exit 2): vector_example_dag_kernel_mul

vector_example_dag_kernel_add_scalar

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libvector_example_dag_kernel_add_scalar_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_add_scalar.dir/build.make:98: vector_example_dag_kernel_add_scalar] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/vector_example_dag_kernel_add_scalar.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:23:59] ERROR: testcase failed (exit 2): vector_example_dag_kernel_add_scalar

vector_example_dag_kernel_add

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libvector_example_dag_kernel_add_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_add.dir/build.make:98: vector_example_dag_kernel_add] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/vector_example_dag_kernel_add.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:02] ERROR: testcase failed (exit 2): vector_example_dag_kernel_add

paged_attention_example_kernel_softmax_prepare

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpaged_attention_example_kernel_softmax_prepare_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_softmax_prepare.dir/build.make:98: paged_attention_example_kernel_softmax_prepare] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/paged_attention_example_kernel_softmax_prepare.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:04] ERROR: testcase failed (exit 2): paged_attention_example_kernel_softmax_prepare

paged_attention_example_kernel_qk_matmul

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpaged_attention_example_kernel_qk_matmul_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_qk_matmul.dir/build.make:98: paged_attention_example_kernel_qk_matmul] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/paged_attention_example_kernel_qk_matmul.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:07] ERROR: testcase failed (exit 2): paged_attention_example_kernel_qk_matmul

paged_attention_example_kernel_pv_matmul

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpaged_attention_example_kernel_pv_matmul_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_pv_matmul.dir/build.make:98: paged_attention_example_kernel_pv_matmul] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/paged_attention_example_kernel_pv_matmul.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:09] ERROR: testcase failed (exit 2): paged_attention_example_kernel_pv_matmul

paged_attention_example_kernel_online_update

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpaged_attention_example_kernel_online_update_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_online_update.dir/build.make:98: paged_attention_example_kernel_online_update] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/paged_attention_example_kernel_online_update.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:12] ERROR: testcase failed (exit 2): paged_attention_example_kernel_online_update

paged_attention_example_kernel_init_inplace

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpaged_attention_example_kernel_init_inplace_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_init_inplace.dir/build.make:98: paged_attention_example_kernel_init_inplace] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/paged_attention_example_kernel_init_inplace.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:15] ERROR: testcase failed (exit 2): paged_attention_example_kernel_init_inplace

orchestration_example_kernel_mul

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by liborchestration_example_kernel_mul_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_mul.dir/build.make:98: orchestration_example_kernel_mul] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/orchestration_example_kernel_mul.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:17] ERROR: testcase failed (exit 2): orchestration_example_kernel_mul

orchestration_example_kernel_add_scalar

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by liborchestration_example_kernel_add_scalar_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_add_scalar.dir/build.make:98: orchestration_example_kernel_add_scalar] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/orchestration_example_kernel_add_scalar.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:20] ERROR: testcase failed (exit 2): orchestration_example_kernel_add_scalar

orchestration_example_kernel_add

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by liborchestration_example_kernel_add_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_add.dir/build.make:98: orchestration_example_kernel_add] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/orchestration_example_kernel_add.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:22] ERROR: testcase failed (exit 2): orchestration_example_kernel_add

prelu

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libprelu_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/prelu.dir/build.make:98: prelu] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/prelu.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:44] ERROR: testcase failed (exit 2): prelu

plan_memory_reuse_sequential

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_reuse_sequential_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_reuse_sequential.dir/build.make:98: plan_memory_reuse_sequential] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_reuse_sequential.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:47] ERROR: testcase failed (exit 2): plan_memory_reuse_sequential

plan_memory_peak_exact_capacity

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_peak_exact_capacity_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_peak_exact_capacity.dir/build.make:98: plan_memory_peak_exact_capacity] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_peak_exact_capacity.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:50] ERROR: testcase failed (exit 2): plan_memory_peak_exact_capacity

plan_memory_peak_8_overlapping

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_peak_8_overlapping_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_peak_8_overlapping.dir/build.make:98: plan_memory_peak_8_overlapping] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_peak_8_overlapping.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:52] ERROR: testcase failed (exit 2): plan_memory_peak_8_overlapping

plan_memory_no_reuse_overlap

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_no_reuse_overlap_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_no_reuse_overlap.dir/build.make:98: plan_memory_no_reuse_overlap] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_no_reuse_overlap.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:55] ERROR: testcase failed (exit 2): plan_memory_no_reuse_overlap

plan_memory_nested_loops

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_nested_loops_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_nested_loops.dir/build.make:98: plan_memory_nested_loops] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_nested_loops.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:57] ERROR: testcase failed (exit 2): plan_memory_nested_loops

plan_memory_loop_no_reuse_outer_live

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_loop_no_reuse_outer_live_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_loop_no_reuse_outer_live.dir/build.make:98: plan_memory_loop_no_reuse_outer_live] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_loop_no_reuse_outer_live.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:00] ERROR: testcase failed (exit 2): plan_memory_loop_no_reuse_outer_live

plan_memory_loop_in_if

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_loop_in_if_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_loop_in_if.dir/build.make:98: plan_memory_loop_in_if] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_loop_in_if.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:03] ERROR: testcase failed (exit 2): plan_memory_loop_in_if

plan_memory_if_yield

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_if_yield_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_if_yield.dir/build.make:98: plan_memory_if_yield] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_if_yield.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:05] ERROR: testcase failed (exit 2): plan_memory_if_yield

plan_memory_if_in_loop

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_if_in_loop_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_if_in_loop.dir/build.make:98: plan_memory_if_in_loop] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_if_in_loop.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:08] ERROR: testcase failed (exit 2): plan_memory_if_in_loop

plan_memory_fragmentation_two_holes

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_fragmentation_two_holes_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_fragmentation_two_holes.dir/build.make:98: plan_memory_fragmentation_two_holes] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_fragmentation_two_holes.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:10] ERROR: testcase failed (exit 2): plan_memory_fragmentation_two_holes

plan_memory_fragmentation_hole_fit

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_fragmentation_hole_fit_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_fragmentation_hole_fit.dir/build.make:98: plan_memory_fragmentation_hole_fit] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_fragmentation_hole_fit.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:13] ERROR: testcase failed (exit 2): plan_memory_fragmentation_hole_fit

plan_memory_for_iter_args_yield

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_for_iter_args_yield_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_for_iter_args_yield.dir/build.make:98: plan_memory_for_iter_args_yield] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_for_iter_args_yield.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:15] ERROR: testcase failed (exit 2): plan_memory_for_iter_args_yield

plan_memory_bind_tile_alias_liveness

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_bind_tile_alias_liveness_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_bind_tile_alias_liveness.dir/build.make:98: plan_memory_bind_tile_alias_liveness] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_bind_tile_alias_liveness.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:18] ERROR: testcase failed (exit 2): plan_memory_bind_tile_alias_liveness

partition_view_verify_valid

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpartition_view_verify_valid_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/partition_view_verify_valid.dir/build.make:98: partition_view_verify_valid] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/partition_view_verify_valid.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:26:10] ERROR: testcase failed (exit 2): partition_view_verify_valid

partition_view_verify_rank_mismatch_valid

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpartition_view_verify_rank_mismatch_valid_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/partition_view_verify_rank_mismatch_valid.dir/build.make:98: partition_view_verify_rank_mismatch_valid] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/partition_view_verify_rank_mismatch_valid.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:26:13] ERROR: testcase failed (exit 2): partition_view_verify_rank_mismatch_valid

partition5d_dynamic_a5

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpartition5d_dynamic_a5_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/partition5d_dynamic_a5.dir/build.make:98: partition5d_dynamic_a5] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/partition5d_dynamic_a5.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:26:15] ERROR: testcase failed (exit 2): partition5d_dynamic_a5

partition5d_a5

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpartition5d_a5_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/partition5d_a5.dir/build.make:98: partition5d_a5] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/partition5d_a5.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:26:18] ERROR: testcase failed (exit 2): partition5d_a5

tensor_view_layout_dn

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtensor_view_layout_dn_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/tensor_view_layout_dn.dir/build.make:98: tensor_view_layout_dn] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/tensor_view_layout_dn.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:31:23] ERROR: testcase failed (exit 2): tensor_view_layout_dn

sparse_attn_test_incore_7

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libsparse_attn_test_incore_7_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/sparse_attn_test_incore_7.dir/build.make:98: sparse_attn_test_incore_7] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/sparse_attn_test_incore_7.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:35:52] ERROR: testcase failed (exit 2): sparse_attn_test_incore_7

decode_swa_test_incore_40

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libdecode_swa_test_incore_40_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/decode_swa_test_incore_40.dir/build.make:98: decode_swa_test_incore_40] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/decode_swa_test_incore_40.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:35:55] ERROR: testcase failed (exit 2): decode_swa_test_incore_40

decode_hca_test_incore_54

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libdecode_hca_test_incore_54_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/decode_hca_test_incore_54.dir/build.make:98: decode_hca_test_incore_54] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/decode_hca_test_incore_54.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:35:57] ERROR: testcase failed (exit 2): decode_hca_test_incore_54

decode_csa_test_incore_81

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libdecode_csa_test_incore_81_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/decode_csa_test_incore_81.dir/build.make:98: decode_csa_test_incore_81] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/decode_csa_test_incore_81.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:36:00] ERROR: testcase failed (exit 2): decode_csa_test_incore_81

attention_swa_test_incore_40

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libattention_swa_test_incore_40_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/attention_swa_test_incore_40.dir/build.make:98: attention_swa_test_incore_40] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/attention_swa_test_incore_40.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:36:03] ERROR: testcase failed (exit 2): attention_swa_test_incore_40

attention_hca_test_incore_54

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libattention_hca_test_incore_54_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/attention_hca_test_incore_54.dir/build.make:98: attention_hca_test_incore_54] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/attention_hca_test_incore_54.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:36:05] ERROR: testcase failed (exit 2): attention_hca_test_incore_54

attention_csa_test_refresh_incore_81

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libattention_csa_test_refresh_incore_81_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/attention_csa_test_refresh_incore_81.dir/build.make:98: attention_csa_test_refresh_incore_81] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/attention_csa_test_refresh_incore_81.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:36:08] ERROR: testcase failed (exit 2): attention_csa_test_refresh_incore_81

tbroadcast_root_binding

stage=run info=exit=139

./test/npu_validation/scripts/run_remote_npu_validation.sh: line 380: 55771 Segmentation fault      (core dumped) LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" ./build/${testcase}
[2026-06-15 21:39:14] ERROR: testcase failed (exit 139): tbroadcast_root_binding

cmps

stage=run info=exit=2

[ERROR] Packed mask mismatch: golden_v2.bin vs v2.bin, idx=4 (golden=98, out=0)
[ERROR] compare failed
[2026-06-15 21:43:51] ERROR: testcase failed (exit 2): cmps

cmp

stage=run info=exit=2

[ERROR] Packed mask mismatch: golden_v3.bin vs v3.bin, idx=4 (golden=49, out=0)
[ERROR] compare failed
[2026-06-15 21:44:02] ERROR: testcase failed (exit 2): cmp

Zhendong404 · 2026-06-17T06:13:18Z

/run a3

reedhecre · 2026-06-17T06:15:47Z

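Received /run a3; the A3 board tester will handle this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
Current status: the board tester is idle; this request will start in the current polling round.

The page refreshes automatically, so you can check the current stage, queue status, and latest results there directly.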

Zhendong404 · 2026-06-17T06:48:35Z

/run a5

reedhecre · 2026-06-17T07:14:37Z

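Received /run a5; the A5 board tester will process this request.

Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a5
Current status: the board tester is idle, so this request will start in the current polling round.

The page refreshes automatically; you can check the current stage, queue status, and latest results there.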

reedhecre · 2026-06-17T07:20:40Z

A5 board test failed

Trigger: manual
Source commit: 9de7d70b413b
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /root/ptoas-board-monitor-a5/logs/20260617_151507_manual_pr816.log
Manual command: /run a5
Triggered by: Zhendong404
Triggering comment: feat(ptodsl): add flash-attention demos and mixed-backend subkernel pipeline support #816 (comment)
Failed stage: sample-build-and-test / exit=1

Log tail

ntercore_sync_a5.py) FAIL python failed: test_intercore_sync_a5.py
Sync(test_mem_inject_sync_basic.py) FAIL python failed: test_mem_inject_sync_basic.py
Sync(test_set_wait_unified_api.py) FAIL python failed: test_set_wait_unified_api.py
Sync(test_tmov_col_major_16x1_align_a5.pto) OK   generated: test_tmov_col_major_16x1_align_a5.cpp
Sync(test_tmov_col_major_16x1_align_a5.py) OK   generated: test_tmov_col_major_16x1_align_a5-pto.cpp
Sync(test_tmov_row_major_1x16_control_a5.pto) OK   generated: test_tmov_row_major_1x16_control_a5.cpp
Sync(test_tmov_row_major_1x16_control_a5.py) OK   generated: test_tmov_row_major_1x16_control_a5-pto.cpp
Sync(tmatmulk_autosync_a5.py) FAIL python failed: tmatmulk_autosync_a5.py
TileSetGetValue(tile_getval_mat_invalid.py) XFAIL python failed as expected
TileSetGetValue(tileSetGetValue.py) FAIL python failed: tileSetGetValue.py
TInsert(tinsert_fp.py) FAIL python failed: tinsert_fp.py
TInsert(tinsert.py) FAIL python failed: tinsert.py
Tpows(tpows.py) FAIL python failed: tpows.py
Tpow(tpow.py) FAIL python failed: tpow.py
TPrefetchAsync(tprefetch_async_binding.py) FAIL python failed: tprefetch_async_binding.py
TPrefetch(tprefetch.py) FAIL python failed: tprefetch.py
Trans(trans.py) FAIL python failed: trans.py
Trap(trap.py) FAIL python failed: trap.py
TTri(ttri.py) FAIL python failed: ttri.py
VectorAddition(vadd_pto_ir.py) FAIL python failed: vadd_pto_ir.py
VectorAddition(vadd_validshape_hyper.py) FAIL python failed: vadd_validshape_hyper.py
VectorAddition(vectorAddition.py) FAIL python failed: vectorAddition.py
Xors(xors.py) FAIL python failed: xors.py
Xor(xor.py)  FAIL python failed: xor.py
-----------------------------
OK=80  FAIL=162  SKIP=28
=============================
===== END STAGE sample-build-and-test rc=1 @ 2026-06-17 15:21:07 =====

reedhecre · 2026-06-17T07:55:13Z

A3 board test failed

Trigger: manual
Source commit: 9de7d70b413b
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260617_141556_manual_pr816.log
Manual command: /run a3
Triggered by: Zhendong404
Triggering comment: feat(ptodsl): add flash-attention demos and mixed-backend subkernel pipeline support #816 (comment)
Failed stage: board-validation / exit=143

Log tail

st/npu_validation/scripts/generate_testcase.py
pto-isa vendor cache hit: repo=https://gitcode.com/cann/pto-isa.git requested_commit=b65945bf2aa7b0020b9ae813c4d60f18cf0e538f actual_commit=b65945bf2aa7b0020b9ae813c4d60f18cf0e538f

===== STAGE board-validation @ 2026-06-17 14:31:23 =====
task-submit cwd=/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/payload
task-submit env-file=/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/board-validation.env
task-submit run-script:
set -euo pipefail
cd /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/payload
export DEVICE_ID=${TASK_DEVICE:-auto}
bash ./test/npu_validation/scripts/run_remote_npu_validation.sh
task-submit wrapped-command: bash -lc "set -euo pipefail; cd /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/payload; export DEVICE_ID=${TASK_DEVICE:-auto}; bash ./test/npu_validation/scripts/run_remote_npu_validation.sh"
task-submit submit-cmd: /usr/local/bin/task-submit --device auto --max-time 0 --env-file /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/board-validation.env 'bash -lc "set -euo pipefail; cd /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/payload; export DEVICE_ID=${TASK_DEVICE:-auto}; bash ./test/npu_validation/scripts/run_remote_npu_validation.sh"'
task_20260617_143123_21661425699
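Hint: run task-submit --help for card allocation, logging, and other mechanisms and usage
task-submit task-id: task_20260617_143123_21661425699
Waiting for task execution: task_20260617_143123_21661425699 (Ctrl+C to terminate the task)
[npu-lock] Acquiring lock for device 9 (no timeout)...

Session terminated, killing shell... ...killed.
=== Task terminated (killed) ===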
task-submit wait rc=143
completed (exit=143)
===== END STAGE board-validation rc=143 @ 2026-06-17 15:55:19 =====

zhangstevenunity

Manual deep review (Codex bot failed on this PR)

Heads-up: the automated codex-review on this PR failed to run (exit 1 -- the upstream review API key group was disabled, HTTP 403), so there are currently no machine-generated findings. I did a manual correctness pass over the genuinely new/changed C++ and the mixed-backend flow, diffed against the true PR base (e37d41e1).

Net: the change is largely additive and the existing single-backend path looks preserved -- I did not find a release-blocking miscompile in the default path. Three concrete items below (1x P2, 2x P3), posted inline.

Inline findings

P2 PTONormalizeUncoveredTileSections.cpp -- inferSegmentKind ignores ambiguousTileOps, so an unclassifiable tile op (e.g. pto.tpush/pto.tpop) co-resident with a classifiable op in an uncovered segment is silently wrapped into an inferred section instead of routing to emitSegmentInferenceError (asymmetric with inferWholeFunctionKind).
P3 driver.cpp -- isBackendPartitionedContainer is a vacuous return true; in mixed mode stray top-level ops would be silently dropped.
P3 PTO.cpp -- lookupPeerFuncAcrossContainer's empty-fallback resolves private siblings, contradicting the visibility filter a few lines above and the driver's public-only contract.

Non-blocking observations (not posted inline)

Explicit entry detection. Entries now require pto.entry/pto.kernel/hacc.entry/pto.aicore. This is the documented intent and in-repo consumers are migrated, but a module that reaches codegen with function definitions and zero recognized entries emits device code with no host stub and no diagnostic. A guarded warning would surface silent non-launch -- but note zero-entry child modules are legitimately valid in mixed-backend mode, so any such check must be scoped to callee-only children to avoid false positives.
Subkernel-call autosync pipe assumption. ptodsl_subkernel_call_autosync models the call with a single representative pipe (PIPE_V for simd/simt, PIPE_M for cube), and pto-inline-backend-helpers runs after InsertSync without re-running it. That boundary sync is correct only if the helper is pipe-homogeneous w.r.t. each tile argument (first/last effect per operand is on the representative pipe). If multi-pipe subkernel helpers are possible, an assert or a per-operand pipe derivation would close the gap; the test only exercises a PIPE_V-only helper.
Minor: the debugIROutputRequested list in resolveSingleBackend omits emitVPTOLLVMDialect (present in the other two gates). Harmless today (buildBackendInfo backstops it), but inconsistent.
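The pipe-homogeneity condition in the second observation can be written down as a small check. The sketch below is a standalone model, not PTOAS code: Pipe, OperandEffects, and isPipeHomogeneousAtBoundary are hypothetical names, and it assumes a helper can be summarized as the ordered list of pipes that touch each tile argument. It returns false in exactly the case the review warns about: some tile argument's first or last effect is not on the representative pipe that ptodsl_subkernel_call_autosync assumes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical pipe enum: V and M stand for the representative pipes named in
// the review (PIPE_V, PIPE_M); MTE2 stands in for "some other pipe".
enum class Pipe { V, M, MTE2 };

// Hypothetical helper summary: for each tile argument, the pipes of the ops
// that touch it, in program order.
using OperandEffects = std::map<std::string, std::vector<Pipe>>;

// A call-boundary sync modeled with one representative pipe is only sound if
// every tile argument's first and last effect is on that pipe.
bool isPipeHomogeneousAtBoundary(const OperandEffects &effects,
                                 Pipe representative) {
  for (const auto &entry : effects) {
    const std::vector<Pipe> &pipes = entry.second;
    if (pipes.empty())
      continue; // an untouched argument imposes no constraint
    if (pipes.front() != representative || pipes.back() != representative)
      return false;
  }
  return true;
}
```

In the driver this would back the suggested assert; a helper that loads a tile on one pipe and computes on PIPE_V fails the check, which is the multi-pipe case the current test does not exercise.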

Spot-checked and looks correct

EmitC PTOAS__TILE_DATA sink-after-TASSIGN (only ever moves a pure .data() read later; per-use re-materialization is dominance-safe), public-helper extern "C" linkage (matches the pre-existing declarations), generalized subview/reinterpret_cast pointer typing (no offset truncation, int64 offsets), vmulscvt lowering (llvm.hivm.vmulscvt.v128f16, operand order matches the masked+part sibling pattern), attachHIVMKernelAnnotations non-entry exclusion, and the GraphSyncSolver pto.section.* transparency change (strictly additive -- section bodies were previously not translated at all; set/wait flags still land inside the section).

zhangstevenunity · 2026-06-22T02:52:51Z

+}
+
+static std::optional<InferredSectionKind>
+inferSegmentKind(const UncoveredTopLevelSegment &segment) {


P2 -- inferSegmentKind ignores ambiguousTileOps; unclassifiable tile ops are silently absorbed into an inferred section.

This decides a segment's kind purely from vectorTileOpCount/cubeTileOpCount and never inspects segment.ambiguousTileOps. That is asymmetric with the whole-function analogue inferWholeFunctionKind, which bails when summary.ambiguousOps is non-empty. The collection side is symmetric -- inspectSegmentOperation pushes unclassifiable tile-like ops into segment.ambiguousTileOps, and emitSegmentInferenceError is even written to report them -- but because inferSegmentKind never returns nullopt on their account, that error branch is unreachable whenever a classifiable op coexists in the same segment.

Consequence: a top-level container op (e.g. scf.for) whose body holds a classifiable tile op of one kind plus an unclassifiable tile op is inferred as that one kind, and normalizeFunction -> wrapUncoveredTopLevelSegment moves the unclassifiable op into that pto.section.* with no diagnostic. After wrapping it counts as "covered" (the residual verifier does not descend into explicit sections), so the misplacement is silent.

pto.tpush/pto.tpop are concrete triggers: they are isTileLikeOp (OpPipeInterface + pto.t prefix) but classifyTileOp returns nullopt -- their $tile is a PTOPipeEntryType, so getPipe() yields PIPE_UNASSIGNED and there are no buffer-typed operands to classify by address space. A loop body doing VEC compute plus a pto.tpush, left uncovered, would be wrapped as section.vector with the push silently absorbed; if its correct ownership were the cube side, that is a wrong-core placement.

Fix (one line, mirroring the function-level guard):

static std::optional<InferredSectionKind>
inferSegmentKind(const UncoveredTopLevelSegment &segment) {
  if (!segment.ambiguousTileOps.empty())
    return std::nullopt; // route to emitSegmentInferenceError
  if (segment.vectorTileOpCount && segment.cubeTileOpCount)
    return std::nullopt;
  ...

No lit test currently covers an unclassifiable tile op co-resident with a classifiable one inside an uncovered segment.
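A standalone sketch of the proposed guard, with the MLIR types replaced by a hypothetical plain struct that keeps only the counters and the ambiguousTileOps list named in the finding. The tail of the real function is elided above, so the final nullopt for a segment with no tile ops is an assumption made only to keep the sketch complete.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the MLIR-based types in
// PTONormalizeUncoveredTileSections.cpp; only the fields named in the
// review are modeled.
enum class InferredSectionKind { Vector, Cube };

struct UncoveredTopLevelSegment {
  unsigned vectorTileOpCount = 0;            // classifiable vector tile ops
  unsigned cubeTileOpCount = 0;              // classifiable cube tile ops
  std::vector<std::string> ambiguousTileOps; // unclassifiable, e.g. "pto.tpush"
};

// Returns nullopt whenever the segment cannot be classified unambiguously,
// so the caller can emit a diagnostic instead of wrapping the segment.
std::optional<InferredSectionKind>
inferSegmentKind(const UncoveredTopLevelSegment &segment) {
  if (!segment.ambiguousTileOps.empty())
    return std::nullopt; // the guard proposed in the review
  if (segment.vectorTileOpCount && segment.cubeTileOpCount)
    return std::nullopt; // mixed vector/cube work in one segment
  if (segment.vectorTileOpCount)
    return InferredSectionKind::Vector;
  if (segment.cubeTileOpCount)
    return InferredSectionKind::Cube;
  return std::nullopt; // assumption: no tile ops means nothing to infer
}
```

With the guard in place, a vector loop body that also holds a pto.tpush no longer classifies as Vector; that is the case a new lit test would pin down.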

zhangstevenunity · 2026-06-22T02:52:51Z

-                                                      ModuleOp child) {
-  for (NamedAttribute attr : outer->getAttrs()) {
+static bool isBackendPartitionedContainer(ModuleOp module) {
+  return llvm::all_of(module.getOps<ModuleOp>(),


P3 -- isBackendPartitionedContainer validates nothing (always true).

static bool isBackendPartitionedContainer(ModuleOp module) {
  return llvm::all_of(module.getOps<ModuleOp>(),
                      [](ModuleOp) { return true; });
}

getOps<ModuleOp>() only yields the child modules and the predicate returns true for each, so this is vacuously true for any module (including one with zero children). It never checks that the outer module contains only child modules. It gates mixed-backend mode in resolveSingleBackend (children.size() > 1 && isBackendPartitionedContainer(module)), and collectChildJobs builds jobs solely from module.getOps<ModuleOp>() -- so any stray top-level op in the outer container (e.g. a func.func directly under it) is silently dropped from the output rather than rejected. Canonical PTODSL/doc IR only nests child modules, so this is latent, but the predicate should enforce the invariant it is named for, e.g.:

Block *body = module.getBody();
return !body->empty() &&
       llvm::all_of(body->getOperations(),
                    [](Operation &op) { return isa<ModuleOp>(op); });

and the driver should emit a diagnostic (not silently drop) when an outer container in mixed mode holds non-module top-level ops.
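The vacuous-truth problem is easy to reproduce outside MLIR. In this sketch (hypothetical names; a module body is modeled as a list of operation names), the first predicate mirrors the current shape and the second follows the suggested fix. Because std::all_of over an empty range is true, the non-empty check is what lets the fixed version reject a container with no children, and iterating over every top-level op is what lets it reject a stray func.func.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical model of an outer module body: one entry per top-level op,
// holding its operation name (e.g. "builtin.module", "func.func").
using ModuleBody = std::vector<std::string>;

bool isChildModule(const std::string &opName) {
  return opName == "builtin.module";
}

// Current shape: filter to child modules, then test them with an
// always-true predicate, so the result is true for every input.
bool isBackendPartitionedContainerVacuous(const ModuleBody &body) {
  ModuleBody children;
  std::copy_if(body.begin(), body.end(), std::back_inserter(children),
               isChildModule);
  return std::all_of(children.begin(), children.end(),
                     [](const std::string &) { return true; });
}

// Suggested shape: a non-empty body whose every top-level op is a module.
bool isBackendPartitionedContainer(const ModuleBody &body) {
  return !body.empty() && std::all_of(body.begin(), body.end(), isChildModule);
}
```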

zhangstevenunity · 2026-06-22T02:52:51Z

+
+  if (fallbackMatches.size() == 1)
+    return fallbackMatches.front();
+  if (fallbackMatches.empty()) {


P3 -- cross-child peer fallback resolves private siblings, contradicting the visibility filter a few lines above.

The outerModule.walk just above (the fallbackMatches collection) deliberately skips functions whose sym_visibility is "private", mirroring the driver's findSiblingSourceFunction, which only accepts public sibling defs. But this empty-fallbackMatches fallback uses SymbolTable::lookupSymbolIn(childModule, target), which returns a symbol regardless of visibility. So when no public peer exists anywhere but a sibling child module has a matching private func.func, ImportReservedBufferOp::verify accepts it, whereas the driver's mixed-backend assembly later rejects private peers ("unresolved cross-child peer_func reference"). Net effect: the op verifies clean, then fails later in the driver with a less localized message (no wrong code is emitted). The fallback also matches only by raw symbol name, not by pto.ptodsl.logical_name, unlike the walk above. Suggest skipping sym_visibility == "private" funcs here (matching the walk and the driver), or dropping the fallback since a public match would already have been found by the recursive walk.
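A sketch of the suggested fallback behavior, assuming each sibling func.func can be flattened into a hypothetical FuncInfo record. It applies the same private-visibility filter as the walk and the driver, and reports "none or ambiguous" as nullopt so the failure can surface at verify time. Matching on the logical name and falling back to the raw symbol name when pto.ptodsl.logical_name is absent is an assumption for illustration, not confirmed PTOAS behavior.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical flattened view of a func.func in a sibling child module.
struct FuncInfo {
  std::string symName;     // raw symbol name
  std::string logicalName; // stand-in for pto.ptodsl.logical_name ("" if absent)
  bool isPrivate = false;  // sym_visibility == "private"
};

// Resolve a cross-child peer among public definitions only.
std::optional<FuncInfo> lookupPublicPeer(const std::vector<FuncInfo> &siblings,
                                         const std::string &target) {
  std::vector<FuncInfo> matches;
  for (const FuncInfo &f : siblings) {
    if (f.isPrivate)
      continue; // same filter as the walk above and the driver
    // Assumption: prefer the logical name, fall back to the raw symbol name.
    const std::string &key = f.logicalName.empty() ? f.symName : f.logicalName;
    if (key == target)
      matches.push_back(f);
  }
  if (matches.size() == 1)
    return matches.front();
  return std::nullopt; // none or ambiguous: let the verifier report it
}
```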

mouliangyu · 2026-06-22T02:19:24Z

+}
+
+static FailureOr<func::FuncOp>
+findSiblingSourceFunction(ModuleOp outer, ModuleOp targetChild,


Why do we need the peer func again? I've forgotten. Could you explain it in a reply or add a code comment?

gemini-code-assist Bot reviewed Jun 14, 2026

Zhendong404 force-pushed the fa-demo-restack branch 4 times, most recently from 4ab1ce5 to 912e855, June 15, 2026 12:54

Zhendong404 force-pushed the fa-demo-restack branch 6 times, most recently from 26c8e46 to 356a943, June 17, 2026 04:38

Zhendong404 added 5 commits June 18, 2026 00:50

feat(ptodsl): support mix-kernel and mix-backend hybrid compile

b483793

feat(insert-sync): support insert sync for subkernels

a5a9355

fix(emitc): sink PTOAS__TILE_DATA reads to use sites after TASSIGN

b7b2fd7

fix(insert sync): ignore function declaration in insert sync pass

a3cdf81

fix(emitc): rewrite malformed verbatim semicolons as a fix to llvm bugs

a191f73

Zhendong404 added 8 commits June 18, 2026 00:50

feat(ptodsl): add fa demos in PTODSL

f508507

fix(ptodsl): support ast rewrite for non-entry kernel

120ff14

fix(gss): let graph sync solver recognize pto.section.* region

0e6211e

fix(vpto): support vmulscvt in CANN900 emiter

b8cfacb

fix(ptodsl): fix a bug of ptodsl test framework

a4e4c6c

Fix testcases

859fb99

feat(driver): add an --cann-output-version option to force select the output LLVM IR format

249c018

fix(vpto): fix reinterpret_cast leak in the vpto backend

68069a0

Zhendong404 force-pushed the fa-demo-restack branch from 356a943 to 68069a0, June 17, 2026 16:50

zhangstevenunity reviewed Jun 22, 2026

mouliangyu reviewed Jun 22, 2026

-      std::string qualifier =
-          addrSpaceQualifier(type.getMemorySpace().getAddressSpace());
+      std::string qualifier = "__gm__";
+      if (auto ms = type.getMemorySpace()) {
+        if (auto ptoAttr = dyn_cast<pto::AddressSpaceAttr>(ms))
+          qualifier = addrSpaceQualifier(ptoAttr.getAddressSpace());
+      }

Conversation

Zhendong404 commented Jun 14, 2026

Summary

Main Changes

Scope

Validation

gemini-code-assist Bot left a comment

Code Review

gemini-code-assist Bot Jun 14, 2026

gemini-code-assist Bot Jun 14, 2026

gemini-code-assist Bot Jun 14, 2026

gemini-code-assist Bot Jun 14, 2026

gemini-code-assist Bot Jun 14, 2026

reedhecre commented Jun 14, 2026

Codex Review

Summary

Findings

Log Tail

Zhendong404 commented Jun 15, 2026

Zhendong404 commented Jun 15, 2026

reedhecre commented Jun 15, 2026

reedhecre commented Jun 15, 2026

reedhecre commented Jun 15, 2026

A3 board test failed

Log tail

reedhecre commented Jun 15, 2026

A5 board test failed

Failed test cases

reedhecre commented Jun 15, 2026

A5 board test failure details: PR #816

Zhendong404 commented Jun 17, 2026

reedhecre commented Jun 17, 2026

Zhendong404 commented Jun 17, 2026

reedhecre commented Jun 17, 2026

reedhecre commented Jun 17, 2026

A5 board test failed

Log tail

reedhecre commented Jun 17, 2026

A3 board test failed

Log tail

zhangstevenunity left a comment

Manual deep review (Codex bot failed on this PR)

Inline findings

Non-blocking observations (not posted inline)

Spot-checked and looks correct

zhangstevenunity Jun 22, 2026