feat(ptodsl): add flash-attention demos and mixed-backend subkernel pipeline support #816

Open: Zhendong404 wants to merge 13 commits into hw-native-sys:main from Zhendong404:fa-demo-restack

Conversation

@Zhendong404

Contributor

Summary

This PR brings in a PTODSL/ptoas update centered on flash-attention demos, mixed-kernel or mixed-backend compilation, and subkernel/backend helper lowering fixes.

Compared with hw-native-sys/PTOAS:main, this branch is ahead by 12 commits and includes both frontend PTODSL improvements and the supporting PTOAS lowering/runtime/test updates.

Main Changes

  1. Add PTODSL flash-attention demos and example code
  • Add ptodsl/examples/flash_attention/flash_attention_cv_split.py
  • Add ptodsl/examples/flash_attention/flash_attention_vf_fusion.py
  • Add supporting example files under ptodsl/examples/flash_attention/
  • Add FA-related PTODSL example entry files such as ptodsl/examples/fa_dn_ptodsl.py
  2. Support mixed-kernel and mixed-backend hybrid compilation in PTODSL
  • Extend PTODSL tracing / session / subkernel compilation flow to support mixed child modules
  • Add backend child compile-unit assembly and helper materialization support in PTOAS
  • Add new normalization/materialization passes for uncovered tile sections and tile handles
  • Update driver-side mixed-backend child module compilation and diagnostics
  3. Support subkernel-related sync and helper lowering
  • Add insert-sync support for subkernels
  • Inline backend helpers after shared mainline materialization so tile_buf ABI is preserved
  • Improve symbol/logical-name handling for peer/helper lookup across backend-partitioned modules
  4. Add A5 gm_tensor slot support for PTODSL pipe flow
  • Frontend/runtime/lowering changes for A5 pipe slot handling
  • Add corresponding lit coverage for frontend ordering and split/nosplit cases
  5. Fix EmitC / VPTO / graph-sync related issues
  • Sink PTOAS__TILE_DATA reads after TASSIGN
  • Rewrite malformed nested emitc.verbatim trailing semicolons in emitted C++ as a workaround
  • Let graph sync solver recognize pto.section.* regions
  • Fix vmulscvt emission in the CANN900 LLVM emitter
  • Support AST rewrite for non-entry PTODSL kernels
  6. Refresh PTODSL documentation
  • Update ptodsl/docs/user_guide/01-introduction.md
  • Update ptodsl/docs/user_guide/03-kernel-entry-and-subkernels.md
  • Refresh quick start / walkthrough / additional examples to match the new PTODSL surface
  7. Add and update test coverage
  • Add and update PTODSL Python tests
  • Add and update lit tests for mixed backend, helper ABI, peer references, uncovered tile section normalization, A5 pipe slot handling, and EmitC fixes
  • Update test/lit/lit.cfg.py

Scope

Diff summary:

  • 120 files changed
  • 12652 insertions
  • 968 deletions

Validation

  • Confirmed branch fa-demo-restack is already pushed to fork Zhendong404/PTOAS
  • Confirmed compare target is clean against hw-native-sys/PTOAS:main
  • Confirmed there is no existing open PR for Zhendong404:fa-demo-restack

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces backend-partitioned container support, a new pass to normalize uncovered tile sections, and helper inlining for PTODSL subkernel calls. It also updates the EmitC lowering pipeline to handle pointer-like tile buffer addresses and adds several new examples and tests. The reviewer feedback focuses on improving robustness and correctness: refactoring the uncovered segment collection to prevent over-aggressive splitting, adding defensive checks for memory space attributes in EmitC lowering to avoid potential crashes, ensuring deterministic symbol resolution during peer lookups, and using a more robust check for static dimensions.

Comment on lines +504 to +556
static void collectUncoveredTopLevelSegments(
    func::FuncOp funcOp, SmallVectorImpl<UncoveredTopLevelSegment> &segments) {
  if (!funcOp || funcOp.isDeclaration() || !funcOp.getBody().hasOneBlock())
    return;

  Block &entryBlock = funcOp.getBody().front();
  UncoveredTopLevelSegment current;

  auto flushCurrent = [&]() {
    if (!current.firstOp)
      return;
    segments.push_back(current);
    current = {};
  };

  for (Operation &op : entryBlock.getOperations()) {
    if (isa<func::ReturnOp>(op)) {
      flushCurrent();
      continue;
    }

    if (isExplicitSection(&op)) {
      flushCurrent();
      continue;
    }

    UncoveredTopLevelSegment opSummary = summarizeTopLevelOperation(&op);
    if (!opSummary.containsTileOp) {
      flushCurrent();
      continue;
    }

    if (!current.firstOp) {
      current = std::move(opSummary);
      continue;
    }

    std::optional<InferredSectionKind> currentKind = inferSegmentKind(current);
    std::optional<InferredSectionKind> opKind = inferSegmentKind(opSummary);
    bool mustSplit = current.containsNestedExplicitSection ||
                     opSummary.containsNestedExplicitSection || !currentKind ||
                     !opKind || *currentKind != *opKind;
    if (mustSplit) {
      flushCurrent();
      current = std::move(opSummary);
      continue;
    }

    mergeSegmentSummary(current, opSummary);
  }

  flushCurrent();
}

high

The current implementation of collectUncoveredTopLevelSegments flushes and splits the active segment as soon as it encounters any top-level operation that does not contain a tile op (e.g., a scalar calculation or a subkernel call). This splits adjacent tile operations of the same kind unnecessarily, creating multiple small sections and potentially duplicating neutral top-level operations (such as subkernel calls) into both the Cube and Vector child modules during splitting.

Instead, we should use a cohesive partitioning algorithm that accumulates neutral operations into the current segment and flushes or splits only when there is an actual conflict between Vector and Cube kinds, or when it hits an explicit section or a return.

static void collectUncoveredTopLevelSegments(
    func::FuncOp funcOp, SmallVectorImpl<UncoveredTopLevelSegment> &segments) {
  if (!funcOp || funcOp.isDeclaration() || !funcOp.getBody().hasOneBlock())
    return;

  Block &entryBlock = funcOp.getBody().front();
  UncoveredTopLevelSegment current;

  auto flushCurrent = [&]() {
    if (!current.firstOp)
      return;
    if (current.containsTileOp)
      segments.push_back(current);
    current = {};
  };

  for (Operation &op : entryBlock.getOperations()) {
    if (isa<func::ReturnOp>(op) || isExplicitSection(&op)) {
      flushCurrent();
      continue;
    }

    UncoveredTopLevelSegment opSummary = summarizeTopLevelOperation(&op);
    if (opSummary.containsNestedExplicitSection) {
      flushCurrent();
      segments.push_back(opSummary);
      continue;
    }

    if (!current.firstOp) {
      current = std::move(opSummary);
      continue;
    }

    std::optional<InferredSectionKind> currentKind = inferSegmentKind(current);
    std::optional<InferredSectionKind> opKind = inferSegmentKind(opSummary);

    if (currentKind && opKind && *currentKind != *opKind) {
      flushCurrent();
      current = std::move(opSummary);
    } else {
      mergeSegmentSummary(current, opSummary);
    }
  }

  flushCurrent();
}
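
To make the effect of the suggested partitioning easier to see outside of MLIR, here is a minimal standalone sketch. Everything in it (Kind, Op, Segment, partition, partitionExample) is a hypothetical stand-in rather than the PTOAS API, it assumes C++17, and it deliberately leaves out the nested-explicit-section case that the suggestion above handles separately.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the MLIR types in the review; not the PTOAS API.
enum class Kind { Cube, Vector };

struct Op {
  std::optional<Kind> kind; // nullopt models a neutral op (scalar math, subkernel call)
  bool isBarrier = false;   // models an explicit section or a return
};

struct Segment {
  std::size_t first = 0, last = 0; // inclusive range of op indices
  std::optional<Kind> kind;        // nullopt until a tile op is absorbed
};

// Groups ops into segments, absorbing neutral ops and splitting only on a
// Cube/Vector conflict or at a barrier. Segments without a tile op are dropped.
std::vector<Segment> partition(const std::vector<Op> &ops) {
  std::vector<Segment> segments;
  std::optional<Segment> current;

  auto flush = [&]() {
    if (current && current->kind)
      segments.push_back(*current);
    current.reset();
  };

  for (std::size_t i = 0; i < ops.size(); ++i) {
    const Op &op = ops[i];
    if (op.isBarrier) {
      flush();
      continue;
    }
    // Split only when both sides have a known kind and the kinds disagree.
    if (current && current->kind && op.kind && *current->kind != *op.kind)
      flush();
    if (!current)
      current = Segment{i, i, op.kind};
    current->last = i;
    if (!current->kind)
      current->kind = op.kind;
  }
  flush();
  return segments;
}

// Usage sketch: Vector, neutral, Vector, Cube yields two segments.
inline std::vector<Segment> partitionExample() {
  std::vector<Op> ops(4);
  ops[0].kind = Kind::Vector;
  ops[2].kind = Kind::Vector;
  ops[3].kind = Kind::Cube;
  std::vector<Segment> segments = partition(ops);
  assert(segments.size() == 2);
  return segments;
}
```

In this model the neutral op at index 1 is absorbed into the surrounding Vector segment instead of forcing a split, which is the behavior the review comment asks for. The pre-review code would flush at that op and produce two separate Vector segments.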

Comment thread lib/PTO/Transforms/PTOToEmitC.cpp Outdated
Comment on lines +754 to +755
std::string qualifier =
    addrSpaceQualifier(type.getMemorySpace().getAddressSpace());

high

Calling type.getMemorySpace().getAddressSpace() directly without checking if the memory space attribute is present and is of type AddressSpaceAttr can lead to a crash or undefined behavior. It is safer to use a defensive check similar to other parts of this file.

Suggested change
- std::string qualifier =
-     addrSpaceQualifier(type.getMemorySpace().getAddressSpace());
+ std::string qualifier = "__gm__";
+ if (auto ms = type.getMemorySpace()) {
+   if (auto ptoAttr = dyn_cast<pto::AddressSpaceAttr>(ms))
+     qualifier = addrSpaceQualifier(ptoAttr.getAddressSpace());
+ }

Comment thread lib/PTO/Transforms/PTOToEmitC.cpp Outdated
Comment on lines +1055 to +1057
  if (auto ptrTy = dyn_cast<pto::PtrType>(originalCalleeArgTy)) {
    elemTy = ptrTy.getElementType();
    as = ptrTy.getMemorySpace().getAddressSpace();

high

Defensively check ptrTy.getMemorySpace() using dyn_cast_or_null<pto::AddressSpaceAttr> before calling getAddressSpace() to prevent potential crashes if the memory space attribute is null or of an unexpected type.

  if (auto ptrTy = dyn_cast<pto::PtrType>(originalCalleeArgTy)) {
    elemTy = ptrTy.getElementType();
    if (auto asAttr = dyn_cast_or_null<pto::AddressSpaceAttr>(ptrTy.getMemorySpace()))
      as = asAttr.getAddressSpace();
  }
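
Both comments on PTOToEmitC.cpp recommend the same guard: treat the memory-space attribute as possibly null and possibly of another attribute type before reading the address space. A minimal standalone sketch of that pattern follows. The class hierarchy, the function names, and the "__ubuf__" string are illustrative placeholders (only "__gm__" appears in the review), and dynamic_cast stands in for LLVM's dyn_cast_or_null.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for MLIR attributes; the real code uses
// pto::AddressSpaceAttr together with dyn_cast / dyn_cast_or_null.
enum class AddressSpace { GM, UB };

struct Attribute {
  virtual ~Attribute() = default;
};

struct AddressSpaceAttr : Attribute {
  explicit AddressSpaceAttr(AddressSpace space) : space(space) {}
  AddressSpace space;
};

// Models a memory-space attribute of some unrelated type.
struct OtherAttr : Attribute {};

// "__ubuf__" is a placeholder string for this sketch.
std::string addrSpaceQualifier(AddressSpace space) {
  return space == AddressSpace::GM ? "__gm__" : "__ubuf__";
}

// Returns the qualifier for a possibly-null, possibly-foreign memory-space
// attribute, falling back to "__gm__" as in the suggested change.
std::string qualifierFor(const Attribute *memorySpace) {
  // dynamic_cast on a null pointer yields null, mirroring dyn_cast_or_null.
  if (const auto *attr = dynamic_cast<const AddressSpaceAttr *>(memorySpace))
    return addrSpaceQualifier(attr->space);
  return "__gm__";
}

// Usage sketch covering the three cases the review is concerned with.
inline void qualifierExample() {
  AddressSpaceAttr ub(AddressSpace::UB);
  OtherAttr other;
  assert(qualifierFor(&ub) == "__ubuf__");    // expected attribute type
  assert(qualifierFor(&other) == "__gm__");   // unexpected attribute type
  assert(qualifierFor(nullptr) == "__gm__");  // no memory space at all
}
```

Whether a silent fallback to "__gm__" is the right policy, as opposed to emitting a diagnostic, is a design decision for the PR author. The sketch only shows that neither failure mode dereferences an invalid attribute.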

Comment thread lib/PTO/IR/PTO.cpp
Comment on lines +288 to +304
  SmallVector<func::FuncOp> fallbackMatches;
  outerModule.walk([&](func::FuncOp funcOp) {
    auto visibility = funcOp->getAttrOfType<StringAttr>("sym_visibility");
    if (visibility && visibility.getValue() == "private")
      return WalkResult::advance();

    StringRef symbolName = funcOp.getSymName();
    if (symbolName == target) {
      fallbackMatches.clear();
      fallbackMatches.push_back(funcOp);
      return WalkResult::interrupt();
    }
    if (funcOp->hasAttr(kPTODSLLogicalNameAttrName) &&
        getPTODSLLogicalNameOrSymbolName(funcOp) == target)
      fallbackMatches.push_back(funcOp);
    return WalkResult::advance();
  });

medium

Interrupting the walk immediately upon finding an exact symbol name match can lead to non-deterministic or incorrect symbol resolution if multiple sibling modules contain public functions with the same name. It is safer to collect all matches (both exact and logical) without interrupting, and then verify that exactly one unique match exists.

  SmallVector<func::FuncOp> fallbackMatches;
  outerModule.walk([&](func::FuncOp funcOp) {
    auto visibility = funcOp->getAttrOfType<StringAttr>("sym_visibility");
    if (visibility && visibility.getValue() == "private")
      return WalkResult::advance();

    StringRef symbolName = funcOp.getSymName();
    if (symbolName == target ||
        (funcOp->hasAttr(kPTODSLLogicalNameAttrName) &&
         getPTODSLLogicalNameOrSymbolName(funcOp) == target)) {
      fallbackMatches.push_back(funcOp);
    }
    return WalkResult::advance();
  });
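
The suggested walk only collects candidates. The "verify that exactly one unique match exists" step is left to the caller. A minimal standalone sketch of that step is shown here, with Candidate, resolveUnique, and resolveExample as hypothetical names, and plain strings standing in for func::FuncOp symbols and the logical-name attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a public func::FuncOp candidate.
struct Candidate {
  std::string symbolName;
  std::string logicalName; // empty when no logical-name attribute is set
};

// Returns the index of the single candidate matching `target` by symbol or
// logical name, or nullopt when there is no match or the match is ambiguous.
std::optional<std::size_t> resolveUnique(const std::vector<Candidate> &funcs,
                                         const std::string &target) {
  std::optional<std::size_t> match;
  for (std::size_t i = 0; i < funcs.size(); ++i) {
    const Candidate &f = funcs[i];
    bool hit = f.symbolName == target ||
               (!f.logicalName.empty() && f.logicalName == target);
    if (!hit)
      continue;
    if (match)
      return std::nullopt; // second match: ambiguous, refuse to pick one
    match = i;
  }
  return match;
}

// Usage sketch: "b" resolves through its logical name, "a" is ambiguous.
inline void resolveExample() {
  std::vector<Candidate> funcs;
  funcs.push_back({"a", ""});
  funcs.push_back({"b_impl", "b"});
  funcs.push_back({"a", ""});
  assert(resolveUnique(funcs, "b").value() == 1);
  assert(!resolveUnique(funcs, "a").has_value());
}
```

Note that this follows the reviewer's suggestion and therefore drops the original code's preference for an exact symbol-name match over a logical-name match. If that priority is intentional, the uniqueness check would need to be applied per tier instead. A real implementation would presumably also report the ambiguity as a diagnostic rather than return an empty result.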

Comment on lines +587 to +589
  bool isStatic = llvm::all_of(shape, [](int64_t dim) {
    return dim != ShapedType::kDynamic;
  });

medium

Using dim != ShapedType::kDynamic to check for static dimensions is less robust than checking dim >= 0, as any negative dimension size in MLIR represents a dynamic or invalid dimension. Checking dim >= 0 prevents potential issues with other negative dimension representations.

Suggested change
-   bool isStatic = llvm::all_of(shape, [](int64_t dim) {
-     return dim != ShapedType::kDynamic;
-   });
+   bool isStatic = llvm::all_of(shape, [](int64_t dim) {
+     return dim >= 0;
+   });
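
The two predicates differ only for negative values other than the dynamic sentinel. The standalone sketch below illustrates this under the assumption that the sentinel is the minimum int64_t value, as in recent MLIR releases (older releases used -1, so check the MLIR version in use). kDynamic here is a local constant, not the MLIR symbol, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Assumption: mirrors mlir::ShapedType::kDynamic in recent MLIR releases.
constexpr std::int64_t kDynamic = std::numeric_limits<std::int64_t>::min();

// Original check: a dimension is static unless it equals the sentinel.
bool isStaticBySentinel(const std::vector<std::int64_t> &shape) {
  return std::all_of(shape.begin(), shape.end(),
                     [](std::int64_t dim) { return dim != kDynamic; });
}

// Suggested check: a dimension is static only if it is non-negative.
bool isStaticBySign(const std::vector<std::int64_t> &shape) {
  return std::all_of(shape.begin(), shape.end(),
                     [](std::int64_t dim) { return dim >= 0; });
}

// Usage sketch: the checks agree except on a stray negative dimension.
inline void staticShapeExample() {
  assert(isStaticBySentinel({4, 8}) && isStaticBySign({4, 8}));
  assert(!isStaticBySentinel({4, kDynamic}) && !isStaticBySign({4, kDynamic}));
  assert(isStaticBySentinel({4, -1}) && !isStaticBySign({4, -1}));
}
```

The last case is the one the reviewer has in mind: a shape carrying a legacy or malformed -1 passes the sentinel comparison but is rejected by the sign check.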

@reedhecre

reedhecre commented Jun 14, 2026

Codex Review

This comment is updated automatically by the review bot.

Summary

Review failed at stage codex-review: exit=1

Findings

No structured findings were generated because the review process failed early.

Log Tail

 test/lit/vpto/issue220_vrelu_i32_vpto_llvm.pto     |   10 +-
 .../vpto/non_entry_helper_not_kernel_annotated.pto |   32 +
 ...alize_uncovered_raw_vpto_vector_module_kind.pto |   33 +
 .../normalize_uncovered_tile_sections_mixed.pto    |   51 +
 ...ed_tile_sections_reject_residual_nested_mix.pto |   39 +
 ...vered_tile_sections_skip_kernel_kind_module.pto |   48 +
 .../normalize_uncovered_tile_sections_vector.pto   |   45 +
 test/lit/vpto/ptodsl_subkernel_backend_inline.pto  |   55 +
 test/lit/vpto/section_sugar_helper_funcs.pto       |   34 +
 ...elang_cross_file_inline_proc_backend_inline.pto |   36 +-
 .../vpto/tilelang_inline_proc_backend_inline.pto   |   35 +-
 .../lit/vpto/tilelang_soft_vmod_backend_inline.pto |  126 --
 test/lit/vpto/vmulscvt_vpto_llvm.pto               |    6 +-
 ...to_infer_module_kernel_kind_without_section.pto |   27 +
 .../lit/vpto/vpto_mainline_inline_proc_cleanup.pto |   34 +-
 .../vpto/vreg_low_precision_memory_vpto_llvm.pto   |   47 +-
 test/samples/TPushTPop/ptodsl/local_c2v/kernel.py  |   24 +-
 tools/ptoas/VPTOHostStubEmission.cpp               |    2 +-
 tools/ptoas/driver.cpp                             |  445 +++++-
 tools/ptoas/ptoas.cpp                              |  129 +-
 tools/ptoas/ptoas.h                                |    4 +
 115 files changed, 13144 insertions(+), 986 deletions(-)
===== END STAGE clone rc=0 @ 2026-06-18 00:56:31 =====

===== STAGE codex-review @ 2026-06-18 00:56:31 =====
set -euo pipefail
cd '/tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/repo'
'codex' exec -C '/tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/repo' -s read-only -c 'model_provider="codereview"' -c 'model="gpt-5.4"' -c 'model_reasoning_effort="xhigh"' --output-schema '/tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/review_schema.json' -o '/tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/codex_last_message.json' --color never - < '/tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/review_prompt.txt'
OpenAI Codex v0.115.0 (research preview)
--------
workdir: /tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/repo
model: gpt-5.4
provider: codereview
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none
session id: 019ed683-563b-74a3-8537-e70ef1444c19
--------
user
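You are now reviewing a GitHub PR.

Repository: hw-native-sys/PTOAS
PR: #816 feat(ptodsl): add flash-attention demos and mixed-backend subkernel pipeline support
Author: Zhendong404
base branch: origin/main
head branch: HEAD (currently checked out at the PR head)

Requirements:
1. Review only this PR's changes relative to origin/main; look at surrounding context files when necessary.
2. Focus on finding real correctness / regression / contract mismatch / CI / runtime / compatibility problems.
3. Do not raise purely stylistic suggestions or low-value speculation.
4. Report strictly by priority:
   - P1: highly likely to cause wrong results, compile/runtime failures, severe regressions, or release blockers
   - P2: important defects, behavior regressions, missing validation/tests, or major compatibility problems
   - P3: minor but clearly fixable problems
5. If there are no problems, write the summary exactly as "No problems found in PR #816" and return findings=[].
6. If there are problems, summarize them concisely in the summary, and give each entry in findings:
   - severity
   - title
   - body (explain why it is a problem, as specifically as possible)
   - file (a relative path where possible)
   - line (an integer if it can be determined, otherwise null)

Suggested starting points:
- git status --short
- git diff --stat origin/main...HEAD
- git diff --unified=80 origin/main...HEAD

The final output must strictly match the JSON schema.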

mcp startup: no servers
Reconnecting... 1/5 (unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"API Key 所属分组已停用"}, url: https://codex.0u0o.com/responses, cf-ray: a0d39692098b4705-SJC, request id: 0feb4dd8-bafa-46a3-9191-34f75882836f)
Reconnecting... 2/5 (unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"API Key 所属分组已停用"}, url: https://codex.0u0o.com/responses, cf-ray: a0d39695caa7174b-SJC, request id: bcab9db2-28bd-4326-9d9b-eefe53b9df0d)
Reconnecting... 3/5 (unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"API Key 所属分组已停用"}, url: https://codex.0u0o.com/responses, cf-ray: a0d3969a5e592b74-LAX, request id: 2a56ab74-5e53-4d8b-b1e6-c0975387e0ce)
Reconnecting... 4/5 (unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"API Key 所属分组已停用"}, url: https://codex.0u0o.com/responses, cf-ray: a0d396a23b977db7-LAX, request id: e4ad58cf-74b3-4f23-907f-aabad21b3281)
Reconnecting... 5/5 (unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"API Key 所属分组已停用"}, url: https://codex.0u0o.com/responses, cf-ray: a0d396ae4d4dd7a4-LAX, request id: a20fa458-6b0e-4882-92b9-32c10fc966ae)
ERROR: unexpected status 403 Forbidden: {"code":"GROUP_DISABLED","message":"API Key 所属分组已停用"}, url: https://codex.0u0o.com/responses, cf-ray: a0d396c699b42b10-SJC, request id: 2d08f801-339c-4d6a-bdab-8b193680982f
Warning: no last agent message; wrote empty content to /tmp/ptoas-pr-review-monitor/runs/20260618_005526_pr816/codex_last_message.json
===== END STAGE codex-review rc=1 @ 2026-06-18 00:56:41 =====

@Zhendong404 force-pushed the fa-demo-restack branch 4 times, most recently from 4ab1ce5 to 912e855, on June 15, 2026 12:54
@Zhendong404

Contributor Author

/run a3

@Zhendong404

Contributor Author

/run a5

@reedhecre

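Received /run a5; the A5 board tester will handle this request.

The page refreshes automatically, so you can check the current stage, queue status, and most recent results directly.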

@reedhecre

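Received /run a3; the A3 board tester will handle this request.

The page refreshes automatically, so you can check the current stage, queue status, and most recent results directly.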

@reedhecre

A3 board test failed

Log tail

al.py) SKIP requires --pto-arch=a5
Sync(test_intercore_sync_a5_ptoisa_vec.py) SKIP requires --pto-arch=a5
Sync(test_intercore_sync_a5.py) SKIP requires --pto-arch=a5
Sync(test_mem_inject_sync_basic.py) OK   generated: test_mem_inject_sync_basic-pto.cpp
Sync(test_set_wait_unified_api.py) OK   generated: test_set_wait_unified_api-pto.cpp
Sync(test_tmov_col_major_16x1_align_a5.pto) SKIP requires --pto-arch=a5
Sync(test_tmov_col_major_16x1_align_a5.py) SKIP requires --pto-arch=a5
Sync(test_tmov_row_major_1x16_control_a5.pto) SKIP requires --pto-arch=a5
Sync(test_tmov_row_major_1x16_control_a5.py) SKIP requires --pto-arch=a5
Sync(tmatmulk_autosync.py) OK   generated: tmatmulk_autosync-pto.cpp
TileSetGetValue(tile_getval_mat_invalid.py) XFAIL ptobc encode failed as expected
TileSetGetValue(tileSetGetValue.py) OK   generated: tileSetGetValue-pto.cpp
TInsert(tinsert_fp.py) OK   generated: tinsert_fp-pto.cpp
TInsert(tinsert.py) OK   generated: tinsert-pto.cpp
Tpows(tpows.py) OK   generated: tpows-pto.cpp
Tpow(tpow.py) OK   generated: tpow-pto.cpp
TPrefetchAsync(tprefetch_async_binding.py) OK   generated: tprefetch_async_binding-pto.cpp
TPrefetch(tprefetch.py) OK   generated: tprefetch-pto.cpp
Trans(trans.py) OK   generated: trans-pto.cpp
Trap(trap.py) OK   generated: trap-pto.cpp
TTri(ttri.py) OK   generated: ttri-pto.cpp
VectorAddition(vadd_pto_ir.py) OK   generated: vadd_pto_ir-pto.cpp
VectorAddition(vadd_validshape_hyper.py) OK   generated: vadd_validshape_hyper-pto.cpp
VectorAddition(vectorAddition.py) OK   generated: vectorAddition-pto.cpp
Xors(xors.py) OK   generated: xors-pto.cpp
Xor(xor.py)  OK   generated: xor-pto.cpp
-----------------------------
OK=233  FAIL=2  SKIP=39
=============================
===== END STAGE sample-build-and-test rc=1 @ 2026-06-15 21:13:53 =====

@reedhecre

A5 board test failed

Failed cases

  • test_tmov_row_major_1x16_control_a5 (run, exit=2)
  • test_tmov_col_major_16x1_align_a5 (run, exit=2)
  • test_dynamic_valid_shape (run, exit=2)
  • test_barrier_sync (run, exit=2)
  • test_auto_sync_tail_hint (run, exit=2)
  • rmsnorm_incore_0 (run, exit=2)
  • rar_optimization_test (run, exit=2)
  • nested_loop_confliect (run, exit=2)
  • matmul (run, exit=2)
  • decode_projection_incore_0 (run, exit=2)
  • compensation_test (run, exit=2)
  • add_double_dynamic (run, exit=2)
  • rems (run, exit=2)
  • rem (run, exit=2)
  • rope_kv_cache (run, exit=2)
  • rmsnorm (run, exit=2)
  • qwen3_decode_incore_7 (run, exit=2)
  • qwen3_decode_incore_6 (run, exit=2)
  • qwen3_decode_incore_5 (run, exit=2)
  • qwen3_decode_incore_4 (run, exit=2)
  • qwen3_decode_incore_2 (run, exit=2)
  • qwen3_decode_incore_1 (run, exit=2)
  • qwen3_decode_incore_12 (run, exit=2)
  • qwen3_decode_incore_11 (run, exit=2)
  • qwen3_decode_incore_10 (run, exit=2)
  • post_rmsnorm (run, exit=2)
  • vector_example_dag_kernel_mul (run, exit=2)
  • vector_example_dag_kernel_add_scalar (run, exit=2)
  • vector_example_dag_kernel_add (run, exit=2)
  • paged_attention_example_kernel_softmax_prepare (run, exit=2)
  • paged_attention_example_kernel_qk_matmul (run, exit=2)
  • paged_attention_example_kernel_pv_matmul (run, exit=2)
  • paged_attention_example_kernel_online_update (run, exit=2)
  • paged_attention_example_kernel_init_inplace (run, exit=2)
  • orchestration_example_kernel_mul (run, exit=2)
  • orchestration_example_kernel_add_scalar (run, exit=2)
  • orchestration_example_kernel_add (run, exit=2)
  • prelu (run, exit=2)
  • plan_memory_reuse_sequential (run, exit=2)
  • plan_memory_peak_exact_capacity (run, exit=2)
  • plan_memory_peak_8_overlapping (run, exit=2)
  • plan_memory_no_reuse_overlap (run, exit=2)
  • plan_memory_nested_loops (run, exit=2)
  • plan_memory_loop_no_reuse_outer_live (run, exit=2)
  • plan_memory_loop_in_if (run, exit=2)
  • plan_memory_if_yield (run, exit=2)
  • plan_memory_if_in_loop (run, exit=2)
  • plan_memory_fragmentation_two_holes (run, exit=2)
  • plan_memory_fragmentation_hole_fit (run, exit=2)
  • plan_memory_for_iter_args_yield (run, exit=2)
  • plan_memory_bind_tile_alias_liveness (run, exit=2)
  • partition_view_verify_valid (run, exit=2)
  • partition_view_verify_rank_mismatch_valid (run, exit=2)
  • partition5d_dynamic_a5 (run, exit=2)
  • partition5d_a5 (run, exit=2)
  • tensor_view_layout_dn (run, exit=2)
  • sparse_attn_test_incore_7 (run, exit=2)
  • decode_swa_test_incore_40 (run, exit=2)
  • decode_hca_test_incore_54 (run, exit=2)
  • decode_csa_test_incore_81 (run, exit=2)
  • attention_swa_test_incore_40 (run, exit=2)
  • attention_hca_test_incore_54 (run, exit=2)
  • attention_csa_test_refresh_incore_81 (run, exit=2)
  • tbroadcast_root_binding (run, exit=139)
  • cmps (run, exit=2)
  • cmp (run, exit=2)

@reedhecre

A5 board test failure details: PR #816

test_tmov_row_major_1x16_control_a5

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtest_tmov_row_major_1x16_control_a5_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/test_tmov_row_major_1x16_control_a5.dir/build.make:98: test_tmov_row_major_1x16_control_a5] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/test_tmov_row_major_1x16_control_a5.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:07:37] ERROR: testcase failed (exit 2): test_tmov_row_major_1x16_control_a5
test_tmov_col_major_16x1_align_a5

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtest_tmov_col_major_16x1_align_a5_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/test_tmov_col_major_16x1_align_a5.dir/build.make:98: test_tmov_col_major_16x1_align_a5] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/test_tmov_col_major_16x1_align_a5.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:07:40] ERROR: testcase failed (exit 2): test_tmov_col_major_16x1_align_a5
test_dynamic_valid_shape

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtest_dynamic_valid_shape_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/test_dynamic_valid_shape.dir/build.make:98: test_dynamic_valid_shape] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/test_dynamic_valid_shape.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:11:25] ERROR: testcase failed (exit 2): test_dynamic_valid_shape
test_barrier_sync

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtest_barrier_sync_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/test_barrier_sync.dir/build.make:98: test_barrier_sync] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/test_barrier_sync.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:11:27] ERROR: testcase failed (exit 2): test_barrier_sync
test_auto_sync_tail_hint

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtest_auto_sync_tail_hint_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/test_auto_sync_tail_hint.dir/build.make:98: test_auto_sync_tail_hint] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/test_auto_sync_tail_hint.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:11:30] ERROR: testcase failed (exit 2): test_auto_sync_tail_hint
rmsnorm_incore_0

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librmsnorm_incore_0_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rmsnorm_incore_0.dir/build.make:98: rmsnorm_incore_0] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rmsnorm_incore_0.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:25] ERROR: testcase failed (exit 2): rmsnorm_incore_0
rar_optimization_test

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librar_optimization_test_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rar_optimization_test.dir/build.make:98: rar_optimization_test] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rar_optimization_test.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:28] ERROR: testcase failed (exit 2): rar_optimization_test
nested_loop_confliect

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libnested_loop_confliect_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/nested_loop_confliect.dir/build.make:98: nested_loop_confliect] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/nested_loop_confliect.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:31] ERROR: testcase failed (exit 2): nested_loop_confliect
matmul

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libmatmul_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/matmul.dir/build.make:98: matmul] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/matmul.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:33] ERROR: testcase failed (exit 2): matmul
decode_projection_incore_0

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libdecode_projection_incore_0_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/decode_projection_incore_0.dir/build.make:98: decode_projection_incore_0] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/decode_projection_incore_0.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:36] ERROR: testcase failed (exit 2): decode_projection_incore_0
compensation_test

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libcompensation_test_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/compensation_test.dir/build.make:98: compensation_test] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/compensation_test.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:38] ERROR: testcase failed (exit 2): compensation_test
add_double_dynamic

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libadd_double_dynamic_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/add_double_dynamic.dir/build.make:98: add_double_dynamic] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/add_double_dynamic.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:12:41] ERROR: testcase failed (exit 2): add_double_dynamic
rems

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librems_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rems.dir/build.make:98: rems] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rems.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:21:57] ERROR: testcase failed (exit 2): rems
rem

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librem_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rem.dir/build.make:98: rem] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rem.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:21:59] ERROR: testcase failed (exit 2): rem
rope_kv_cache

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librope_kv_cache_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rope_kv_cache.dir/build.make:98: rope_kv_cache] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rope_kv_cache.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:29] ERROR: testcase failed (exit 2): rope_kv_cache
rmsnorm

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by librmsnorm_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/rmsnorm.dir/build.make:98: rmsnorm] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/rmsnorm.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:32] ERROR: testcase failed (exit 2): rmsnorm
qwen3_decode_incore_7

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_7_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_7.dir/build.make:98: qwen3_decode_incore_7] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_7.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:35] ERROR: testcase failed (exit 2): qwen3_decode_incore_7
qwen3_decode_incore_6

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_6_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_6.dir/build.make:98: qwen3_decode_incore_6] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_6.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:38] ERROR: testcase failed (exit 2): qwen3_decode_incore_6
qwen3_decode_incore_5

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_5_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_5.dir/build.make:98: qwen3_decode_incore_5] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_5.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:40] ERROR: testcase failed (exit 2): qwen3_decode_incore_5
qwen3_decode_incore_4

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_4_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_4.dir/build.make:98: qwen3_decode_incore_4] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_4.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:43] ERROR: testcase failed (exit 2): qwen3_decode_incore_4
qwen3_decode_incore_2

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_2_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_2.dir/build.make:98: qwen3_decode_incore_2] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_2.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:46] ERROR: testcase failed (exit 2): qwen3_decode_incore_2
qwen3_decode_incore_1

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_1_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_1.dir/build.make:98: qwen3_decode_incore_1] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_1.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:48] ERROR: testcase failed (exit 2): qwen3_decode_incore_1
qwen3_decode_incore_12

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_12_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_12.dir/build.make:98: qwen3_decode_incore_12] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_12.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:51] ERROR: testcase failed (exit 2): qwen3_decode_incore_12
qwen3_decode_incore_11

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_11_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_11.dir/build.make:98: qwen3_decode_incore_11] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_11.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:53] ERROR: testcase failed (exit 2): qwen3_decode_incore_11
qwen3_decode_incore_10

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libqwen3_decode_incore_10_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/qwen3_decode_incore_10.dir/build.make:98: qwen3_decode_incore_10] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/qwen3_decode_incore_10.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:56] ERROR: testcase failed (exit 2): qwen3_decode_incore_10
post_rmsnorm

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpost_rmsnorm_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/post_rmsnorm.dir/build.make:98: post_rmsnorm] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/post_rmsnorm.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:22:59] ERROR: testcase failed (exit 2): post_rmsnorm
vector_example_dag_kernel_mul

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libvector_example_dag_kernel_mul_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_mul.dir/build.make:98: vector_example_dag_kernel_mul] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/vector_example_dag_kernel_mul.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:23:57] ERROR: testcase failed (exit 2): vector_example_dag_kernel_mul
vector_example_dag_kernel_add_scalar

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libvector_example_dag_kernel_add_scalar_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_add_scalar.dir/build.make:98: vector_example_dag_kernel_add_scalar] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/vector_example_dag_kernel_add_scalar.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:23:59] ERROR: testcase failed (exit 2): vector_example_dag_kernel_add_scalar
vector_example_dag_kernel_add

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libvector_example_dag_kernel_add_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/vector_example_dag_kernel_add.dir/build.make:98: vector_example_dag_kernel_add] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/vector_example_dag_kernel_add.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:02] ERROR: testcase failed (exit 2): vector_example_dag_kernel_add
paged_attention_example_kernel_softmax_prepare

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpaged_attention_example_kernel_softmax_prepare_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_softmax_prepare.dir/build.make:98: paged_attention_example_kernel_softmax_prepare] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/paged_attention_example_kernel_softmax_prepare.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:04] ERROR: testcase failed (exit 2): paged_attention_example_kernel_softmax_prepare
paged_attention_example_kernel_qk_matmul

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpaged_attention_example_kernel_qk_matmul_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_qk_matmul.dir/build.make:98: paged_attention_example_kernel_qk_matmul] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/paged_attention_example_kernel_qk_matmul.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:07] ERROR: testcase failed (exit 2): paged_attention_example_kernel_qk_matmul
paged_attention_example_kernel_pv_matmul

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpaged_attention_example_kernel_pv_matmul_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_pv_matmul.dir/build.make:98: paged_attention_example_kernel_pv_matmul] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/paged_attention_example_kernel_pv_matmul.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:09] ERROR: testcase failed (exit 2): paged_attention_example_kernel_pv_matmul
paged_attention_example_kernel_online_update

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpaged_attention_example_kernel_online_update_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_online_update.dir/build.make:98: paged_attention_example_kernel_online_update] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/paged_attention_example_kernel_online_update.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:12] ERROR: testcase failed (exit 2): paged_attention_example_kernel_online_update
paged_attention_example_kernel_init_inplace

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpaged_attention_example_kernel_init_inplace_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/paged_attention_example_kernel_init_inplace.dir/build.make:98: paged_attention_example_kernel_init_inplace] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/paged_attention_example_kernel_init_inplace.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:15] ERROR: testcase failed (exit 2): paged_attention_example_kernel_init_inplace
orchestration_example_kernel_mul

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by liborchestration_example_kernel_mul_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_mul.dir/build.make:98: orchestration_example_kernel_mul] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/orchestration_example_kernel_mul.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:17] ERROR: testcase failed (exit 2): orchestration_example_kernel_mul
orchestration_example_kernel_add_scalar

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by liborchestration_example_kernel_add_scalar_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_add_scalar.dir/build.make:98: orchestration_example_kernel_add_scalar] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/orchestration_example_kernel_add_scalar.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:20] ERROR: testcase failed (exit 2): orchestration_example_kernel_add_scalar
orchestration_example_kernel_add

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by liborchestration_example_kernel_add_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/orchestration_example_kernel_add.dir/build.make:98: orchestration_example_kernel_add] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/orchestration_example_kernel_add.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:22] ERROR: testcase failed (exit 2): orchestration_example_kernel_add
prelu

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libprelu_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/prelu.dir/build.make:98: prelu] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/prelu.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:44] ERROR: testcase failed (exit 2): prelu
plan_memory_reuse_sequential

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_reuse_sequential_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_reuse_sequential.dir/build.make:98: plan_memory_reuse_sequential] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_reuse_sequential.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:47] ERROR: testcase failed (exit 2): plan_memory_reuse_sequential
plan_memory_peak_exact_capacity

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_peak_exact_capacity_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_peak_exact_capacity.dir/build.make:98: plan_memory_peak_exact_capacity] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_peak_exact_capacity.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:50] ERROR: testcase failed (exit 2): plan_memory_peak_exact_capacity
plan_memory_peak_8_overlapping

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_peak_8_overlapping_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_peak_8_overlapping.dir/build.make:98: plan_memory_peak_8_overlapping] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_peak_8_overlapping.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:52] ERROR: testcase failed (exit 2): plan_memory_peak_8_overlapping
plan_memory_no_reuse_overlap

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_no_reuse_overlap_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_no_reuse_overlap.dir/build.make:98: plan_memory_no_reuse_overlap] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_no_reuse_overlap.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:55] ERROR: testcase failed (exit 2): plan_memory_no_reuse_overlap
plan_memory_nested_loops

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_nested_loops_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_nested_loops.dir/build.make:98: plan_memory_nested_loops] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_nested_loops.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:24:57] ERROR: testcase failed (exit 2): plan_memory_nested_loops
plan_memory_loop_no_reuse_outer_live

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_loop_no_reuse_outer_live_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_loop_no_reuse_outer_live.dir/build.make:98: plan_memory_loop_no_reuse_outer_live] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_loop_no_reuse_outer_live.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:00] ERROR: testcase failed (exit 2): plan_memory_loop_no_reuse_outer_live
plan_memory_loop_in_if

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_loop_in_if_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_loop_in_if.dir/build.make:98: plan_memory_loop_in_if] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_loop_in_if.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:03] ERROR: testcase failed (exit 2): plan_memory_loop_in_if
plan_memory_if_yield

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_if_yield_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_if_yield.dir/build.make:98: plan_memory_if_yield] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_if_yield.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:05] ERROR: testcase failed (exit 2): plan_memory_if_yield
plan_memory_if_in_loop

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_if_in_loop_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_if_in_loop.dir/build.make:98: plan_memory_if_in_loop] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_if_in_loop.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:08] ERROR: testcase failed (exit 2): plan_memory_if_in_loop
plan_memory_fragmentation_two_holes

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_fragmentation_two_holes_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_fragmentation_two_holes.dir/build.make:98: plan_memory_fragmentation_two_holes] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_fragmentation_two_holes.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:10] ERROR: testcase failed (exit 2): plan_memory_fragmentation_two_holes
plan_memory_fragmentation_hole_fit

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_fragmentation_hole_fit_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_fragmentation_hole_fit.dir/build.make:98: plan_memory_fragmentation_hole_fit] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_fragmentation_hole_fit.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:13] ERROR: testcase failed (exit 2): plan_memory_fragmentation_hole_fit
plan_memory_for_iter_args_yield

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_for_iter_args_yield_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_for_iter_args_yield.dir/build.make:98: plan_memory_for_iter_args_yield] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_for_iter_args_yield.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:15] ERROR: testcase failed (exit 2): plan_memory_for_iter_args_yield
plan_memory_bind_tile_alias_liveness

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libplan_memory_bind_tile_alias_liveness_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/plan_memory_bind_tile_alias_liveness.dir/build.make:98: plan_memory_bind_tile_alias_liveness] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/plan_memory_bind_tile_alias_liveness.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:25:18] ERROR: testcase failed (exit 2): plan_memory_bind_tile_alias_liveness
partition_view_verify_valid

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpartition_view_verify_valid_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/partition_view_verify_valid.dir/build.make:98: partition_view_verify_valid] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/partition_view_verify_valid.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:26:10] ERROR: testcase failed (exit 2): partition_view_verify_valid
partition_view_verify_rank_mismatch_valid

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpartition_view_verify_rank_mismatch_valid_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/partition_view_verify_rank_mismatch_valid.dir/build.make:98: partition_view_verify_rank_mismatch_valid] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/partition_view_verify_rank_mismatch_valid.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:26:13] ERROR: testcase failed (exit 2): partition_view_verify_rank_mismatch_valid
partition5d_dynamic_a5

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpartition5d_dynamic_a5_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/partition5d_dynamic_a5.dir/build.make:98: partition5d_dynamic_a5] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/partition5d_dynamic_a5.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:26:15] ERROR: testcase failed (exit 2): partition5d_dynamic_a5
partition5d_a5

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libpartition5d_a5_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/partition5d_a5.dir/build.make:98: partition5d_a5] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/partition5d_a5.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:26:18] ERROR: testcase failed (exit 2): partition5d_a5
tensor_view_layout_dn

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libtensor_view_layout_dn_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/tensor_view_layout_dn.dir/build.make:98: tensor_view_layout_dn] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/tensor_view_layout_dn.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:31:23] ERROR: testcase failed (exit 2): tensor_view_layout_dn
sparse_attn_test_incore_7

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libsparse_attn_test_incore_7_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/sparse_attn_test_incore_7.dir/build.make:98: sparse_attn_test_incore_7] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/sparse_attn_test_incore_7.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:35:52] ERROR: testcase failed (exit 2): sparse_attn_test_incore_7
decode_swa_test_incore_40

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libdecode_swa_test_incore_40_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/decode_swa_test_incore_40.dir/build.make:98: decode_swa_test_incore_40] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/decode_swa_test_incore_40.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:35:55] ERROR: testcase failed (exit 2): decode_swa_test_incore_40
decode_hca_test_incore_54

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libdecode_hca_test_incore_54_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/decode_hca_test_incore_54.dir/build.make:98: decode_hca_test_incore_54] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/decode_hca_test_incore_54.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:35:57] ERROR: testcase failed (exit 2): decode_hca_test_incore_54
decode_csa_test_incore_81

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libdecode_csa_test_incore_81_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/decode_csa_test_incore_81.dir/build.make:98: decode_csa_test_incore_81] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/decode_csa_test_incore_81.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:36:00] ERROR: testcase failed (exit 2): decode_csa_test_incore_81
attention_swa_test_incore_40

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libattention_swa_test_incore_40_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/attention_swa_test_incore_40.dir/build.make:98: attention_swa_test_incore_40] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/attention_swa_test_incore_40.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:36:03] ERROR: testcase failed (exit 2): attention_swa_test_incore_40
attention_hca_test_incore_54

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libattention_hca_test_incore_54_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/attention_hca_test_incore_54.dir/build.make:98: attention_hca_test_incore_54] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/attention_hca_test_incore_54.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:36:05] ERROR: testcase failed (exit 2): attention_hca_test_incore_54
attention_csa_test_refresh_incore_81

stage=run info=exit=2

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: kernel
>>> referenced by libattention_csa_test_refresh_incore_81_kernel.so
cceld: Linker ReturnCode: 1
cceld: ExecutionFailed: 0
cceld: ErrMsg:
bisheng: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/attention_csa_test_refresh_incore_81.dir/build.make:98: attention_csa_test_refresh_incore_81] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/attention_csa_test_refresh_incore_81.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-06-15 21:36:08] ERROR: testcase failed (exit 2): attention_csa_test_refresh_incore_81
tbroadcast_root_binding

stage=run info=exit=139

./test/npu_validation/scripts/run_remote_npu_validation.sh: line 380: 55771 Segmentation fault      (core dumped) LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" ./build/${testcase}
[2026-06-15 21:39:14] ERROR: testcase failed (exit 139): tbroadcast_root_binding
cmps

stage=run info=exit=2

[ERROR] Packed mask mismatch: golden_v2.bin vs v2.bin, idx=4 (golden=98, out=0)
[ERROR] compare failed
[2026-06-15 21:43:51] ERROR: testcase failed (exit 2): cmps
cmp

stage=run info=exit=2

[ERROR] Packed mask mismatch: golden_v3.bin vs v3.bin, idx=4 (golden=49, out=0)
[ERROR] compare failed
[2026-06-15 21:44:02] ERROR: testcase failed (exit 2): cmp

@Zhendong404 force-pushed the fa-demo-restack branch 6 times, most recently from 26c8e46 to 356a943 on June 17, 2026 04:38
@Zhendong404

Contributor Author

/run a3

@reedhecre

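Received /run a3; the A3 board tester will handle this request.

The page refreshes automatically, so you can check the current stage, queue status, and latest results there.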

@Zhendong404

Contributor Author

/run a5

@reedhecre

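Received /run a5; the A5 board tester will handle this request.

The page refreshes automatically, so you can check the current stage, queue status, and latest results there.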

@reedhecre

A5 board test failed

Log tail

ntercore_sync_a5.py) FAIL python failed: test_intercore_sync_a5.py
Sync(test_mem_inject_sync_basic.py) FAIL python failed: test_mem_inject_sync_basic.py
Sync(test_set_wait_unified_api.py) FAIL python failed: test_set_wait_unified_api.py
Sync(test_tmov_col_major_16x1_align_a5.pto) OK   generated: test_tmov_col_major_16x1_align_a5.cpp
Sync(test_tmov_col_major_16x1_align_a5.py) OK   generated: test_tmov_col_major_16x1_align_a5-pto.cpp
Sync(test_tmov_row_major_1x16_control_a5.pto) OK   generated: test_tmov_row_major_1x16_control_a5.cpp
Sync(test_tmov_row_major_1x16_control_a5.py) OK   generated: test_tmov_row_major_1x16_control_a5-pto.cpp
Sync(tmatmulk_autosync_a5.py) FAIL python failed: tmatmulk_autosync_a5.py
TileSetGetValue(tile_getval_mat_invalid.py) XFAIL python failed as expected
TileSetGetValue(tileSetGetValue.py) FAIL python failed: tileSetGetValue.py
TInsert(tinsert_fp.py) FAIL python failed: tinsert_fp.py
TInsert(tinsert.py) FAIL python failed: tinsert.py
Tpows(tpows.py) FAIL python failed: tpows.py
Tpow(tpow.py) FAIL python failed: tpow.py
TPrefetchAsync(tprefetch_async_binding.py) FAIL python failed: tprefetch_async_binding.py
TPrefetch(tprefetch.py) FAIL python failed: tprefetch.py
Trans(trans.py) FAIL python failed: trans.py
Trap(trap.py) FAIL python failed: trap.py
TTri(ttri.py) FAIL python failed: ttri.py
VectorAddition(vadd_pto_ir.py) FAIL python failed: vadd_pto_ir.py
VectorAddition(vadd_validshape_hyper.py) FAIL python failed: vadd_validshape_hyper.py
VectorAddition(vectorAddition.py) FAIL python failed: vectorAddition.py
Xors(xors.py) FAIL python failed: xors.py
Xor(xor.py)  FAIL python failed: xor.py
-----------------------------
OK=80  FAIL=162  SKIP=28
=============================
===== END STAGE sample-build-and-test rc=1 @ 2026-06-17 15:21:07 =====

@reedhecre

A3 board test failed

Log tail

st/npu_validation/scripts/generate_testcase.py
pto-isa vendor cache hit: repo=https://gitcode.com/cann/pto-isa.git requested_commit=b65945bf2aa7b0020b9ae813c4d60f18cf0e538f actual_commit=b65945bf2aa7b0020b9ae813c4d60f18cf0e538f

===== STAGE board-validation @ 2026-06-17 14:31:23 =====
task-submit cwd=/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/payload
task-submit env-file=/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/board-validation.env
task-submit run-script:
set -euo pipefail
cd /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/payload
export DEVICE_ID=${TASK_DEVICE:-auto}
bash ./test/npu_validation/scripts/run_remote_npu_validation.sh
task-submit wrapped-command: bash -lc "set -euo pipefail; cd /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/payload; export DEVICE_ID=${TASK_DEVICE:-auto}; bash ./test/npu_validation/scripts/run_remote_npu_validation.sh"
task-submit submit-cmd: /usr/local/bin/task-submit --device auto --max-time 0 --env-file /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/board-validation.env 'bash -lc "set -euo pipefail; cd /home/zhongxuan/ptoas-board-monitor/runtime/runs/20260617_141556_manual_pr816/payload; export DEVICE_ID=${TASK_DEVICE:-auto}; bash ./test/npu_validation/scripts/run_remote_npu_validation.sh"'
task_20260617_143123_21661425699
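Hint: run task-submit --help to see how card allocation, logging, and related mechanisms work
task-submit task-id: task_20260617_143123_21661425699
Waiting for task execution: task_20260617_143123_21661425699 (Ctrl+C terminates the task)
[npu-lock] Acquiring lock for device 9 (no timeout)...

Session terminated, killing shell... ...killed.
=== Task terminated (killed) ===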
task-submit wait rc=143
completed (exit=143)
===== END STAGE board-validation rc=143 @ 2026-06-17 15:55:19 =====

@zhangstevenunity left a comment

Collaborator

Manual deep review (Codex bot failed on this PR)

Heads-up: the automated codex-review on this PR failed to run (exit 1 -- the upstream review API key group was disabled, HTTP 403), so there are currently no machine-generated findings. I did a manual correctness pass over the genuinely new/changed C++ and the mixed-backend flow, diffed against the true PR base (e37d41e1).

Net: the change is largely additive and the existing single-backend path looks preserved -- I did not find a release-blocking miscompile in the default path. Three concrete items below (1x P2, 2x P3), posted inline.

Inline findings

  • P2 PTONormalizeUncoveredTileSections.cpp -- inferSegmentKind ignores ambiguousTileOps, so an unclassifiable tile op (e.g. pto.tpush/pto.tpop) co-resident with a classifiable op in an uncovered segment is silently wrapped into an inferred section instead of routing to emitSegmentInferenceError (asymmetric with inferWholeFunctionKind).
  • P3 driver.cpp -- isBackendPartitionedContainer is a vacuous return true; in mixed mode stray top-level ops would be silently dropped.
  • P3 PTO.cpp -- lookupPeerFuncAcrossContainer's empty-fallback resolves private siblings, contradicting the visibility filter a few lines above and the driver's public-only contract.

Non-blocking observations (not posted inline)

  • Explicit entry detection. Entries now require pto.entry/pto.kernel/hacc.entry/pto.aicore. This is the documented intent and in-repo consumers are migrated, but a module that reaches codegen with function definitions and zero recognized entries emits device code with no host stub and no diagnostic. A guarded warning would surface silent non-launch -- but note zero-entry child modules are legitimately valid in mixed-backend mode, so any such check must be scoped to callee-only children to avoid false positives.
  • Subkernel-call autosync pipe assumption. ptodsl_subkernel_call_autosync models the call with a single representative pipe (PIPE_V for simd/simt, PIPE_M for cube), and pto-inline-backend-helpers runs after InsertSync without re-running it. That boundary sync is correct only if the helper is pipe-homogeneous w.r.t. each tile argument (first/last effect per operand is on the representative pipe). If multi-pipe subkernel helpers are possible, an assert or a per-operand pipe derivation would close the gap; the test only exercises a PIPE_V-only helper.
  • Minor: the debugIROutputRequested list in resolveSingleBackend omits emitVPTOLLVMDialect (present in the other two gates). Harmless today (buildBackendInfo backstops it), but inconsistent.
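
The pipe-homogeneity condition in the subkernel-call autosync observation can be written down as a small check. This is only a simplified Python model, not the PTOAS implementation: `HelperOp`, `boundary_pipes`, and `is_pipe_homogeneous` are hypothetical names, and each helper op is assumed to run on exactly one pipe and to touch a known set of tile arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelperOp:
    """Hypothetical stand-in for one op inside a subkernel helper."""
    pipe: str        # pipe the op executes on, e.g. "PIPE_V" or "PIPE_M"
    operands: tuple  # names of the tile arguments the op reads or writes


def boundary_pipes(ops):
    """Map each tile argument to (first_pipe, last_pipe) in program order."""
    result = {}
    for op in ops:
        for name in op.operands:
            first, _ = result.get(name, (op.pipe, op.pipe))
            result[name] = (first, op.pipe)
    return result


def is_pipe_homogeneous(ops, representative):
    """True when every operand's first and last effect is on `representative`."""
    return all(
        first == representative and last == representative
        for first, last in boundary_pipes(ops).values()
    )


# A PIPE_V-only helper, like the one the existing test exercises.
vector_only = [HelperOp("PIPE_V", ("a", "b")), HelperOp("PIPE_V", ("b",))]

# A helper whose first effect on "a" is on a different pipe.
mixed = [HelperOp("PIPE_M", ("a",)), HelperOp("PIPE_V", ("a", "b"))]
```

Under this model, `is_pipe_homogeneous(mixed, "PIPE_V")` is false, which is the case where a single representative pipe would under-synchronize operand `a`. An assert built on such a check, or a sync derived per operand from `boundary_pipes`, are the two options named above.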

Spot-checked and looks correct

EmitC PTOAS__TILE_DATA sink-after-TASSIGN (only ever moves a pure .data() read later; per-use re-materialization is dominance-safe), public-helper extern "C" linkage (matches the pre-existing declarations), generalized subview/reinterpret_cast pointer typing (no offset truncation, int64 offsets), vmulscvt lowering (llvm.hivm.vmulscvt.v128f16, operand order matches the masked+part sibling pattern), attachHIVMKernelAnnotations non-entry exclusion, and the GraphSyncSolver pto.section.* transparency change (strictly additive -- section bodies were previously not translated at all; set/wait flags still land inside the section).

}

static std::optional<InferredSectionKind>
inferSegmentKind(const UncoveredTopLevelSegment &segment) {

@zhangstevenunity Jun 22, 2026

Collaborator

P2 -- inferSegmentKind ignores ambiguousTileOps; unclassifiable tile ops are silently absorbed into an inferred section.

This decides a segment's kind purely from vectorTileOpCount/cubeTileOpCount and never inspects segment.ambiguousTileOps. That is asymmetric with the whole-function analogue inferWholeFunctionKind, which bails when summary.ambiguousOps is non-empty. The collection side is symmetric -- inspectSegmentOperation pushes unclassifiable tile-like ops into segment.ambiguousTileOps, and emitSegmentInferenceError is even written to report them -- but because inferSegmentKind never returns nullopt on their account, that error branch is unreachable whenever a classifiable op coexists in the same segment.

Consequence: a top-level container op (e.g. scf.for) whose body holds a classifiable tile op of one kind plus an unclassifiable tile op is inferred as that one kind, and normalizeFunction -> wrapUncoveredTopLevelSegment moves the unclassifiable op into that pto.section.* with no diagnostic. After wrapping it counts as "covered" (the residual verifier does not descend into explicit sections), so the misplacement is silent.

pto.tpush/pto.tpop are concrete triggers: they are isTileLikeOp (OpPipeInterface + pto.t prefix) but classifyTileOp returns nullopt -- their $tile is a PTOPipeEntryType, so getPipe() yields PIPE_UNASSIGNED and there are no buffer-typed operands to classify by address space. A loop body doing VEC compute plus a pto.tpush, left uncovered, would be wrapped as section.vector with the push silently absorbed; if its correct ownership were the cube side, that is a wrong-core placement.

Fix (one line, mirroring the function-level guard):

static std::optional<InferredSectionKind>
inferSegmentKind(const UncoveredTopLevelSegment &segment) {
  if (!segment.ambiguousTileOps.empty())
    return std::nullopt;   // route to emitSegmentInferenceError
  if (segment.vectorTileOpCount && segment.cubeTileOpCount)
    return std::nullopt;
  ...

No lit test currently covers an unclassifiable tile op co-resident with a classifiable one inside an uncovered segment.
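
To make the missing test case concrete, here is a simplified Python model of the decision before and after the guard. It is a sketch, not the pass itself: the `Segment` fields mirror the C++ member names, but the tail of the real function is elided above, so the single-kind branches in `infer_kind_as_described` are an assumption about what it does.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """Hypothetical mirror of UncoveredTopLevelSegment's counters."""
    vector_tile_op_count: int = 0
    cube_tile_op_count: int = 0
    ambiguous_tile_ops: list = field(default_factory=list)


def infer_kind_as_described(seg: Segment) -> Optional[str]:
    """Behaviour described in this comment: ambiguous ops are never inspected."""
    if seg.vector_tile_op_count and seg.cube_tile_op_count:
        return None  # mixed vector + cube: no single kind
    if seg.vector_tile_op_count:
        return "vector"  # assumed shape of the elided tail
    if seg.cube_tile_op_count:
        return "cube"    # assumed shape of the elided tail
    return None


def infer_kind_with_guard(seg: Segment) -> Optional[str]:
    """Same decision with the proposed guard in front."""
    if seg.ambiguous_tile_ops:
        return None  # caller would route to the inference error
    return infer_kind_as_described(seg)


# Loop body with VEC compute plus an unclassifiable pto.tpush.
co_resident = Segment(vector_tile_op_count=1, ambiguous_tile_ops=["pto.tpush"])
```

In this model `infer_kind_as_described(co_resident)` yields `"vector"` while `infer_kind_with_guard(co_resident)` yields `None`; a lit test for the fix would want the IR equivalent of `co_resident` and expect the inference diagnostic.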

Comment thread tools/ptoas/driver.cpp
ModuleOp child) {
for (NamedAttribute attr : outer->getAttrs()) {
static bool isBackendPartitionedContainer(ModuleOp module) {
return llvm::all_of(module.getOps<ModuleOp>(),

@zhangstevenunity Jun 22, 2026

Collaborator

P3 -- isBackendPartitionedContainer validates nothing (always true).

static bool isBackendPartitionedContainer(ModuleOp module) {
  return llvm::all_of(module.getOps<ModuleOp>(),
                      [](ModuleOp) { return true; });
}

getOps<ModuleOp>() only yields the child modules and the predicate returns true for each, so this is vacuously true for any module (including one with zero children). It never checks that the outer module contains only child modules. It gates mixed-backend mode in resolveSingleBackend (children.size() > 1 && isBackendPartitionedContainer(module)), and collectChildJobs builds jobs solely from module.getOps<ModuleOp>() -- so any stray top-level op in the outer container (e.g. a func.func directly under it) is silently dropped from the output rather than rejected. Canonical PTODSL/doc IR only nests child modules, so this is latent, but the predicate should enforce the invariant it is named for, e.g.:

Block *body = module.getBody();
return !body->empty() &&
       llvm::all_of(body->getOperations(),
                    [](Operation &op) { return isa<ModuleOp>(op); });

and the driver should emit a diagnostic (not silently drop) when an outer container in mixed mode holds non-module top-level ops.
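
The vacuity is easy to reproduce outside MLIR. The following Python sketch models the outer module body as a list of op names (a simplification; `MODULE` and the three function names are hypothetical) and compares the predicate as written with the suggested one, plus a helper that collects the ops a diagnostic would need to name.

```python
MODULE = "builtin.module"  # stand-in for a nested child ModuleOp


def is_partitioned_as_written(top_level_ops):
    """Filter to child modules first, then test a predicate that is always true."""
    children = [op for op in top_level_ops if op == MODULE]
    return all(True for _ in children)  # true for any input, including []


def is_partitioned_suggested(top_level_ops):
    """Non-empty body in which every top-level op is a child module."""
    return bool(top_level_ops) and all(op == MODULE for op in top_level_ops)


def stray_top_level_ops(top_level_ops):
    """Ops a mixed-mode diagnostic should report instead of dropping."""
    return [op for op in top_level_ops if op != MODULE]


canonical = [MODULE, MODULE]
with_stray_func = [MODULE, MODULE, "func.func"]
```

Here `is_partitioned_as_written(with_stray_func)` is true even though `stray_top_level_ops(with_stray_func)` is non-empty, which is the silent-drop scenario; the suggested predicate rejects both that input and an empty body.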

Comment thread lib/PTO/IR/PTO.cpp

if (fallbackMatches.size() == 1)
return fallbackMatches.front();
if (fallbackMatches.empty()) {

@zhangstevenunity Jun 22, 2026

Collaborator

P3 -- cross-child peer fallback resolves private siblings, contradicting the visibility filter a few lines above.

The outerModule.walk just above (the fallbackMatches collection) deliberately skips functions whose sym_visibility is "private", mirroring the driver's findSiblingSourceFunction, which only accepts public sibling defs. But this empty-fallbackMatches fallback uses SymbolTable::lookupSymbolIn(childModule, target), which returns a symbol regardless of visibility. So when no public peer exists anywhere but a sibling child module has a matching private func.func, ImportReservedBufferOp::verify accepts it, whereas the driver's mixed-backend assembly later rejects private peers ("unresolved cross-child peer_func reference"). Net effect: the op verifies clean, then fails later in the driver with a less localized message (no wrong code is emitted). The fallback also matches only by raw symbol name, not by pto.ptodsl.logical_name, unlike the walk above. Suggest skipping sym_visibility == "private" funcs here (matching the walk and the driver), or dropping the fallback since a public match would already have been found by the recursive walk.
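
A simplified Python model of the two-step lookup shows the mismatch. This is a sketch under assumptions: `Func` and `lookup_peer` are hypothetical, child modules are plain lists, and returning `None` for multiple public matches is a guess, since the real handling of that case is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    """Hypothetical stand-in for a func.func inside a child module."""
    name: str
    visibility: str = "public"
    logical_name: str = ""


def lookup_peer(children, target, skip_private_in_fallback):
    # Step 1: walk all children, public functions only, matching either
    # the symbol name or the logical name.
    matches = [
        f for child in children for f in child
        if f.visibility != "private" and target in (f.name, f.logical_name)
    ]
    if len(matches) == 1:
        return matches[0]
    if matches:
        return None  # assumed: several public matches are treated as unresolved
    # Step 2: fallback by raw symbol name; visibility is checked only on request.
    for child in children:
        for f in child:
            if f.name != target:
                continue
            if skip_private_in_fallback and f.visibility == "private":
                continue
            return f
    return None


children = [[Func("helper", visibility="private")], [Func("other")]]
```

With `skip_private_in_fallback=False` the private `helper` is returned, which corresponds to the verifier accepting it; with `True` the lookup returns `None`, matching the driver's public-only contract. In this model the filtered fallback can never find anything the walk missed, which is why dropping it is the simpler of the two suggestions.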

Comment thread tools/ptoas/driver.cpp
}

static FailureOr<func::FuncOp>
findSiblingSourceFunction(ModuleOp outer, ModuleOp targetChild,

Contributor

Why do we need the peer func again? I have somewhat forgotten; please add an explanation in a reply or a code comment.
