QVAC-19998 feat(ltx): full-GPU LTX-2.3 video support (Metal)#13

Merged
gianni-cor merged 9 commits into 2026-06-04 from 2026-06-04-ltx
Jun 25, 2026
Conversation

@aegioscy

@aegioscy aegioscy commented Jun 15, 2026

Summary

Splits the LTX / custom-ggml delta out of the 2026-06-04 fork base.

2026-06-04 (base) = upstream 1f9ee88 + the 5 general, upstream-ggml-compatible patches: vcpkg port infra, Flux qkv assert, ESRGAN device API, upscaler defaults, Wan I2V VAE tiling bypass. Its ggml submodule stays at stock leejet/ggml 0ce7ad3.

This PR adds the 6 commits that depend on / switch to the unified qvac ggml:

  • feat: use fused Flux RoPE when available — requires GGML_OP_ROPE_FLUX (not in stock leejet ggml); bumps submodule to qvac-ext-ggml
  • chore: remove ggml git submodule — use vcpkg SD_USE_SYSTEM_GGML
  • fix: remove ggml-impl.h include — symbols are in public ggml.h
  • ggml_graph_cut: use public ggml_graph_leaf/n_leafs/add_leaf API
  • cli: default preferred_gpu_backend to GPU — fixes the LTX pipeline running on CPU
  • cmake: add /bigobj for MSVC — build fix for the larger LTX/system-ggml objects

LTX-2.3 model support itself comes from upstream; these are the qvac-side build/packaging reconciliation + the custom-ggml dependency.

Notes

  • Pure history reorganization: the 2026-06-04-ltx tip (df47d5e) is unchanged and byte-for-byte identical to the original fork tip.
  • The fused-RoPE submodule bump that previously leaked onto the base (pointing the leejet-URL submodule at qvac-ext-ggml@c40a0fc) now lives here instead.

Test plan

  • CI green on 2026-06-04-ltx
  • Build sd-cli (Release, Metal) against vcpkg system ggml
  • Smoke-test LTX-2.3 T2V generation

@aegioscy aegioscy changed the title feat(ltx): build against system/unified ggml (graph_cut public API) QVAC-19998 feat(ltx): build against system/unified ggml (graph_cut public API) Jun 15, 2026
@aegioscy aegioscy changed the title QVAC-19998 feat(ltx): build against system/unified ggml (graph_cut public API) QVAC-19998 feat(ltx): full-GPU LTX-2.3 video support (Metal) Jun 15, 2026
@dev-nid

dev-nid commented Jun 15, 2026

The default build path looks broken on this branch. Verified on a clean checkout of df47d5e:

Running cmake .. with default flags (the flow that docs/build.md documents) exits with status 1:

    CMake Error at CMakeLists.txt:312 (add_subdirectory):
    add_subdirectory given source "ggml" which is not an existing directory.
    -- Configuring incomplete, errors occurred!

The PR removes the ggml submodule, but SD_USE_SYSTEM_GGML still defaults to OFF and CMakeLists.txt:312 still calls add_subdirectory(ggml). On 2026-06-04 the leejet submodule populates ggml/, so the default works; after this PR it does not. There are also no vcpkg port files in-tree on either branch, so I assume the qvac-ext-ggml port lives in a separate overlay.

Questions:

  • Is consuming this repo directly (without the vcpkg overlay) unsupported on this branch? If yes, can we either flip the SD_USE_SYSTEM_GGML default to ON, or replace the add_subdirectory(ggml) fallback with a message(FATAL_ERROR ...) pointing users at -DSD_USE_SYSTEM_GGML=ON + the qvac-ext-ggml package?
  • Should docs/build.md be updated here? It still tells users to git clone --recursive / git submodule update, which no longer pulls a ggml tree.
  • add_definitions(-DGGML_MAX_NAME=128) is guarded by if (NOT SD_USE_SYSTEM_GGML), so it never applies on this branch. Is the qvac-ext-ggml package built with GGML_MAX_NAME=128? If not, long tensor names get silently truncated vs. before.

aegioscy added a commit that referenced this pull request Jun 16, 2026
Addresses review feedback on PR #13. After removing the ggml submodule,
a plain `cmake ..` defaulted SD_USE_SYSTEM_GGML=OFF and hit
add_subdirectory(ggml) on a now-missing directory, failing with a
confusing CMake error.

- Default SD_USE_SYSTEM_GGML to ON (system/vcpkg ggml is the only
  supported path on this branch).
- Replace the add_subdirectory(ggml) fallback with explicit
  FATAL_ERROR messages pointing at the qvac-ext-ggml vcpkg port and the
  vcpkg toolchain file (both for missing system ggml and for an
  explicit -DSD_USE_SYSTEM_GGML=OFF with no submodule present).
- Update docs/build.md: drop the --recursive / git submodule update
  instructions and document the system/vcpkg ggml workflow. Note that
  the port exports GGML_MAX_NAME=128 as a PUBLIC compile definition so
  consumers inherit it automatically.

Co-authored-by: Cursor <cursoragent@cursor.com>
@aegioscy

Author

Thanks @dev-nid — all four points addressed in 6a13b9c:

  1. Default build path. SD_USE_SYSTEM_GGML now defaults to ON. After the submodule removal, system/vcpkg ggml is the only supported path on this branch, so a plain cmake .. -DCMAKE_TOOLCHAIN_FILE=<vcpkg>/scripts/buildsystems/vcpkg.cmake now resolves ggml::ggml and configures cleanly.

  2. add_subdirectory(ggml) fallback. Replaced. If someone explicitly passes -DSD_USE_SYSTEM_GGML=OFF and there is no ggml/ submodule, CMake now fails with a clear FATAL_ERROR pointing at the qvac-ext-ggml vcpkg port + toolchain, instead of the confusing "source ggml is not an existing directory" error. The system path also emits a clear error if find_package(ggml) fails.

  3. Direct consumption / docs. Yes — consuming this repo directly is supported via the qvac-ext-ggml vcpkg port (not a vendored submodule). docs/build.md updated: dropped the --recursive / git submodule update instructions and documented the system/vcpkg ggml workflow.

  4. GGML_MAX_NAME=128 — no mismatch. The qvac-ext-ggml port builds with -DGGML_MAX_NAME=128 and exports it as a PUBLIC/INTERFACE compile definition on ggml::ggml-base (its ggml-config.cmake appends INTERFACE_COMPILE_DEFINITIONS GGML_MAX_NAME=128). So every consumer linking ggml::ggml inherits 128 automatically — including sd.cpp's own translation units, even though the in-tree add_definitions(-DGGML_MAX_NAME=128) is skipped under system ggml. That add_definitions now only applies to non-system builds.
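
The truncation concern behind point 4 can be sketched in isolation. The code below is a minimal illustration, not ggml's implementation: ToyTensor, toy_set_name and toy_get_name are hypothetical names, and the template parameter MaxName only plays the role of GGML_MAX_NAME. It assumes a fixed-size name buffer that is filled with a bounded copy, so a name longer than the buffer is cut off without any error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a tensor with a fixed-size name buffer.
// MaxName plays the role of GGML_MAX_NAME; this is not ggml's real struct.
template <std::size_t MaxName>
struct ToyTensor {
    char name[MaxName] = {};
};

// Copy a name into the fixed buffer, truncating silently if it does not fit.
// One byte is always reserved for the terminating '\0'.
template <std::size_t MaxName>
void toy_set_name(ToyTensor<MaxName>& t, const std::string& name) {
    std::strncpy(t.name, name.c_str(), MaxName - 1);
    t.name[MaxName - 1] = '\0';
}

// Return the name as stored, i.e. after any truncation.
template <std::size_t MaxName>
std::string toy_get_name(const ToyTensor<MaxName>& t) {
    return std::string(t.name);
}
```

With a 100-character name, a ToyTensor<64> keeps only the first 63 characters while a ToyTensor<128> keeps the whole name. If the library and its consumer were compiled with different limits, the mismatch would be silent in the same way, which is why exporting the definition from the package matters. A consumer that wants a hard guarantee could additionally place a static_assert on GGML_MAX_NAME next to its ggml include; that is a suggestion, not something this PR adds.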

Comment thread src/ggml_extend.hpp Outdated
Comment thread docs/build.md
Comment thread docs/build.md
@gianni-cor

I see this PR is adding stuff not related to LTX. I think the problem is that 2026-06-04 is not the correct rebase of 2026-03-01.

aegioscy added a commit that referenced this pull request Jun 24, 2026
aegioscy and others added 6 commits June 24, 2026 11:44
ggml is now provided entirely by the vcpkg ggml port
(tetherto/qvac-ext-ggml). The submodule is not needed and was
only present for non-vcpkg local builds.

Co-authored-by: Cursor <cursoragent@cursor.com>
The ../ggml/src/ggml-impl.h path was a remnant of the git submodule.
All used symbols (GGML_MAX_DIMS, GGML_MAX_SRC, GGML_ASSERT,
ggml_graph_n_nodes, ggml_graph_node) are in the public ggml.h.

Co-authored-by: Cursor <cursoragent@cursor.com>
ggml_cgraph is now opaque (ggml-impl.h no longer vendored). Replace direct
member access with the public leaf accessors exported by qvac-ext-ggml
2026-06-06, fixing 'member access into incomplete type ggml_cgraph'.

Co-authored-by: Cursor <cursoragent@cursor.com>
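
The opaque-type situation described in the commit message above can be sketched with a toy example. Everything here is hypothetical (toy_graph and its functions are invented for illustration and are not the ggml or qvac-ext-ggml API); the sketch only assumes the general pattern of a struct that consumers see as a forward declaration plus accessor functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- "public header": the type is declared but not defined here ----
struct toy_graph;  // opaque to consumers
toy_graph* toy_graph_new();
void       toy_graph_free(toy_graph* g);
void       toy_graph_add_leaf(toy_graph* g, int leaf);
int        toy_graph_n_leafs(const toy_graph* g);
int        toy_graph_leaf(const toy_graph* g, int i);

// ---- consumer code: sees only the declarations above ----
// Writing g->leafs.size() at this point would not compile, because
// toy_graph is still an incomplete type; the accessors are the only way in.
int sum_leafs(const toy_graph* g) {
    int total = 0;
    for (int i = 0; i < toy_graph_n_leafs(g); ++i) {
        total += toy_graph_leaf(g, i);
    }
    return total;
}

// ---- "private implementation": the only place the layout is visible ----
struct toy_graph {
    std::vector<int> leafs;
};

toy_graph* toy_graph_new() { return new toy_graph(); }
void toy_graph_free(toy_graph* g) { delete g; }
void toy_graph_add_leaf(toy_graph* g, int leaf) { g->leafs.push_back(leaf); }
int toy_graph_n_leafs(const toy_graph* g) { return static_cast<int>(g->leafs.size()); }
int toy_graph_leaf(const toy_graph* g, int i) { return g->leafs[static_cast<std::size_t>(i)]; }
```

The consumer function sits between the declaration and the definition on purpose: it compiles only because it never touches a member directly, which mirrors what a caller has to do once the private header is no longer available.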
SDContextParams::to_sd_ctx_params_t left preferred_gpu_backend out of the
sd_ctx_params_t aggregate initializer, so it zero-initialized to
SD_BACKEND_PREF_CPU and the whole LTX pipeline ran on CPU even on Metal
machines. Add SD_BACKEND_PREF_GPU explicitly; it is only honored when
--backend is unset, so existing --backend overrides are unaffected.

Co-authored-by: Cursor <cursoragent@cursor.com>
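
The zero-initialization behavior behind the CPU fallback above is standard C++ and can be reproduced with a toy struct. The names below (toy_backend_pref, toy_ctx_params and its fields) are hypothetical and the layout is invented; the sketch only assumes that the CPU preference is the enumerator with value 0, as the commit message implies.

```cpp
#include <cassert>

// Hypothetical enum and struct; the names echo the commit message, but the
// layout is invented for illustration.
enum toy_backend_pref { TOY_BACKEND_PREF_CPU = 0, TOY_BACKEND_PREF_GPU = 1 };

struct toy_ctx_params {
    int              n_threads;
    bool             vae_tiling;
    toy_backend_pref preferred_gpu_backend;
};

// Before: the last member is left out of the aggregate initializer, so it is
// value-initialized to 0, which happens to be TOY_BACKEND_PREF_CPU.
toy_ctx_params make_params_before(int n_threads) {
    toy_ctx_params p = {n_threads, false};
    return p;
}

// After: the member is set explicitly, so the GPU preference is carried over.
toy_ctx_params make_params_after(int n_threads) {
    toy_ctx_params p = {n_threads, false, TOY_BACKEND_PREF_GPU};
    return p;
}
```

Compilers can flag the first form with a missing-field-initializer warning (for example -Wextra on GCC and Clang), which is one way to catch this class of omission when a struct gains a new trailing member.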
With the LTX-2 additions, the stable-diffusion.cpp translation unit exceeds
MSVC's COFF 2^16 section limit and fails to compile on Windows with fatal
error C1128. /bigobj raises the limit; clang/gcc are unaffected.

Co-authored-by: Cursor <cursoragent@cursor.com>
@jesusmb1995

jesusmb1995 commented Jun 24, 2026

The base branch 2026-06-04 seems incorrect. It's not a clean branch with upstream-only work (it contains an additional commit that should be on this PR).

                    WHAT YOU WANTED                          WHAT YOU ACTUALLY PUSHED

    PR     ┌──────────────────────────────┐         ┌──────────────────────────────┐
   (ltx)   │ 26d85c7  cmake default ON    │         │ 26d85c7  cmake default ON    │
           │ a9f1834  bigobj              │         │ a9f1834  bigobj              │
           │ 6876388  cli default GPU     │         │ 6876388  cli default GPU     │
           │ aee9deb  graph_cut public API│         │ aee9deb  graph_cut public API│
           │ d8fece9  remove ggml-impl.h  │         │ d8fece9  remove ggml-impl.h  │
           │ 47dbd6b  remove submodule    │         │ 47dbd6b  remove submodule    │
           │ c641ed4  fused Flux RoPE  ◄──┼─ first  │                              │
           │          (needs qvac ggml)   │  qvac   │                              │
           └──────────────────────────────┘  dep    └──────────────────────────────┘
     ══════════ dividing line ══════════
    base   ┌──────────────────────────────┐         │ c641ed4  fused Flux RoPE  ◄──┼─ STRANDED
  (06-04)  │ 6ce5f97  Wan I2V tiling   ◄──┼─ base   │          (needs qvac ggml)   │  on base
           │ 10f8d44  upscaler defaults   │  tip    ╞══════════ dividing line ══════════
           │ 5a1cee5  ESRGAN              │         │ 6ce5f97  Wan I2V tiling   ◄──┼─ base
           │ afd4778  Flux qkv assert     │         │ 10f8d44  upscaler defaults   │  tip
           │ 87aa42a  vcpkg port          │         │ 5a1cee5  ESRGAN              │
           │ 1f9ee88  upstream            │         │ afd4778  Flux qkv assert     │
           └──────────────────────────────┘         │ 87aa42a  vcpkg port          │
                                                    │ 1f9ee88  upstream            │
                                                    └──────────────────────────────┘

@gianni-cor

The base branch 2026-06-04 seems incorrect. Its not a clean branch with upstream-only work (contains an additional commit that should be on this PR).

It seems correct now, right?

@aegioscy

Author

The base branch 2026-06-04 seems incorrect. Its not a clean branch with upstream-only work (contains an additional commit that should be on this PR).

Why would Flux RoPE be part of my PR in your diagram? Flux RoPE is replayed on top of the rebased branch because it is what we added separately from leejet.

This PR contains only the LTX additions, and the base branch 2026-06-04 is the target, which should contain the RoPE additions, not only things from leejet.

@aegioscy

Author

ggml submodule repointed to the canonical 2026-06-06 branch (pushed 55352fa to 2026-06-04)

The ggml submodule now tracks tetherto/qvac-ext-ggml@2026-06-06 (tip 805e8e1b) instead of the 2026-06-06-ltx feature branch (cd31e93).

805e8e1b is the merge of qvac-ext-ggml#23 (2026-06-06-ltx) into the canonical 2026-06-06 line — a strict superset of the previous pin (+2 commits) that carries the full merged compute set:

  • Metal fused Flux RoPE kernel + implicit-GEMM conv2d + flash-attention fix
  • Wan IM2COL_3D / PAD left-padding
  • coopmat1 flash-attn f32-accumulation fixes (RADV P*V/output in f32 under GGML_PREC_F32, needed for correct LTX-2.3 output on Strix Halo)
  • ggml_graph_leaf / leafs / n_leafs public API export (used by ggml_graph_cut)

Context on the base-branch discussion (re @jesusmb1995 / @gianni-cor): this line intentionally depends on qvac-ext-ggml — the fused Flux RoPE path in c641ed4 requires GGML_OP_ROPE_FLUX, which is not in stock leejet/ggml. So rather than carrying the -ltx feature branch on the base, the base now pins the merged/canonical 2026-06-06. If the preference is instead for 2026-06-04 to stay strictly upstream-ggml-compatible (stock ggml), the alternative is to move c641ed4 off the base and into the ltx delta — happy to do that instead if that's the direction.

aegioscy and others added 2 commits June 25, 2026 10:24
Resolve the ggml submodule conflict introduced by the base repoint
(55352fa). The base re-added/repointed the vendored ggml submodule,
while this branch removes it in favor of the vcpkg system ggml port
(47dbd6b). Keep the submodule removed: ggml is provided by the
qvac-ext-ggml vcpkg port (pinned to 2026-06-06@805e8e1b), so the
merged tree matches the 2026-06-04-ltx head.

Co-authored-by: Cursor <cursoragent@cursor.com>
Reverses the submodule removal direction: instead of consuming ggml only
from the qvac-ext-ggml vcpkg port, keep the vendored ggml submodule so the
repo can also be built standalone without vcpkg.

- Re-add the ggml submodule pinned to qvac-ext-ggml 2026-06-06@805e8e1b
  (the merged compute branch: Metal fused Flux RoPE/conv2d, Wan
  IM2COL_3D/PAD, coopmat1 f32 flash-attn fixes, public graph_leaf API).
- CMake: default SD_USE_SYSTEM_GGML back to OFF so a plain `cmake ..` after
  a --recursive clone builds the vendored submodule via add_subdirectory.
  System/vcpkg ggml is still fully supported via -DSD_USE_SYSTEM_GGML=ON
  (the diffusion-cpp vcpkg port passes this explicitly, so it is
  unaffected by the default). Update the stale "submodule removed" error.
- docs/build.md: restore the --recursive / submodule-update instructions
  and document both ggml paths (vendored default vs system/vcpkg).

Co-authored-by: Cursor <cursoragent@cursor.com>
@aegioscy

Author

Update: keeping the vendored ggml submodule (standalone builds) — pushed 93874f3 to 2026-06-04-ltx

Reversing the earlier "remove the vendored ggml submodule" direction. Rather than making the vcpkg system-ggml port the only supported path, this branch now keeps the ggml submodule so the repo can also be built standalone, without vcpkg:

  • Re-added the ggml submodule, pinned to qvac-ext-ggml@2026-06-06 (805e8e1b).
  • CMake: SD_USE_SYSTEM_GGML default flipped back to OFF, so a plain cmake .. after a --recursive clone builds the vendored submodule via add_subdirectory(ggml). System/vcpkg ggml is still fully supported via -DSD_USE_SYSTEM_GGML=ON — the diffusion-cpp vcpkg port passes that explicitly, so the vcpkg path is unaffected by the default.
  • docs/build.md: restored the --recursive / git submodule update --init instructions and now documents both ggml paths.

This also resolves the earlier review points on this branch: with --recursive restored, the WebP/WebM submodules initialize again (so "enabled by default" holds), and the plain cmake .. CPU-only / OpenBLAS examples build out of the box again.

cc @dev-nid @gianni-cor @jesusmb1995

@github-actions

Review Status

Current Status: ❌ PENDING
Approvals so far: none

Pending reviews: Needs 1 Management or Team Lead, and 1 more from Management, Team Lead, or Member.

…ABLE

Addresses dev-nid review on PR #13. ggml_rope_flux(ctx, v_in, nullptr) was
unclear: with a null position tensor it applies no rotation (V is never
RoPE-rotated) and is just a fused-kernel equivalent of the
permute(0,2,1,3)+reshape_3d in the fallback branch.

- Add a comment documenting the null-pe (permute-only, no rotation) semantics.
- Gate the fused V path on GGML_ROPE_FLUX_DISABLE, matching the q/k fused
  path in rope.hpp, so the whole fused-RoPE kernel family can be disabled
  together; when disabled it falls back to the explicit permute+reshape.

Co-authored-by: Cursor <cursoragent@cursor.com>
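
The equivalence claimed in the commit message above (a null position tensor means no rotation, so the fused call reduces to permute(0,2,1,3)) can be sketched on a toy tensor. This is an illustration under assumptions, not the ggml kernel: Toy4D and every toy_ function are hypothetical, the fused path is modeled only for its null-pe case, and the switch is treated as a compile-time macro named TOY_ROPE_FLUX_DISABLE, whereas the real GGML_ROPE_FLUX_DISABLE gate may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy contiguous 4-D tensor; ne[0] is the fastest-varying dimension.
struct Toy4D {
    std::size_t ne[4];
    std::vector<float> data;

    // Flat offset of element (i0, i1, i2, i3).
    std::size_t index(std::size_t i0, std::size_t i1, std::size_t i2, std::size_t i3) const {
        return ((i3 * ne[2] + i2) * ne[1] + i1) * ne[0] + i0;
    }
};

Toy4D make_toy(std::size_t n0, std::size_t n1, std::size_t n2, std::size_t n3) {
    Toy4D t;
    t.ne[0] = n0; t.ne[1] = n1; t.ne[2] = n2; t.ne[3] = n3;
    t.data.assign(n0 * n1 * n2 * n3, 0.0f);
    return t;
}

// Fallback path: explicit permute(0,2,1,3), i.e. swap dims 1 and 2 and
// store the result contiguously.
Toy4D toy_permute_0213(const Toy4D& src) {
    Toy4D dst = make_toy(src.ne[0], src.ne[2], src.ne[1], src.ne[3]);
    for (std::size_t i3 = 0; i3 < src.ne[3]; ++i3)
        for (std::size_t i2 = 0; i2 < src.ne[2]; ++i2)
            for (std::size_t i1 = 0; i1 < src.ne[1]; ++i1)
                for (std::size_t i0 = 0; i0 < src.ne[0]; ++i0)
                    dst.data[dst.index(i0, i2, i1, i3)] = src.data[src.index(i0, i1, i2, i3)];
    return dst;
}

// Stand-in for the fused kernel in its null-pe case: a single pass over the
// output that gathers from the swapped source position and rotates nothing.
Toy4D toy_fused_permute_only(const Toy4D& src) {
    Toy4D dst = make_toy(src.ne[0], src.ne[2], src.ne[1], src.ne[3]);
    for (std::size_t k = 0; k < dst.data.size(); ++k) {
        std::size_t i0 = k % dst.ne[0];
        std::size_t j1 = (k / dst.ne[0]) % dst.ne[1];
        std::size_t j2 = (k / (dst.ne[0] * dst.ne[1])) % dst.ne[2];
        std::size_t i3 = k / (dst.ne[0] * dst.ne[1] * dst.ne[2]);
        dst.data[k] = src.data[src.index(i0, j2, j1, i3)];
    }
    return dst;
}

// Gate: defining TOY_ROPE_FLUX_DISABLE selects the explicit fallback.
Toy4D toy_prepare_v(const Toy4D& v) {
#ifdef TOY_ROPE_FLUX_DISABLE
    return toy_permute_0213(v);
#else
    return toy_fused_permute_only(v);
#endif
}
```

Both branches of toy_prepare_v return the same layout and values, so flipping the gate changes only which code path runs. That is the property that makes a single disable switch for the whole fused family safe to use when debugging.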
@gianni-cor gianni-cor merged commit 11717d2 into 2026-06-04 Jun 25, 2026
14 checks passed