[Feature] Integrate Mooncake Transfer Engine with TurboMind for PD disaggregation and remote KV #4443

@Dayuxiaoshui

Motivation

LMDeploy already provides PD disaggregation (DistServe) on the PyTorch engine via lmdeploy/pytorch/disagg/, with Mooncake as an optional KV migration backend (MooncakeBackend, using the Python mooncake.engine.TransferEngine).

TurboMind (C++) remains a high-performance path, but it is not integrated with Mooncake today, so Prefill/Decode split deployments cannot reuse TurboMind's paged KV cache and kernel stack for cross-node KV transfer.

We would like to integrate Mooncake's Transfer Engine (or an equivalent C++ SDK) into TurboMind's C++ layer to:

  • Align block lifecycle with SequenceManager / BlockManager (including optional prefix caching and consistent KV quantization layout);
  • Export KV asynchronously after prefill on the prefill side, pull it asynchronously on the decode side, and overlap transfer with compute;
  • Align or stay compatible with Conductor / metadata protocols used on the PyTorch side to avoid diverging semantics.

Goal: enable end-to-end Mooncake-based PD disaggregation on TurboMind with minimal impact on throughput.
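
To make the "overlap transfer with compute" point concrete, here is a minimal sketch of the prefill-side pattern in Python. It is only an illustration of the scheduling idea: `prefill`, `export_kv`, and `run_prefill_node` are hypothetical names, the sleeps stand in for GPU compute and network transfer, and nothing here is a TurboMind or Mooncake API. A real integration would live in TurboMind's C++ layer.

```python
import asyncio

# Hypothetical stand-ins: none of these are TurboMind or Mooncake APIs.
async def prefill(seq_id):
    """Pretend to run prefill and return the KV block IDs it filled."""
    await asyncio.sleep(0.01)  # stands in for GPU compute
    return [seq_id * 10, seq_id * 10 + 1]

async def export_kv(seq_id, blocks, log):
    """Pretend to push the KV blocks of one sequence to the decode node."""
    await asyncio.sleep(0.01)  # stands in for the network transfer
    log.append((seq_id, blocks))

async def run_prefill_node(seq_ids):
    log = []
    pending = []
    for seq_id in seq_ids:
        blocks = await prefill(seq_id)
        # Start the export as a background task instead of awaiting it,
        # so the next prefill runs while this transfer is in flight.
        pending.append(asyncio.create_task(export_kv(seq_id, blocks, log)))
    # Wait for all outstanding transfers before shutting down.
    await asyncio.gather(*pending)
    return log

def demo():
    return asyncio.run(run_prefill_node([1, 2, 3]))
```

The key design choice is that the export is fired off without blocking the compute loop. The blocks of an in-flight export must stay pinned until the transfer completes, which is where the block lifecycle alignment with SequenceManager / BlockManager comes in.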

Related resources

  • In-repo: lmdeploy/pytorch/disagg/backend/mooncake.py, lmdeploy/pytorch/disagg/config.py (MooncakeEngineConfig, MigrationBackend)
  • Mooncake: https://github.com/kvcache-ai/Mooncake
  • TurboMind pointers: src/turbomind/models/llama/SequenceManager., BlockManager., src/turbomind/engine/engine.cc and request/scheduling code

Additional context

  • Current state: Mooncake PD disaggregation is implemented for PyTorch; TurboMind has no Mooncake / disagg integration.
  • Challenges: block/chunk mapping between TurboMind and Mooncake, KV sharding under attn_tp / attn_cp, and the scheduling state machine for sequences that are waiting for remote KV.
  • Suggested phases: optional CMake dependency -> metadata/RPC -> memory transfer hooks -> two-machine e2e validation.
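
As one illustration of the block/chunk mapping challenge, the sketch below turns a list of paged KV block IDs into contiguous byte ranges, which is roughly the shape a bulk transfer layer wants. It assumes, purely for illustration, that all blocks live in a single contiguous pool with a fixed per-block stride; TurboMind's actual layout (per-layer buffers, quantized KV, TP sharding) may differ. `coalesce_blocks` and `block_bytes` are hypothetical names.

```python
def coalesce_blocks(block_ids, block_bytes):
    """Merge runs of adjacent block IDs into (byte_offset, byte_length) ranges.

    Assumes block i starts at byte offset i * block_bytes in one flat pool
    (an illustrative assumption, not TurboMind's confirmed layout).
    """
    ranges = []
    for bid in sorted(set(block_ids)):
        offset = bid * block_bytes
        if ranges and ranges[-1][0] + ranges[-1][1] == offset:
            # This block directly follows the previous range: extend it.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + block_bytes)
        else:
            # Gap in the pool: start a new range.
            ranges.append((offset, block_bytes))
    return ranges

# Blocks 3, 4, 5 and 9, 10 are adjacent in the pool; block 1 stands alone.
ranges = coalesce_blocks([3, 4, 5, 9, 10, 1], block_bytes=4096)
print(ranges)  # [(4096, 4096), (12288, 12288), (36864, 8192)]
```

Coalescing reduces the number of transfer descriptors, but sorting discards the logical order of blocks within a sequence, so the block table itself would still have to travel with the metadata.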

One-liner

The PyTorch engine supports PD disaggregation and KV migration via Mooncake (and DLSlime); TurboMind does not yet support Mooncake-based PD disaggregation.
