Motivation
LMDeploy already provides PD disaggregation (DistServe) on the PyTorch engine via lmdeploy/pytorch/disagg/, with Mooncake as an optional KV migration backend (MooncakeBackend, using the Python mooncake.engine.TransferEngine).
TurboMind (C++) remains a high-performance path, but it is not integrated with Mooncake today, so Prefill/Decode-split deployments cannot combine TurboMind's paged KV cache and kernel stack with cross-node KV migration.
We would like to integrate Mooncake's Transfer Engine (or an equivalent C++ SDK) into TurboMind's C++ layer to:
- Align block lifecycle with SequenceManager / BlockManager (including optional prefix caching and consistent KV quantization layout);
- Asynchronously export KV after prefill on the prefill side, asynchronously pull on the decode side, and overlap transfer with compute;
- Align or stay compatible with Conductor / metadata protocols used on the PyTorch side to avoid diverging semantics.
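As a design sketch of the second bullet (asynchronous export after prefill, asynchronous pull on decode, overlapped with compute), here is a minimal Python model. `LoopbackBackend`, `export_blocks`, and `pull_blocks` are invented names standing in for a future C++ wrapper around Mooncake's Transfer Engine; nothing here is an existing LMDeploy or Mooncake API.

```python
# Minimal model of overlapping KV transfer with compute.
# All names are placeholders for a hypothetical C++ backend.
from concurrent.futures import ThreadPoolExecutor, Future

class LoopbackBackend:
    """Stands in for a transfer-engine wrapper; moves bytes in-process."""
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._store = {}

    def export_blocks(self, seq_id, blocks) -> Future:
        # Prefill side: publish KV blocks asynchronously, return immediately.
        return self._pool.submit(self._store.__setitem__, seq_id, list(blocks))

    def pull_blocks(self, seq_id) -> Future:
        # Decode side: fetch KV in the background while other sequences decode.
        return self._pool.submit(self._store.get, seq_id)

backend = LoopbackBackend()
fut = backend.export_blocks("req-0", [b"k0v0", b"k1v1"])
fut.result()                               # prefill can serve the next request meanwhile
pulled = backend.pull_blocks("req-0").result()
```

The futures mark the only synchronization points: the scheduler can keep a sequence out of the decode batch until its pull future resolves, which is exactly the state-machine question raised under "Additional context" below.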
Goal: enable end-to-end Mooncake-based PD disaggregation on TurboMind with minimal impact on throughput.
Related resources
- In-repo: lmdeploy/pytorch/disagg/backend/mooncake.py, lmdeploy/pytorch/disagg/config.py (MooncakeEngineConfig, MigrationBackend)
- Mooncake: https://github.com/kvcache-ai/Mooncake
- TurboMind pointers: src/turbomind/models/llama/SequenceManager.* and BlockManager.*, src/turbomind/engine/engine.cc, and the request/scheduling code
Additional context
- Current state: Mooncake PD disaggregation is implemented for PyTorch; TurboMind has no Mooncake / disagg integration.
- Challenges: block/chunk mapping between TurboMind and Mooncake, KV sharding under attn_tp / attn_cp, and scheduling/state machine when waiting for remote KV.
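To make the block/chunk mapping challenge concrete, here is a toy sketch of how one TurboMind paged KV block could map to per-rank transfer chunks when KV heads are sharded across attn_tp ranks. Every layout parameter here is invented for illustration; the real layout additionally depends on KV quantization and per-layer placement.

```python
# Toy model of mapping one paged KV block to per-rank transfer chunks.
# All layout parameters are illustrative, not TurboMind's actual layout.

def block_chunks(block_id, *, block_size_tokens, num_kv_heads, head_dim,
                 dtype_bytes, attn_tp):
    """Return (rank, offset, length) chunks for one KV block.

    Assumes heads are sharded evenly across attn_tp ranks and that each
    rank's slice of a block is contiguous in its own registered buffer.
    """
    assert num_kv_heads % attn_tp == 0
    heads_per_rank = num_kv_heads // attn_tp
    # K and V for this rank's heads over the whole block:
    chunk_len = 2 * block_size_tokens * heads_per_rank * head_dim * dtype_bytes
    base = block_id * chunk_len  # per-rank pools indexed by block id
    return [(rank, base, chunk_len) for rank in range(attn_tp)]

chunks = block_chunks(3, block_size_tokens=64, num_kv_heads=8, head_dim=128,
                      dtype_bytes=2, attn_tp=2)
```

Even this toy version shows why the mapping must live next to BlockManager: the chunk descriptors are a pure function of the block id and the parallelism config, so prefill and decode sides can derive them independently from shared metadata instead of shipping pointer tables.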
- Suggested phases: optional CMake dependency -> metadata/RPC -> memory transfer hooks -> two-machine e2e validation.
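For the first phase, the optional dependency could follow the usual CMake pattern, sketched below. The option name, package name, and targets are placeholders; LMDeploy defines none of them today, and Mooncake's exported CMake target names would need to be checked against its repository.

```cmake
# Sketch only: all names here are assumptions, not existing targets.
option(TURBOMIND_USE_MOONCAKE "Enable Mooncake-based KV migration in TurboMind" OFF)

if(TURBOMIND_USE_MOONCAKE)
  # Placeholder package/target names for Mooncake's C++ transfer engine.
  find_package(Mooncake REQUIRED)
  target_compile_definitions(turbomind PRIVATE TURBOMIND_USE_MOONCAKE=1)
  target_link_libraries(turbomind PRIVATE Mooncake::transfer_engine)
endif()
```

Guarding everything behind one compile definition keeps the default build unchanged, so the TurboMind hot path carries no new dependency unless the flag is enabled.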
One-liner
PyTorch engine supports PD disaggregation and KV migration via Mooncake (and DLSlime); TurboMind does not yet support Mooncake-based PD disaggregation.