Cake is a Rust framework for multimodal distributed inference. It shards models across consumer devices — iOS, Android, macOS, Linux, Windows — to run workloads that wouldn't fit on a single GPU.
It is built on Candle and supports CUDA, Metal, Vulkan, and CPU backends.

- Installation — Building from source, platform support, acceleration backends
- Models — Supported text, image, and voice model architectures
- Usage — Downloading models, running inference, Web UI, TUI chat
- REST API — OpenAI-compatible endpoints for chat, audio, and image generation
- Clustering — Zero-config mDNS discovery, manual topology, model splitting
- Image Generation — FLUX and Stable Diffusion image synthesis
- Voice Generation — VibeVoice TTS with voice cloning
- Docker — Container builds for Linux/NVIDIA
- Benchmarks — Performance comparison vs reference implementations