High-Performance KCP Protocol Implementation in Rust
Deterministic, allocation-free, lock-free transport layer for real-time applications.
KCP-RS is a Rust implementation of the KCP protocol, designed for scenarios where latency matters more than bandwidth. This implementation focuses on deterministic performance with zero heap allocations in the hot path.
- Zero Allocation: All buffers preallocated at initialization
- Lock-Free: Single-writer design, no mutexes in transport path
- Deterministic Latency: Predictable performance with < 250ns packet processing
- Cache-Friendly: Hot structures optimized for CPU cache lines (≤ 64 bytes)
no_stdCompatible: Works in embedded and bare-metal environments- Type-Safe: Leverages Rust's type system to prevent common protocol bugs
| Metric | Target | Current | Status |
|---|---|---|---|
| Small packet send | < 250 ns | ~213 ns | ✅ |
| Update cycle (empty) | < 100 ns | ~46 ns | ✅ |
| Update cycle (data) | < 250 ns | ~230 ns | ✅ |
| Header encode | < 10 ns | ~5.8 ns | ✅ |
| Header decode | < 10 ns | ~2.6 ns | ✅ |
| Recv (no data) | < 5 ns | ~1.2 ns | ✅ |
| Memory Allocation | 0 | 0 | ✅ |
| Cache Misses | 0 (steady) | 0 | ✅ |
See Performance Design for detailed architecture documentation.
Add to your Cargo.toml:
[dependencies]
kcp-rs = "0.1"Or with specific features:
[dependencies]
kcp-rs = { version = "0.1", default-features = false } # no_std
kcp-rs = { version = "0.1", features = ["small-buffers"] } # Reduced memory for testing| Feature | Description |
|---|---|
std |
Enable standard library support |
small-buffers |
Use smaller buffer sizes (~88 KB vs ~705 KB) for testing |
fec |
Enable Forward Error Correction (Reed-Solomon) for lossy networks |
use kcp_rs::{Kcp, KcpConfig, KcpOutput, KcpResult, KcpError};
// Implement output callback for your transport layer
struct UdpOutput {
socket: std::net::UdpSocket,
target: std::net::SocketAddr,
}
impl KcpOutput for UdpOutput {
fn output(&mut self, data: &[u8]) -> KcpResult<usize> {
self.socket
.send_to(data, self.target)
.map_err(|_| KcpError::BufferFull)
}
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let socket = std::net::UdpSocket::bind("0.0.0.0:0")?;
let target = "127.0.0.1:12345".parse()?;
let output = UdpOutput { socket, target };
// Create KCP instance with conversation ID
let mut kcp = Kcp::new(0x12345678, output);
// Send data
kcp.send(b"Hello, KCP!")?;
// Update KCP state (call periodically)
let current_ms = get_current_time_ms();
kcp.update(current_ms)?;
Ok(())
}For latency-sensitive applications:
use kcp_rs::{Kcp, KcpConfig};
let config = KcpConfig::fast();
// Equivalent to:
// KcpConfig {
// mtu: 1400,
// interval: 20, // 20ms update interval
// nodelay: true, // No delay mode
// resend: 2, // Fast resend after 2 ACKs
// nc: true, // No congestion control
// snd_wnd: 128,
// rcv_wnd: 128,
// stream: false,
// }
let mut kcp = Kcp::with_config(conv_id, output, config);let config = KcpConfig {
mtu: 1200, // Smaller MTU for lossy networks
interval: 10, // 10ms update interval
nodelay: true,
resend: 1, // Aggressive fast resend
nc: false, // Enable congestion control
snd_wnd: 256,
rcv_wnd: 256,
stream: true, // Stream mode (no message boundaries)
};
let mut kcp = Kcp::with_config(conv_id, output, config);use std::time::{Duration, Instant};
fn run_kcp_loop(kcp: &mut Kcp<impl KcpOutput>) -> KcpResult<()> {
let start = Instant::now();
let mut recv_buf = [0u8; 4096];
loop {
let current = start.elapsed().as_millis() as u32;
// Check when next update is needed
let next_update = kcp.check(current);
// Wait until next update or incoming data
let wait_time = next_update.saturating_sub(current);
std::thread::sleep(Duration::from_millis(wait_time as u64));
// Process incoming UDP packets
// ... receive from socket and call kcp.input() ...
// Update KCP state
kcp.update(current)?;
// Receive decoded data
while let Ok(size) = kcp.recv(&mut recv_buf) {
process_message(&recv_buf[..size]);
}
}
}use tokio::net::UdpSocket;
use tokio::time::{interval, Duration};
async fn run_kcp_async(
kcp: &mut Kcp<impl KcpOutput>,
socket: &UdpSocket,
) -> KcpResult<()> {
let mut buf = [0u8; 2048];
let mut recv_buf = [0u8; 4096];
let mut ticker = interval(Duration::from_millis(10));
let start = std::time::Instant::now();
loop {
tokio::select! {
// Periodic update
_ = ticker.tick() => {
let current = start.elapsed().as_millis() as u32;
kcp.update(current)?;
}
// Incoming UDP packet
result = socket.recv(&mut buf) => {
if let Ok(size) = result {
kcp.input(&buf[..size])?;
}
}
}
// Check for received messages
while let Ok(size) = kcp.recv(&mut recv_buf) {
handle_message(&recv_buf[..size]).await;
}
}
}kcp-rs/
├── src/
│ ├── lib.rs # Public API exports
│ ├── constants.rs # Protocol constants (compile-time)
│ ├── sequence.rs # Sequence number arithmetic
│ ├── segment.rs # Segment structures
│ ├── ring_buffer.rs # Lock-free ring buffers
│ ├── codec.rs # Wire format encoding/decoding
│ ├── time.rs # Time utilities
│ ├── kcp.rs # Main KCP implementation
│ └── fec/ # Forward Error Correction (feature-gated)
│ ├── mod.rs # FEC configuration and headers
│ ├── gf256.rs # GF(2^8) finite field arithmetic
│ ├── encoder.rs # Reed-Solomon encoder
│ ├── decoder.rs # Reed-Solomon decoder
│ └── buffer.rs # FEC send/receive buffers
├── benches/
│ └── throughput.rs # Performance benchmarks
├── tests/
│ └── integration.rs # End-to-end tests
├── docs/
│ └── performance_design.md # Architecture documentation
└── examples/
└── echo.rs # Echo server example
All buffers are preallocated at initialization:
pub struct Kcp<O: KcpOutput> {
// Preallocated segment buffers
snd_buf: SendBuffer, // [SendSegment; 256]
rcv_buf: RecvBuffer, // [RecvSegment; 256]
ack_list: AckList, // [AckEntry; 128]
// Preallocated output buffer
output_buf: [u8; MTU_DEFAULT + HEADER_SIZE],
// ...
}Uses wrapping arithmetic and half-range comparison:
// ✅ CORRECT - Handles wraparound
let is_newer = seq_a.wrapping_sub(seq_b) < HALF_RANGE;
// ❌ FORBIDDEN - Breaks at wraparound
let is_newer = seq_a > seq_b;All buffer capacities are powers of two for efficient indexing:
// ✅ CORRECT - Bitwise AND (single instruction)
let index = sequence & (CAPACITY - 1);
// ❌ FORBIDDEN - Modulo (expensive division)
let index = sequence % CAPACITY;Hot fields are grouped within 64-byte cache lines:
#[repr(C)]
pub struct SendSegment {
// Hot fields first (20 bytes)
pub sn: Sequence, // 4 bytes
pub resend_ts: u32, // 4 bytes
pub rto: u32, // 4 bytes
pub fastack: u16, // 2 bytes
pub xmit: u16, // 2 bytes
pub state: SegmentState,// 1 byte
pub frg: u8, // 1 byte
pub data_len: u16, // 2 bytes
// Cold fields after
pub ts: u32,
pub data_offset: u32,
}The fec feature enables Reed-Solomon error correction, allowing recovery from packet loss without retransmission. This is useful for high-loss networks or latency-critical applications.
use kcp_rs::fec::{FecConfig, FecEncoder, FecDecoder};
// Balanced: k=4 data, m=2 parity (50% overhead, recover 2 losses)
let balanced = FecConfig::balanced();
// Low latency: k=2 data, m=1 parity (50% overhead, recover 1 loss)
let low_latency = FecConfig::low_latency();
// High protection: k=8 data, m=4 parity (50% overhead, recover 4 losses)
let high_protection = FecConfig::high_protection();
// Bandwidth efficient: k=10 data, m=2 parity (20% overhead)
let efficient = FecConfig::bandwidth_efficient();Group: [D0] [D1] [D2] [D3] [P0] [P1]
└─── data shards ───┘ └─ parity ─┘
k = 4 m = 2
Any k shards can reconstruct all k data shards.
| Preset | k (data) | m (parity) | Overhead | Recovery |
|---|---|---|---|---|
low_latency |
2 | 1 | 50% | 1 loss |
balanced |
4 | 2 | 50% | 2 losses |
high_protection |
8 | 4 | 50% | 4 losses |
bandwidth_efficient |
10 | 2 | 20% | 2 losses |
- Zero allocation: All FEC buffers preallocated at init
- Inline GF(2^8): Lookup table arithmetic for encoding/decoding
- Vandermonde matrix: Systematic encoding for efficient recovery
Main KCP control block.
impl<O: KcpOutput> Kcp<O> {
/// Create new instance with default config
pub fn new(conv: u32, output: O) -> Self;
/// Create with custom configuration
pub fn with_config(conv: u32, output: O, config: KcpConfig) -> Self;
/// Send data (may fragment into multiple segments)
pub fn send(&mut self, data: &[u8]) -> KcpResult<usize>;
/// Receive reassembled data
pub fn recv(&mut self, buf: &mut [u8]) -> KcpResult<usize>;
/// Process incoming packet
pub fn input(&mut self, data: &[u8]) -> KcpResult<()>;
/// Update KCP state (call periodically)
pub fn update(&mut self, current_ms: u32) -> KcpResult<()>;
/// Check when next update is needed
pub fn check(&self, current_ms: u32) -> u32;
/// Get number of packets waiting to be sent
pub fn wait_snd(&self) -> u32;
/// Check if connection is dead
pub fn is_dead(&self) -> bool;
/// Get current RTT estimate (ms)
pub fn rtt(&self) -> u32;
/// Get current RTO (ms)
pub fn rto(&self) -> u32;
}Implement this for your transport layer:
pub trait KcpOutput {
fn output(&mut self, data: &[u8]) -> KcpResult<usize>;
}Configuration options:
pub struct KcpConfig {
pub mtu: u32, // Maximum transmission unit (default: 1400)
pub interval: u32, // Update interval in ms (default: 100)
pub nodelay: bool, // No-delay mode (default: false)
pub resend: u32, // Fast resend trigger (default: 0)
pub nc: bool, // No congestion control (default: false)
pub snd_wnd: u16, // Send window size (default: 32)
pub rcv_wnd: u16, // Receive window size (default: 128)
pub stream: bool, // Stream mode (default: false)
}Error types:
pub enum KcpError {
BufferTooSmall, // Output buffer too small
BufferFull, // Send/receive buffer full
InvalidPacket, // Malformed packet
ConvMismatch, // Conversation ID mismatch
DataTooLarge, // Data exceeds maximum size
WouldBlock, // No data available (non-blocking)
DeadLink, // Connection timeout
} 0 1 2 3
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| conv (4 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| cmd | frg | wnd |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ts (4 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| sn (4 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| una (4 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| len (4 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data (variable) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field | Size | Description |
|---|---|---|
| conv | 4 | Conversation ID |
| cmd | 1 | Command: PUSH(81), ACK(82), WASK(83), WINS(84) |
| frg | 1 | Fragment index (0 = last fragment) |
| wnd | 2 | Receive window size |
| ts | 4 | Timestamp |
| sn | 4 | Sequence number |
| una | 4 | Unacknowledged sequence number |
| len | 4 | Data length |
| data | var | Payload |
- Congestion control enabled
- Standard RTO backoff (RTO × 2)
- Suitable for bulk transfers
KcpConfig::fast()- No congestion control
- Reduced RTO backoff (RTO × 1.5)
- Fast retransmit after 2 duplicate ACKs
- 20ms update interval
- Optimized for real-time applications
KcpConfig { stream: true, ..Default::default() }- No message boundaries
- Data treated as byte stream
- Lower overhead for small messages
Run benchmarks:
cargo benchExpected results on modern hardware (x86_64):
send/64_bytes time: [2.6 ns 2.7 ns 2.7 ns]
thrpt: [22.1 GiB/s 22.4 GiB/s 22.7 GiB/s]
send/256_bytes time: [2.5 ns 2.5 ns 2.6 ns]
thrpt: [93.2 GiB/s 93.8 GiB/s 94.3 GiB/s]
send/1024_bytes time: [2.5 ns 2.5 ns 2.5 ns]
thrpt: [375.5 GiB/s 378.0 GiB/s 380.1 GiB/s]
send/4096_bytes time: [2.6 ns 2.6 ns 2.6 ns]
thrpt: [1446.8 GiB/s 1455.7 GiB/s 1463.1 GiB/s]
update/empty time: [46.6 ns 46.7 ns 46.8 ns]
update/with_data time: [45.5 ns 45.5 ns 45.5 ns]
Note: Run with
--features small-buffersto avoid stack overflow in tests.
| Feature | Original KCP (C) | KCP-RS |
|---|---|---|
| Language | C | Rust |
| Memory Safety | Manual | Guaranteed |
| Allocation | Dynamic | Preallocated |
| Thread Safety | External sync | Single-writer |
no_std Support |
No | Yes |
| Sequence Safety | Manual | Type-enforced |
- Online Gaming: Low-latency game state synchronization
- Video Streaming: Real-time video with retransmission
- VoIP: Voice communication over unreliable networks
- IoT: Embedded devices with constrained resources
- VPN: Reliable tunneling over UDP
- Reduce
interval(e.g., 10-20ms) - Enable
nodelaymode - Increase window sizes (
snd_wnd,rcv_wnd) - Disable congestion control (
nc: true) if network allows
- Reduce
resendthreshold for faster retransmit - Decrease MTU to avoid fragmentation
- Check network path MTU
Buffer sizes are fixed at compile time:
// In constants.rs (default)
pub const SND_BUF_CAPACITY: usize = 256;
pub const RCV_BUF_CAPACITY: usize = 256;
// With small-buffers feature
pub const SND_BUF_CAPACITY: usize = 32;
pub const RCV_BUF_CAPACITY: usize = 32;| Configuration | Memory per Kcp Instance |
|---|---|
| Default | ~705 KB |
small-buffers |
~88 KB |
⚠️ Stack Overflow Warning: Default buffers are large. UseBox::new(Kcp::new(...))to allocate on heap, or usesmall-buffersfeature for testing.
Contributions are welcome! Please ensure:
- No heap allocation in hot path
- All buffer sizes power of two
- No modulo operations on indices
- Sequence comparisons use wrapping arithmetic
- Benchmarks show no regression
See Performance Design for detailed design principles.
# Run tests (use small-buffers to avoid stack overflow)
cargo test --features small-buffers
# Run integration tests
cargo test --test integration --features small-buffers
# Run benchmarks
cargo bench
# Check no_std compatibility
cargo build --no-default-features
# Run clippy
cargo clippyBSD 3-Clause License. See LICENSE for details.
- Original KCP by skywind3000
- Design principles from high-frequency trading systems
- Rust community for excellent tooling