Skip to content

brianpht/kcp-rs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KCP-RS

High-Performance KCP Protocol Implementation in Rust

Deterministic, allocation-free, lock-free transport layer for real-time applications.

Overview

KCP-RS is a Rust implementation of the KCP protocol, designed for scenarios where latency matters more than bandwidth. This implementation focuses on deterministic performance with zero heap allocations in the hot path.

Key Features

  • Zero Allocation: All buffers preallocated at initialization
  • Lock-Free: Single-writer design, no mutexes in transport path
  • Deterministic Latency: Predictable performance with < 250ns packet processing
  • Cache-Friendly: Hot structures optimized for CPU cache lines (≤ 64 bytes)
  • no_std Compatible: Works in embedded and bare-metal environments
  • Type-Safe: Leverages Rust's type system to prevent common protocol bugs

Performance Characteristics

Metric Target Current Status
Small packet send < 250 ns ~213 ns
Update cycle (empty) < 100 ns ~46 ns
Update cycle (data) < 250 ns ~230 ns
Header encode < 10 ns ~5.8 ns
Header decode < 10 ns ~2.6 ns
Recv (no data) < 5 ns ~1.2 ns
Memory Allocation 0 0
Cache Misses 0 (steady) 0

See Performance Design for detailed architecture documentation.

Installation

Add to your Cargo.toml:

[dependencies]
kcp-rs = "0.1"

Or with specific features:

[dependencies]
kcp-rs = { version = "0.1", default-features = false }  # no_std
kcp-rs = { version = "0.1", features = ["small-buffers"] }  # Reduced memory for testing

Available Features

Feature Description
std Enable standard library support
small-buffers Use smaller buffer sizes (~88 KB vs ~705 KB) for testing
fec Enable Forward Error Correction (Reed-Solomon) for lossy networks

Quick Start

Basic Usage

use kcp_rs::{Kcp, KcpConfig, KcpOutput, KcpResult, KcpError};

// Implement output callback for your transport layer
struct UdpOutput {
    socket: std::net::UdpSocket,
    target: std::net::SocketAddr,
}

impl KcpOutput for UdpOutput {
    fn output(&mut self, data: &[u8]) -> KcpResult<usize> {
        self.socket
            .send_to(data, self.target)
            .map_err(|_| KcpError::BufferFull)
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let socket = std::net::UdpSocket::bind("0.0.0.0:0")?;
    let target = "127.0.0.1:12345".parse()?;
    
    let output = UdpOutput { socket, target };
    
    // Create KCP instance with conversation ID
    let mut kcp = Kcp::new(0x12345678, output);
    
    // Send data
    kcp.send(b"Hello, KCP!")?;
    
    // Update KCP state (call periodically)
    let current_ms = get_current_time_ms();
    kcp.update(current_ms)?;
    
    Ok(())
}

Fast Mode Configuration

For latency-sensitive applications:

use kcp_rs::{Kcp, KcpConfig};

let config = KcpConfig::fast();
// Equivalent to:
// KcpConfig {
//     mtu: 1400,
//     interval: 20,      // 20ms update interval
//     nodelay: true,     // No delay mode
//     resend: 2,         // Fast resend after 2 ACKs
//     nc: true,          // No congestion control
//     snd_wnd: 128,
//     rcv_wnd: 128,
//     stream: false,
// }

let mut kcp = Kcp::with_config(conv_id, output, config);

Custom Configuration

let config = KcpConfig {
    mtu: 1200,          // Smaller MTU for lossy networks
    interval: 10,       // 10ms update interval
    nodelay: true,
    resend: 1,          // Aggressive fast resend
    nc: false,          // Enable congestion control
    snd_wnd: 256,
    rcv_wnd: 256,
    stream: true,       // Stream mode (no message boundaries)
};

let mut kcp = Kcp::with_config(conv_id, output, config);

Usage Patterns

Event Loop Integration

use std::time::{Duration, Instant};

fn run_kcp_loop(kcp: &mut Kcp<impl KcpOutput>) -> KcpResult<()> {
    let start = Instant::now();
    let mut recv_buf = [0u8; 4096];
    
    loop {
        let current = start.elapsed().as_millis() as u32;
        
        // Check when next update is needed
        let next_update = kcp.check(current);
        
        // Wait until next update or incoming data
        let wait_time = next_update.saturating_sub(current);
        std::thread::sleep(Duration::from_millis(wait_time as u64));
        
        // Process incoming UDP packets
        // ... receive from socket and call kcp.input() ...
        
        // Update KCP state
        kcp.update(current)?;
        
        // Receive decoded data
        while let Ok(size) = kcp.recv(&mut recv_buf) {
            process_message(&recv_buf[..size]);
        }
    }
}

Async Integration (with tokio)

use tokio::net::UdpSocket;
use tokio::time::{interval, Duration};

async fn run_kcp_async(
    kcp: &mut Kcp<impl KcpOutput>,
    socket: &UdpSocket,
) -> KcpResult<()> {
    let mut buf = [0u8; 2048];
    let mut recv_buf = [0u8; 4096];
    let mut ticker = interval(Duration::from_millis(10));
    let start = std::time::Instant::now();
    
    loop {
        tokio::select! {
            // Periodic update
            _ = ticker.tick() => {
                let current = start.elapsed().as_millis() as u32;
                kcp.update(current)?;
            }
            
            // Incoming UDP packet
            result = socket.recv(&mut buf) => {
                if let Ok(size) = result {
                    kcp.input(&buf[..size])?;
                }
            }
        }
        
        // Check for received messages
        while let Ok(size) = kcp.recv(&mut recv_buf) {
            handle_message(&recv_buf[..size]).await;
        }
    }
}

Architecture

Module Structure

kcp-rs/
├── src/
│   ├── lib.rs          # Public API exports
│   ├── constants.rs    # Protocol constants (compile-time)
│   ├── sequence.rs     # Sequence number arithmetic
│   ├── segment.rs      # Segment structures
│   ├── ring_buffer.rs  # Lock-free ring buffers
│   ├── codec.rs        # Wire format encoding/decoding
│   ├── time.rs         # Time utilities
│   ├── kcp.rs          # Main KCP implementation
│   └── fec/            # Forward Error Correction (feature-gated)
│       ├── mod.rs      # FEC configuration and headers
│       ├── gf256.rs    # GF(2^8) finite field arithmetic
│       ├── encoder.rs  # Reed-Solomon encoder
│       ├── decoder.rs  # Reed-Solomon decoder
│       └── buffer.rs   # FEC send/receive buffers
├── benches/
│   └── throughput.rs   # Performance benchmarks
├── tests/
│   └── integration.rs  # End-to-end tests
├── docs/
│   └── performance_design.md  # Architecture documentation
└── examples/
    └── echo.rs         # Echo server example

Design Principles

1. Zero Allocation in Hot Path

All buffers are preallocated at initialization:

pub struct Kcp<O: KcpOutput> {
    // Preallocated segment buffers
    snd_buf: SendBuffer,    // [SendSegment; 256]
    rcv_buf: RecvBuffer,    // [RecvSegment; 256]
    ack_list: AckList,      // [AckEntry; 128]
    
    // Preallocated output buffer
    output_buf: [u8; MTU_DEFAULT + HEADER_SIZE],
    // ...
}

2. Safe Sequence Arithmetic

Uses wrapping arithmetic and half-range comparison:

// ✅ CORRECT - Handles wraparound
let is_newer = seq_a.wrapping_sub(seq_b) < HALF_RANGE;

// ❌ FORBIDDEN - Breaks at wraparound
let is_newer = seq_a > seq_b;

3. Power-of-Two Ring Buffers

All buffer capacities are powers of two for efficient indexing:

// ✅ CORRECT - Bitwise AND (single instruction)
let index = sequence & (CAPACITY - 1);

// ❌ FORBIDDEN - Modulo (expensive division)
let index = sequence % CAPACITY;

4. Cache-Optimized Structures

Hot fields are grouped within 64-byte cache lines:

#[repr(C)]
pub struct SendSegment {
    // Hot fields first (20 bytes)
    pub sn: Sequence,       // 4 bytes
    pub resend_ts: u32,     // 4 bytes
    pub rto: u32,           // 4 bytes
    pub fastack: u16,       // 2 bytes
    pub xmit: u16,          // 2 bytes
    pub state: SegmentState,// 1 byte
    pub frg: u8,            // 1 byte
    pub data_len: u16,      // 2 bytes
    
    // Cold fields after
    pub ts: u32,
    pub data_offset: u32,
}

Forward Error Correction (FEC)

The fec feature enables Reed-Solomon error correction, allowing recovery from packet loss without retransmission. This is useful for high-loss networks or latency-critical applications.

FEC Configuration

use kcp_rs::fec::{FecConfig, FecEncoder, FecDecoder};

// Balanced: k=4 data, m=2 parity (50% overhead, recover 2 losses)
let balanced = FecConfig::balanced();

// Low latency: k=2 data, m=1 parity (50% overhead, recover 1 loss)  
let low_latency = FecConfig::low_latency();

// High protection: k=8 data, m=4 parity (50% overhead, recover 4 losses)
let high_protection = FecConfig::high_protection();

// Bandwidth efficient: k=10 data, m=2 parity (20% overhead)
let efficient = FecConfig::bandwidth_efficient();

FEC Group Structure

Group: [D0] [D1] [D2] [D3] [P0] [P1]
        └─── data shards ───┘ └─ parity ─┘
             k = 4              m = 2

Any k shards can reconstruct all k data shards.

FEC Presets

Preset k (data) m (parity) Overhead Recovery
low_latency 2 1 50% 1 loss
balanced 4 2 50% 2 losses
high_protection 8 4 50% 4 losses
bandwidth_efficient 10 2 20% 2 losses

Design Characteristics

  • Zero allocation: All FEC buffers preallocated at init
  • Inline GF(2^8): Lookup table arithmetic for encoding/decoding
  • Vandermonde matrix: Systematic encoding for efficient recovery

API Reference

Core Types

Kcp<O: KcpOutput>

Main KCP control block.

impl<O: KcpOutput> Kcp<O> {
    /// Create new instance with default config
    pub fn new(conv: u32, output: O) -> Self;
    
    /// Create with custom configuration
    pub fn with_config(conv: u32, output: O, config: KcpConfig) -> Self;
    
    /// Send data (may fragment into multiple segments)
    pub fn send(&mut self, data: &[u8]) -> KcpResult<usize>;
    
    /// Receive reassembled data
    pub fn recv(&mut self, buf: &mut [u8]) -> KcpResult<usize>;
    
    /// Process incoming packet
    pub fn input(&mut self, data: &[u8]) -> KcpResult<()>;
    
    /// Update KCP state (call periodically)
    pub fn update(&mut self, current_ms: u32) -> KcpResult<()>;
    
    /// Check when next update is needed
    pub fn check(&self, current_ms: u32) -> u32;
    
    /// Get number of packets waiting to be sent
    pub fn wait_snd(&self) -> u32;
    
    /// Check if connection is dead
    pub fn is_dead(&self) -> bool;
    
    /// Get current RTT estimate (ms)
    pub fn rtt(&self) -> u32;
    
    /// Get current RTO (ms)
    pub fn rto(&self) -> u32;
}

KcpOutput Trait

Implement this for your transport layer:

pub trait KcpOutput {
    fn output(&mut self, data: &[u8]) -> KcpResult<usize>;
}

KcpConfig

Configuration options:

pub struct KcpConfig {
    pub mtu: u32,       // Maximum transmission unit (default: 1400)
    pub interval: u32,  // Update interval in ms (default: 100)
    pub nodelay: bool,  // No-delay mode (default: false)
    pub resend: u32,    // Fast resend trigger (default: 0)
    pub nc: bool,       // No congestion control (default: false)
    pub snd_wnd: u16,   // Send window size (default: 32)
    pub rcv_wnd: u16,   // Receive window size (default: 128)
    pub stream: bool,   // Stream mode (default: false)
}

KcpError

Error types:

pub enum KcpError {
    BufferTooSmall,  // Output buffer too small
    BufferFull,      // Send/receive buffer full
    InvalidPacket,   // Malformed packet
    ConvMismatch,    // Conversation ID mismatch
    DataTooLarge,    // Data exceeds maximum size
    WouldBlock,      // No data available (non-blocking)
    DeadLink,        // Connection timeout
}

Protocol Details

Packet Format

 0               1               2               3
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       conv (4 bytes)                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     cmd       |     frg       |           wnd                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        ts (4 bytes)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        sn (4 bytes)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       una (4 bytes)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       len (4 bytes)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       data (variable)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Field Size Description
conv 4 Conversation ID
cmd 1 Command: PUSH(81), ACK(82), WASK(83), WINS(84)
frg 1 Fragment index (0 = last fragment)
wnd 2 Receive window size
ts 4 Timestamp
sn 4 Sequence number
una 4 Unacknowledged sequence number
len 4 Data length
data var Payload

Modes

Normal Mode (Default)

  • Congestion control enabled
  • Standard RTO backoff (RTO × 2)
  • Suitable for bulk transfers

Fast Mode

KcpConfig::fast()
  • No congestion control
  • Reduced RTO backoff (RTO × 1.5)
  • Fast retransmit after 2 duplicate ACKs
  • 20ms update interval
  • Optimized for real-time applications

Stream Mode

KcpConfig { stream: true, ..Default::default() }
  • No message boundaries
  • Data treated as byte stream
  • Lower overhead for small messages

Benchmarking

Run benchmarks:

cargo bench

Expected results on modern hardware (x86_64):

send/64_bytes           time:   [2.6 ns 2.7 ns 2.7 ns]
                        thrpt:  [22.1 GiB/s 22.4 GiB/s 22.7 GiB/s]
send/256_bytes          time:   [2.5 ns 2.5 ns 2.6 ns]
                        thrpt:  [93.2 GiB/s 93.8 GiB/s 94.3 GiB/s]
send/1024_bytes         time:   [2.5 ns 2.5 ns 2.5 ns]
                        thrpt:  [375.5 GiB/s 378.0 GiB/s 380.1 GiB/s]
send/4096_bytes         time:   [2.6 ns 2.6 ns 2.6 ns]
                        thrpt:  [1446.8 GiB/s 1455.7 GiB/s 1463.1 GiB/s]

update/empty            time:   [46.6 ns 46.7 ns 46.8 ns]
update/with_data        time:   [45.5 ns 45.5 ns 45.5 ns]

Note: Run with --features small-buffers to avoid stack overflow in tests.

Comparison with Original KCP

Feature Original KCP (C) KCP-RS
Language C Rust
Memory Safety Manual Guaranteed
Allocation Dynamic Preallocated
Thread Safety External sync Single-writer
no_std Support No Yes
Sequence Safety Manual Type-enforced

Use Cases

  • Online Gaming: Low-latency game state synchronization
  • Video Streaming: Real-time video with retransmission
  • VoIP: Voice communication over unreliable networks
  • IoT: Embedded devices with constrained resources
  • VPN: Reliable tunneling over UDP

Troubleshooting

High Latency

  1. Reduce interval (e.g., 10-20ms)
  2. Enable nodelay mode
  3. Increase window sizes (snd_wnd, rcv_wnd)
  4. Disable congestion control (nc: true) if network allows

Packet Loss

  1. Reduce resend threshold for faster retransmit
  2. Decrease MTU to avoid fragmentation
  3. Check network path MTU

Memory Usage

Buffer sizes are fixed at compile time:

// In constants.rs (default)
pub const SND_BUF_CAPACITY: usize = 256;
pub const RCV_BUF_CAPACITY: usize = 256;

// With small-buffers feature
pub const SND_BUF_CAPACITY: usize = 32;
pub const RCV_BUF_CAPACITY: usize = 32;
Configuration Memory per Kcp Instance
Default ~705 KB
small-buffers ~88 KB

⚠️ Stack Overflow Warning: Default buffers are large. Use Box::new(Kcp::new(...)) to allocate on heap, or use small-buffers feature for testing.

Contributing

Contributions are welcome! Please ensure:

  1. No heap allocation in hot path
  2. All buffer sizes power of two
  3. No modulo operations on indices
  4. Sequence comparisons use wrapping arithmetic
  5. Benchmarks show no regression

See Performance Design for detailed design principles.

# Run tests (use small-buffers to avoid stack overflow)
cargo test --features small-buffers

# Run integration tests
cargo test --test integration --features small-buffers

# Run benchmarks
cargo bench

# Check no_std compatibility
cargo build --no-default-features

# Run clippy
cargo clippy

License

BSD 3-Clause License. See LICENSE for details.

Acknowledgments

  • Original KCP by skywind3000
  • Design principles from high-frequency trading systems
  • Rust community for excellent tooling

About

KCP - A Fast and Reliable ARQ Protocol

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages