A safe Rust API for RDMA over InfiniBand, RoCE, and iWARP, wrapping libibverbs.
RDMA "verbs" let userspace talk to the network adapter directly: no system calls on the data path, no copies, single-digit-microsecond latencies. The C API leaves you to uphold a long list of lifetime, aliasing, and transport rules by hand. This crate encodes those rules in Rust types: queue pairs are typed by their transport, so posting a datagram without an address handle, or setting an RC-only timeout on a UD queue pair, is a compile error. Low-level control remains available: every wrapper hands out its raw handle for verbs the safe API does not cover.
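The transport typing described above is an instance of the typestate pattern. The sketch below is self-contained and uses only `std`; every type in it (`Rc`, `Ud`, `QpBuilder`, `AddressHandle`, `UdSend`, `PostableUdSend`, `post_ud`) is a hypothetical stand-in written for this illustration, not this crate's API. It only shows how a marker type parameter can make an RC-only knob and an unaddressed datagram send fail to compile.

```rust
use std::marker::PhantomData;

// Hypothetical zero-sized markers standing in for transports.
pub struct Rc;
pub struct Ud;

// A queue-pair builder parameterized by its transport.
pub struct QpBuilder<T> {
    timeout: Option<u8>,
    _transport: PhantomData<T>,
}

impl<T> QpBuilder<T> {
    pub fn new() -> Self {
        QpBuilder { timeout: None, _transport: PhantomData }
    }
}

// Only RC builders get a retransmission-timeout knob. On `QpBuilder<Ud>`
// the method does not exist, so calling it is a compile error.
impl QpBuilder<Rc> {
    pub fn set_timeout(mut self, timeout: u8) -> Self {
        self.timeout = Some(timeout);
        self
    }
}

// Hypothetical address handle.
pub struct AddressHandle {
    pub lid: u16,
}

// An unaddressed datagram send. The only way to get something postable
// out of it is `to`, which demands a destination.
pub struct UdSend<'a> {
    payload: &'a [u8],
}

pub struct PostableUdSend<'a> {
    payload: &'a [u8],
    lid: u16,
    qpn: u32,
    qkey: u32,
}

impl<'a> UdSend<'a> {
    pub fn new(payload: &'a [u8]) -> Self {
        UdSend { payload }
    }

    pub fn to(self, ah: &AddressHandle, qpn: u32, qkey: u32) -> PostableUdSend<'a> {
        PostableUdSend { payload: self.payload, lid: ah.lid, qpn, qkey }
    }
}

// Posting accepts only the addressed form; here it just describes the request.
pub fn post_ud(wr: &PostableUdSend) -> String {
    format!("{} bytes to lid {} qpn {} qkey {:#x}", wr.payload.len(), wr.lid, wr.qpn, wr.qkey)
}

fn main() {
    // RC builder: the timeout knob exists.
    let rc = QpBuilder::<Rc>::new().set_timeout(14);
    println!("RC timeout: {:?}", rc.timeout);

    // UD builder: uncommenting the next line fails with "no method named `set_timeout`".
    let _ud: QpBuilder<Ud> = QpBuilder::new();
    // _ud.set_timeout(14);

    // A datagram send becomes postable only once it has a destination.
    let ah = AddressHandle { lid: 7 };
    let wr = UdSend::new(b"ping").to(&ah, 0x42, 0x1111_1111);
    println!("{}", post_ud(&wr));
    // post_ud(&UdSend::new(b"ping")); // mismatched types: expected `&PostableUdSend`
}
```

Because the restriction lives in which `impl` blocks exist, it has no run-time cost: the marker types are zero-sized.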
A minimal program that sends "hello" to itself over an RC queue pair:

```rust
use ibverbs::{AccessFlags, RecvRequest};

fn main() -> ibverbs::Result<()> {
    let ctx = ibverbs::devices()?.iter().next().expect("no device").open()?;
    let cq = ctx.create_cq(16).build()?;
    let pd = ctx.alloc_pd()?;

    // A reliable-connection (RC) queue pair on port 1. On RoCE, routing needs a GID; pick the
    // index of a suitable entry from `ctx.gid_table()?`.
    let prepared = pd
        .create_qp::<ibverbs::Rc>(&cq, &cq, 1)?
        .set_gid_index(1)
        .build()?;

    // Exchange endpoints with the peer out of band (`endpoint.to_bytes()` is the wire
    // format), or use the `rdmacm` feature to negotiate connections over IP instead.
    // Here we self-connect for brevity.
    let endpoint = prepared.endpoint()?;
    let mut qp = prepared.handshake(endpoint)?;

    let mut recv = pd.allocate(4096, AccessFlags::PERMISSIVE)?;
    let mut send = pd.allocate(4096, AccessFlags::PERMISSIVE)?;
    send.bytes_mut()[..5].copy_from_slice(b"hello");

    // Post the receive first so a buffer is waiting when the send arrives.
    // SAFETY: `recv` is neither read nor freed until its completion is reaped below.
    unsafe { qp.post_recv([RecvRequest::new(1, &[recv.slice(..)])]) }?;

    // One signaled send, posted as a single-request doorbell batch.
    let mut batch = qp.start_send();
    batch.op().signaled().send(2, &[send.slice(..5)]);
    // SAFETY: `send` stays alive and unmodified until its completion is reaped below.
    unsafe { batch.submit() }?;

    // Reap two completions: the receive (wr_id 1) and the signaled send (wr_id 2).
    let mut pending = 2;
    while pending > 0 {
        if let Some(mut completions) = cq.poll()? {
            while let Some(wc) = completions.next() {
                wc.ok().expect("work request failed");
                pending -= 1;
            }
        }
    }
    assert_eq!(&recv.bytes_mut()[..5], b"hello");
    Ok(())
}
```

Complete programs live in `ibverbs/examples/`: a loopback transfer, an event-driven loop multiplexing queues over one completion channel, an `ibv_devinfo`-style device dump, doorbell batching, connection setup through the RDMA connection manager, and EFA SRD queue pairs.
The safe API covers:

- Reliable and unreliable connections (RC/UC) and unreliable datagrams (UD), typed at compile time: builder knobs, activation, and postable operations exist only on the transports they apply to, and a datagram send is addressed (`.to(&ah, qpn, qkey)`) by construction.
- Device listing and typed device, port, and GID-table queries, including extended attributes and GID-to-netdev resolution.
- One-call connection bring-up (`handshake`, `activate`) with typed timeout/retry values, or validated manual state transitions (`modify`/`query`) when you want to drive the state machine yourself.
- Memory regions that own their buffer, plus registration of caller-managed memory (`register_from_raw` for mmap/hugepages, `register_dmabuf` for device memory such as GPU buffers), and `ibv_advise_mr`.
- Two-sided send/receive, one-sided RDMA read and write (with immediate), and atomics, all posted as doorbell batches: many work requests, one doorbell, with per-operation `signaled`/`fenced`/`solicited` modifiers, inline data, and scatter/gather lists.
- Shared receive queues, with receives posted in batches on queue pairs and SRQs alike.
- Completion handling on the extended interface: lazy-read polling, hardware completion timestamps, and event-driven waiting through completion channels that plug into `epoll`/`tokio` (including many queues multiplexed onto one file descriptor), plus device-level asynchronous events (port changes, queue errors, SRQ limits).
- The RDMA connection manager (`rdmacm` feature): blocking helpers with timeouts and in-band `private_data` exchange for the common case, and a low-level, non-blocking `CmId` API for event loops.
- AWS Elastic Fabric Adapter SRD queue pairs (`efa` feature).
Everything else stays reachable through `as_raw` on every wrapper and through the raw bindings re-exported as `ibverbs::ffi`.
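Doorbell batching, as listed above, can be modeled in a few lines. This is a rough, std-only sketch that assumes nothing about real hardware: every name in it (`SendQueue`, `Batch`, `Op`, `WorkRequest`, the `doorbells` counter) is hypothetical, not this crate's API. It shows two properties: many work requests go out on a single doorbell, and the batch holds a mutable borrow of the queue until it is submitted, so nothing else can post in between.

```rust
// Hypothetical send queue; `doorbells` counts how often the device would be notified.
#[derive(Default)]
pub struct SendQueue {
    posted: Vec<WorkRequest>,
    pub doorbells: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct WorkRequest {
    pub wr_id: u64,
    pub len: usize,
    pub signaled: bool,
}

// A batch holds `&mut SendQueue`, so the queue cannot be used again
// until the batch is submitted or dropped.
pub struct Batch<'q> {
    sq: &'q mut SendQueue,
    pending: Vec<WorkRequest>,
}

// Per-operation builder; modifiers such as `signaled` apply to this operation only.
pub struct Op<'b, 'q> {
    batch: &'b mut Batch<'q>,
    signaled: bool,
}

impl SendQueue {
    pub fn start_send(&mut self) -> Batch<'_> {
        Batch { sq: self, pending: Vec::new() }
    }

    pub fn posted(&self) -> &[WorkRequest] {
        &self.posted
    }
}

impl<'q> Batch<'q> {
    pub fn op(&mut self) -> Op<'_, 'q> {
        Op { batch: self, signaled: false }
    }

    // Hand every queued request over at once and ring the doorbell a single time.
    // Returns the number of work requests submitted.
    pub fn submit(self) -> usize {
        let n = self.pending.len();
        self.sq.posted.extend(self.pending);
        self.sq.doorbells += 1;
        n
    }
}

impl<'b, 'q> Op<'b, 'q> {
    pub fn signaled(mut self) -> Self {
        self.signaled = true;
        self
    }

    pub fn send(self, wr_id: u64, data: &[u8]) {
        self.batch.pending.push(WorkRequest { wr_id, len: data.len(), signaled: self.signaled });
    }
}

fn main() {
    let mut sq = SendQueue::default();
    let mut batch = sq.start_send();
    for wr_id in 0..4u64 {
        let op = batch.op();
        // Request a completion only for the last operation in the batch.
        if wr_id == 3 {
            op.signaled().send(wr_id, b"data");
        } else {
            op.send(wr_id, b"data");
        }
    }
    // sq.start_send(); // would not compile: `batch` still borrows `sq` mutably
    let n = batch.submit();
    let signaled = sq.posted().iter().filter(|wr| wr.signaled).count();
    println!("{n} requests, {} doorbell(s), {signaled} signaled", sq.doorbells);
}
```

Signaling only the last request of a batch is a common way to cut completion traffic; the borrow on the queue is what turns "do not post while a batch is open" into a compile-time rule.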
Safety properties:

- Resources are reference-counted internally; a queue pair keeps its completion queues and protection domain alive, so handles cannot dangle and there are no lifetime parameters to thread through your types.
- Transport rules are enforced by the type system and pinned by `compile_fail` tests; so are the posting rules: a doorbell batch borrows the queue pair until submitted, and polled completions are lent, so stale reads don't compile.
- Buffers registered via `allocate` are owned by the `MemoryRegion` and cannot be freed or moved while registered. Posting is `unsafe` with a precisely documented contract (the device may still be reading or writing the buffer), rather than pretending a safe signature could uphold it.
- Errors are a `thiserror` enum naming the failing verb; queue-pair state transitions diagnose exactly which attribute-mask bits were wrong, and RoCE routing failures say what was wrong with the route.
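The reference-counting scheme in the first safety property can be sketched with `Arc`. The resource types, the `connect` helper, and the drop log below are hypothetical and std-only; the point is that the queue pair owns strong references to its completion queue and protection domain, so the struct that holds it needs no lifetime parameter, and both dependencies are released only after the queue pair itself.

```rust
use std::sync::{Arc, Mutex};

// Shared log so the teardown order is observable.
type Log = Arc<Mutex<Vec<&'static str>>>;

// Hypothetical resources; each records its destruction in the log.
pub struct CompletionQueue {
    log: Log,
}

pub struct ProtectionDomain {
    log: Log,
}

// The queue pair owns strong references to its dependencies, so neither can
// be destroyed while it exists. Fields drop in declaration order after `drop`.
pub struct QueuePair {
    _cq: Arc<CompletionQueue>,
    _pd: Arc<ProtectionDomain>,
    log: Log,
}

impl Drop for CompletionQueue {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("cq");
    }
}

impl Drop for ProtectionDomain {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("pd");
    }
}

impl Drop for QueuePair {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("qp");
    }
}

// An application type holds a queue pair without any lifetime parameter.
pub struct Connection {
    pub qp: QueuePair,
}

pub fn connect(log: &Log) -> Connection {
    let cq = Arc::new(CompletionQueue { log: log.clone() });
    let pd = Arc::new(ProtectionDomain { log: log.clone() });
    let qp = QueuePair { _cq: Arc::clone(&cq), _pd: Arc::clone(&pd), log: log.clone() };
    // The local `cq` and `pd` handles go out of scope here; the queue pair keeps both alive.
    Connection { qp }
}

fn main() {
    let log: Log = Arc::default();
    let conn = connect(&log);
    println!("destroyed before drop: {:?}", log.lock().unwrap());
    drop(conn);
    // The queue pair goes first, then the resources it was keeping alive.
    println!("destroyed after drop: {:?}", log.lock().unwrap());
}
```

Destroying the queue pair before its completion queue and protection domain is also the order the underlying verbs require, which is why the dependencies are released from the queue pair's field drops rather than by the caller.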
Optional Cargo features (none are enabled by default):

- `rdmacm`: the RDMA connection manager. Links `librdmacm`.
- `efa`: SRD queue pairs on AWS Elastic Fabric Adapter. Links `libefa`.
This crate dynamically links `libibverbs`, which is part of rdma-core, plus `librdmacm` and `libefa` when the corresponding features are enabled. On Debian and Ubuntu the package is `libibverbs1`, with `libibverbs-dev` for linking; on Arch it is `rdma-core`; on Fedora, `rdma-core-devel`.
At build time, bindings are generated from a vendored rdma-core checkout, built automatically by the `ibverbs-sys` crate (this needs cmake and a C toolchain, but no RDMA packages). To use pre-built rdma-core headers and libraries instead, set `RDMA_CORE_INCLUDE_DIR` and `RDMA_CORE_LIB_DIR`. You do not need to depend on `ibverbs-sys` directly: it is re-exported as `ibverbs::ffi`.
The minimum supported Rust version is 1.82.
The crate drives completion queues and queue pairs exclusively through the extended verbs (`ibv_create_cq_ex`, `ibv_create_qp_ex`, and the `ibv_wr_*` send API). Providers that implement them include mlx5, hns, efa, and rxe; on providers that do not (for example mlx4-generation hardware), creation fails cleanly with `Error::Unsupported` rather than degrading to the legacy verbs.
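A caller can treat `Error::Unsupported` as "this provider cannot run the crate" rather than as an ordinary failure. The sketch below is hypothetical and std-only: its `Error` enum is merely shaped after the description of the crate's errors (variants that name the failing verb; the real variants and fields may differ), `Display` is written by hand where the crate uses `thiserror`, and `create_cq` is a stand-in that accepts the providers listed above and rejects everything else.

```rust
use std::fmt;

// Hypothetical error enum: every variant names the verb that failed.
#[derive(Debug, PartialEq)]
pub enum Error {
    // The provider does not implement this extended verb.
    Unsupported { verb: &'static str },
    // The verb ran and failed with this errno.
    Os { verb: &'static str, errno: i32 },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Unsupported { verb } => write!(f, "{verb} is not supported by this provider"),
            Error::Os { verb, errno } => write!(f, "{verb} failed with errno {errno}"),
        }
    }
}

impl std::error::Error for Error {}

// Stand-in for completion-queue creation. The non-positive size check is an
// invented example of an ordinary failure (22 is EINVAL on Linux); providers
// without the extended verbs are rejected up front, with no legacy fallback.
pub fn create_cq(provider: &str, cqe: i32) -> Result<(), Error> {
    if cqe <= 0 {
        return Err(Error::Os { verb: "ibv_create_cq_ex", errno: 22 });
    }
    match provider {
        "mlx5" | "hns" | "efa" | "rxe" => Ok(()),
        _ => Err(Error::Unsupported { verb: "ibv_create_cq_ex" }),
    }
}

fn main() {
    for provider in ["rxe", "mlx4"] {
        match create_cq(provider, 16) {
            Ok(()) => println!("{provider}: ok"),
            // Hardware that cannot run the crate at all: report it and move on.
            Err(e @ Error::Unsupported { .. }) => println!("{provider}: {e}"),
            // Anything else is a genuine failure of a supported provider.
            Err(e) => println!("{provider}: unexpected error: {e}"),
        }
    }
}
```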
Much of the documentation of this crate borrows heavily from the excellent posts over at RDMAmojo. If you are going to be working a lot with ibverbs, chances are you will want to head over there. In particular, this overview post may be a good place to start.
For more information on RDMA verbs in general, see vol. 1 of the InfiniBand Architecture Specification (especially chapter 11), the RDMA Consortium's RDMA Protocol Verbs Specification, the upstream `libibverbs/verbs.h` definitions, the manpages for the `ibv_*` functions, and the upstream C examples.
Any modern Linux kernel can attach a software RDMA device (SoftRoCE) to an ordinary network interface:
```console
$ sudo rdma link add rxe0 type rxe netdev <netdev>
```

The examples (except the EFA one, which needs EFA hardware) and the integration test suite run against it unchanged. CI does exactly this on every pull request: the data-path tests run against a SoftRoCE device and assert on the transferred bytes. A few tests cover paths that the CI runner's rxe module mishandles (atomics, UC/UD, inline sends) and are skipped there; they pass on real hardware and current kernels.