Skip to content

Implement shrink-in-place logic for LPM next-hop deletions#273

Open
taspelund wants to merge 4 commits into
mainfrom
trey/route-target-shrink-in-place
Open

Implement shrink-in-place logic for LPM next-hop deletions#273
taspelund wants to merge 4 commits into
mainfrom
trey/route-target-shrink-in-place

Conversation

@taspelund
Copy link
Copy Markdown
Contributor

This PR implements shrink-in-place logic for LPM next-hop (route_target) deletions (i.e. the replace_route_targets() codepath). Shrink-in-place logic avoids a fallible allocation of new next-hop entries, which can be observed when attempting to delete an ECMP next-hop when the table is genuinely full or too fragmented for a contiguous allocation to succeed.

The PR introduces test cases for next-hop deletion both when the table is full and when the table is fragmented.

Additionally, the PR refactors some of the error handling paths in replace_route_targets, adds a test harness for simulating ASIC operation failures, and introduces test cases for the error/fallback paths.

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
@taspelund taspelund self-assigned this May 15, 2026
@taspelund taspelund added rust Pull requests that update rust code customer labels May 15, 2026
taspelund added 2 commits May 15, 2026 16:05
Make subset target removals succeed under arbitrary route_target
fragmentation.  The alloc-then-swap path fails whenever the FreeMap
can't satisfy alloc(new_n), including the common case where a
fragmented table has plenty of total free slots but no contiguous
span of size new_n.  The new shrink-in-place path compacts the
existing reservation rather than allocating a new one, so it never
needs to find contiguous free space.

`replace_route_targets` classifies the requested update via
`classify_update` -> `RouteTargetUpdate`:

  - ShrinkInPlace { removed }: every old element appears in new at
    least once and the count delta matches the removed-positions
    list.  Dispatched to shrink_in_place.
  - Alloc: anything else.  Dispatched to alloc_then_swap, which
    follows the alloc-write-swap-free pattern.  This case can still
    TableFull on a fragmented table.

`replace_route_targets` starts by calling `unhook_route`, which
removes the in-core entry for the subnet and deletes the on-chip
`route_index` in one step, returning ownership of the previous
`RouteEntry`.  Every downstream branch receives the owned old entry
and is responsible for either installing a new in-core entry on
success or restoring the old one on failure.

`shrink_in_place` operates on a reservation that's already been
unhooked.  It walks the removed indices in decreasing order, pulling
the current tail slot's contents down into each doomed position via
delete + add on the route_target table.  The slots are unreachable
from the dataplane (route_index is gone), so per-slot atomicity
isn't required.  After the compaction loop it adds a new route_index
pointing at `(base, new_n)` to bring the route back online with the
smaller group, then tears down the unreachable tail entries and
returns their slots to the FreeMap in a single bulk `free(base + new_n,
removed.len())` call (the released slots are by construction the
contiguous suffix of the old reservation).

The in-core `RouteEntry.targets` is rebuilt from the post-compaction
`live` mirror, not from the caller-supplied vec.  Compaction reorders
survivors by physical pull-tail-into-hole mechanics, so caller-
supplied order need not match what was just programmed on chip;
honoring the compacted layout keeps in-core and ASIC slot order
aligned, which the shrink path itself relies on when it builds
`live` from `old.targets` on the next call.  This trades the
(undocumented, unrelied-on) caller-order property for an internal
invariant.

On failure in any compaction or index-add step, `rollback_shrink`
restores every position that was successfully deleted -- tracked in
a `touched` vec populated as compaction progresses -- to its
`old.targets[i]` contents, then reinstalls the original route_index.
If rollback succeeds the owned `old` is re-inserted into the in-core
mirror; if rollback itself fails the in-core entry stays cleared so
the next control-plane update on this subnet rebuilds from scratch.
Per-slot failures are logged individually so a divergence postmortem
can name the slots involved.

Additional helpers added:
- `RouteData::freemap_mut(is_ipv4)` accessor.
- `write_one_target` / `delete_one_target_at` for per-family
  dispatch on route_target.
- `add_route_index_for` / `delete_route_index_for` for per-family
  dispatch on route_index.

Tests cover delta=1 single-target removal on full and fragmented
v4/v6 tables, multi-target (delta=3) subset removal on full and
fragmented v4/v6 tables, and assertions that the in-core entry's
tgt_ip set matches the requested survivors and that the FreeMap can
satisfy a fresh alloc of the just-freed slot count.  Two positional
tests pin the compacted-layout invariant: a permutation-shrink and a
shrink to a single non-zero-position survivor.  An assertion that
genuine grow-into-full-table operations still surface TableFull
guards the alloc path.

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Follow-up to the shrink-in-place commit, addressing review feedback:

* Add a NoOp variant to RouteTargetUpdate so an identity replace (new
  target set equals old) is detected by the classifier and the
  dispatcher returns without unhooking.  Eliminates the brief LPM-miss
  window the pre-refactor code (and the first iteration of this branch)
  produced on idempotent reconciles.
* Bump compact-loop and commit-boundary failure logs in shrink_in_place
  from debug! to warn!: those failures are user-visible Err returns
  from the very path that exists to avoid TableFull, so they deserve
  more than trace-level visibility.  Rollback failures stay at error!.
* Reword the "drop new_targets" cue into a comment at the dispatcher
  call site and drop the parameter from shrink_in_place entirely —
  new_n is now derived from old.slots - removed.len().
* Note in classify_update that subset detection compares NextHops by
  structural equality, so duplicate hops would be conservatively
  classified as Alloc.  add_route_locked rejects exact duplicates
  upstream, so this is a fallback rather than a reachable case.
* Test additions: same_size_non_subset_replace and grow_succeeds cover
  the Alloc-path dispatch under same-length swap and successful growth
  respectively.  identity_replace_is_noop pins the new short-circuit
  via a recycle-bin probe (alloc(slots) returning != before.index
  proves the original reservation was never freed, since FreeMap::alloc
  checks recycle_bins before the freelist).
* Test factoring: collapse delete_targets_full_scenario and
  delete_targets_fragmented_scenario into a single fragmented:bool
  helper.  -30 lines of duplication.

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
@taspelund taspelund force-pushed the trey/route-target-shrink-in-place branch from a945cc5 to 1f2c4b2 Compare May 15, 2026 22:08
…C backend

The route tests added in 6c398b1 ("dpd: add ecmp del tests for
full/fragmented table") build a `Switch` via the stub asic backend,
which loads `bfrt.json` on construction.  That artifact is produced
by `cargo xtask codegen` (P4 compilation) and is not built by CI s
`test.sh`, so every route test fails at runtime with
`P4Missing(no bf-rt file: target/proto/.../bfrt.json)`.

The deeper issue is that the route subsystem is being tested through
a Switch even though it does not exercise ASIC semantics — the tests
care about in-core `RouteData` state and `FreeMap` accounting,
neither of which depends on what the stub records on the asic side.
Coupling the tests to bfrt.json is a build-time accident.

This commit removes the coupling by introducing a `RouteTableOps`
trait that abstracts the four on-chip table operations the route
subsystem performs:

  - add_target / delete_target on the `route_target` table
  - add_index / delete_index on the `route_index` table

plus `log()` and `route_fwd_table_size()` so functions taking the
trait do not need a `&Switch` at all.  Production code uses
`impl RouteTableOps for Switch`, which dispatches to the live
`table::route_ipv{4,6}::*` functions — identical behavior to the
prior direct calls.

All internal route helpers (`replace_route_targets`, `unhook_route`,
`alloc_then_swap`, `shrink_in_place`, `restore_after_shrink_failure`,
`rollback_shrink`, `cleanup_route`, `finalize_route`, `add_route_locked`,
`delete_route_target_locked`, `delete_route_locked`) now take
`&dyn RouteTableOps` rather than `&Switch`.  The four free helpers
introduced in the parent commit (`write_one_target`, `delete_one_target_at`,
`add_route_index_for`, `delete_route_index_for`) become trait methods
and are removed.  Public API signatures are unchanged: the async
entry points still take `&Switch`, which coerces to `&dyn RouteTableOps`
at the internal call sites.

The route test module gains a `TestOps` that implements the trait,
returning `Ok` for every on-chip operation and `TEST_FREEMAP_SIZE`
for `route_fwd_table_size`.  `TestOps` also carries per-op failpoint
state (a `Cell<Option<u32>>` per op): tests call `ops.arm(op, after)`
to make the `(after+1)`-th subsequent call return a synthetic
`Switch(SdeError)`.  This replaces the thread-local failpoint module
the previous iteration of this PR used.

With `TestOps` in place, `make_switch` and `shrink_test_freemap`
are gone.  Every test now constructs a `(TestOps, RouteData)` via
`make_test_ctx()`, drops the `tokio::test`/`async` decoration (no
async lock is needed), and the bfrt.json runtime dependency is
gone with it.

The four rollback tests from the prior iteration land here using
`TestOps::arm` directly:

  * rollback_on_compact_delete_failure — empty `touched` set,
    simplest recovery: rollback only re-adds the original route_index.
  * rollback_on_compact_write_failure — load-bearing case for the
    `live[]` / `touched[]` distinction.  Delete succeeded then write
    failed, so `live[i] = None` and rollback must skip the delete
    and rewrite `original[i]`.
  * rollback_on_index_readd_failure — compact loop completes, the
    commit-boundary `add_index` fails.  Rollback restores every
    touched position and re-adds the original index (the second
    AddIndex call of the run).
  * rollback_failure_leaves_in_core_cleared — arms two failpoints
    to break both the primary delete and rollback s own `add_index`.
    Asserts the documented degraded posture: in-core entry absent,
    FreeMap reservation leaked.

Caveat: the harness injects SDE-shaped errors at the trait
boundary.  This exercises rollback sequencing and the in-core
mirror state machine, but it does not cover the class of failures
where the SDE returns Err yet leaves the ASIC in a partial state
the `live[]` mirror does not anticipate.  Catching that needs
hardware-in-the-loop testing; out of scope here.

Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
@taspelund taspelund force-pushed the trey/route-target-shrink-in-place branch from 1f2c4b2 to d5cbf7b Compare May 15, 2026 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

customer rust Pull requests that update rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant