Implement shrink-in-place logic for LPM next-hop deletions#273
Open
taspelund wants to merge 4 commits into
Open
Implement shrink-in-place logic for LPM next-hop deletions#273taspelund wants to merge 4 commits into
taspelund wants to merge 4 commits into
Conversation
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Make subset target removals succeed under arbitrary route_target
fragmentation. The alloc-then-swap path fails whenever the FreeMap
can't satisfy alloc(new_n), including the common case where a
fragmented table has plenty of total free slots but no contiguous
span of size new_n. The new shrink-in-place path compacts the
existing reservation rather than allocating a new one, so it never
needs to find contiguous free space.
`replace_route_targets` classifies the requested update via
`classify_update` -> `RouteTargetUpdate`:
- ShrinkInPlace { removed }: every old element appears in new at
least once and the count delta matches the removed-positions
list. Dispatched to shrink_in_place.
- Alloc: anything else. Dispatched to alloc_then_swap, which
follows the alloc-write-swap-free pattern. This case can still
TableFull on a fragmented table.
`replace_route_targets` starts by calling `unhook_route`, which
removes the in-core entry for the subnet and deletes the on-chip
`route_index` in one step, returning ownership of the previous
`RouteEntry`. Every downstream branch receives the owned old entry
and is responsible for either installing a new in-core entry on
success or restoring the old one on failure.
`shrink_in_place` operates on a reservation that's already been
unhooked. It walks the removed indices in decreasing order, pulling
the current tail slot's contents down into each doomed position via
delete + add on the route_target table. The slots are unreachable
from the dataplane (route_index is gone), so per-slot atomicity
isn't required. After the compaction loop it adds a new route_index
pointing at `(base, new_n)` to bring the route back online with the
smaller group, then tears down the unreachable tail entries and
returns their slots to the FreeMap in a single bulk `free(base + new_n,
removed.len())` call (the released slots are by construction the
contiguous suffix of the old reservation).
The in-core `RouteEntry.targets` is rebuilt from the post-compaction
`live` mirror, not from the caller-supplied vec. Compaction reorders
survivors by physical pull-tail-into-hole mechanics, so caller-
supplied order need not match what was just programmed on chip;
honoring the compacted layout keeps in-core and ASIC slot order
aligned, which the shrink path itself relies on when it builds
`live` from `old.targets` on the next call. This trades the
(undocumented, unrelied-on) caller-order property for an internal
invariant.
On failure in any compaction or index-add step, `rollback_shrink`
restores every position that was successfully deleted -- tracked in
a `touched` vec populated as compaction progresses -- to its
`old.targets[i]` contents, then reinstalls the original route_index.
If rollback succeeds the owned `old` is re-inserted into the in-core
mirror; if rollback itself fails the in-core entry stays cleared so
the next control-plane update on this subnet rebuilds from scratch.
Per-slot failures are logged individually so a divergence postmortem
can name the slots involved.
Additional helpers added:
- `RouteData::freemap_mut(is_ipv4)` accessor.
- `write_one_target` / `delete_one_target_at` for per-family
dispatch on route_target.
- `add_route_index_for` / `delete_route_index_for` for per-family
dispatch on route_index.
Tests cover delta=1 single-target removal on full and fragmented
v4/v6 tables, multi-target (delta=3) subset removal on full and
fragmented v4/v6 tables, and assertions that the in-core entry's
tgt_ip set matches the requested survivors and that the FreeMap can
satisfy a fresh alloc of the just-freed slot count. Two positional
tests pin the compacted-layout invariant: a permutation-shrink and a
shrink to a single non-zero-position survivor. An assertion that
genuine grow-into-full-table operations still surface TableFull
guards the alloc path.
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Follow-up to the shrink-in-place commit, addressing review feedback: * Add a NoOp variant to RouteTargetUpdate so an identity replace (new target set equals old) is detected by the classifier and the dispatcher returns without unhooking. Eliminates the brief LPM-miss window the pre-refactor code (and the first iteration of this branch) produced on idempotent reconciles. * Bump compact-loop and commit-boundary failure logs in shrink_in_place from debug! to warn!: those failures are user-visible Err returns from the very path that exists to avoid TableFull, so they deserve more than trace-level visibility. Rollback failures stay at error!. * Reword the "drop new_targets" cue into a comment at the dispatcher call site and drop the parameter from shrink_in_place entirely — new_n is now derived from old.slots - removed.len(). * Note in classify_update that subset detection compares NextHops by structural equality, so duplicate hops would be conservatively classified as Alloc. add_route_locked rejects exact duplicates upstream, so this is a fallback rather than a reachable case. * Test additions: same_size_non_subset_replace and grow_succeeds cover the Alloc-path dispatch under same-length swap and successful growth respectively. identity_replace_is_noop pins the new short-circuit via a recycle-bin probe (alloc(slots) returning != before.index proves the original reservation was never freed, since FreeMap::alloc checks recycle_bins before the freelist). * Test factoring: collapse delete_targets_full_scenario and delete_targets_fragmented_scenario into a single fragmented:bool helper. -30 lines of duplication. Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
a945cc5 to
1f2c4b2
Compare
…C backend The route tests added in 6c398b1 ("dpd: add ecmp del tests for full/fragmented table") build a `Switch` via the stub asic backend, which loads `bfrt.json` on construction. That artifact is produced by `cargo xtask codegen` (P4 compilation) and is not built by CI s `test.sh`, so every route test fails at runtime with `P4Missing(no bf-rt file: target/proto/.../bfrt.json)`. The deeper issue is that the route subsystem is being tested through a Switch even though it does not exercise ASIC semantics — the tests care about in-core `RouteData` state and `FreeMap` accounting, neither of which depends on what the stub records on the asic side. Coupling the tests to bfrt.json is a build-time accident. This commit removes the coupling by introducing a `RouteTableOps` trait that abstracts the four on-chip table operations the route subsystem performs: - add_target / delete_target on the `route_target` table - add_index / delete_index on the `route_index` table plus `log()` and `route_fwd_table_size()` so functions taking the trait do not need a `&Switch` at all. Production code uses `impl RouteTableOps for Switch`, which dispatches to the live `table::route_ipv{4,6}::*` functions — identical behavior to the prior direct calls. All internal route helpers (`replace_route_targets`, `unhook_route`, `alloc_then_swap`, `shrink_in_place`, `restore_after_shrink_failure`, `rollback_shrink`, `cleanup_route`, `finalize_route`, `add_route_locked`, `delete_route_target_locked`, `delete_route_locked`) now take `&dyn RouteTableOps` rather than `&Switch`. The four free helpers introduced in the parent commit (`write_one_target`, `delete_one_target_at`, `add_route_index_for`, `delete_route_index_for`) become trait methods and are removed. Public API signatures are unchanged: the async entry points still take `&Switch`, which coerces to `&dyn RouteTableOps` at the internal call sites. The route test module gains a `TestOps` that implements the trait, returning `Ok` for every on-chip operation and `TEST_FREEMAP_SIZE` for `route_fwd_table_size`. `TestOps` also carries per-op failpoint state (a `Cell<Option<u32>>` per op): tests call `ops.arm(op, after)` to make the `(after+1)`-th subsequent call return a synthetic `Switch(SdeError)`. This replaces the thread-local failpoint module the previous iteration of this PR used. With `TestOps` in place, `make_switch` and `shrink_test_freemap` are gone. Every test now constructs a `(TestOps, RouteData)` via `make_test_ctx()`, drops the `tokio::test`/`async` decoration (no async lock is needed), and the bfrt.json runtime dependency is gone with it. The four rollback tests from the prior iteration land here using `TestOps::arm` directly: * rollback_on_compact_delete_failure — empty `touched` set, simplest recovery: rollback only re-adds the original route_index. * rollback_on_compact_write_failure — load-bearing case for the `live[]` / `touched[]` distinction. Delete succeeded then write failed, so `live[i] = None` and rollback must skip the delete and rewrite `original[i]`. * rollback_on_index_readd_failure — compact loop completes, the commit-boundary `add_index` fails. Rollback restores every touched position and re-adds the original index (the second AddIndex call of the run). * rollback_failure_leaves_in_core_cleared — arms two failpoints to break both the primary delete and rollback s own `add_index`. Asserts the documented degraded posture: in-core entry absent, FreeMap reservation leaked. Caveat: the harness injects SDE-shaped errors at the trait boundary. This exercises rollback sequencing and the in-core mirror state machine, but it does not cover the class of failures where the SDE returns Err yet leaves the ASIC in a partial state the `live[]` mirror does not anticipate. Catching that needs hardware-in-the-loop testing; out of scope here. Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
1f2c4b2 to
d5cbf7b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR implements shrink-in-place logic for LPM next-hop (
route_target) deletions (i.e. thereplace_route_targets()codepath). Shrink-in-place logic avoids a fallible allocation of new next-hop entries, which can be observed when attempting to delete an ECMP next-hop when the table is genuinely full or too fragmented for a contiguous allocation to succeed.The PR introduces test cases for next-hop deletion both when the table is full and when the table is fragmented.
Additionally, the PR refactors some of the error handling paths in replace_route_targets, adds a test harness for simulating ASIC operation failures, and introduces test cases for the error/fallback paths.