Commit a0377f6

nwf-msr authored and nwf committed

Add docs/AddressSpace.md

1 parent 7940fee commit a0377f6

# How snmalloc Manages Address Space

Like any modern, high-performance allocator, `snmalloc` contains multiple layers of allocation.
We give here some notes on the internal orchestration.
## From platform to malloc

Consider a first, "small" allocation (typically less than a platform page); such allocations showcase more of the machinery.
For simplicity, we assume that

- this is not an `OPEN_ENCLAVE` build,
- the `BackendAllocator` has not been told to use a `fixed_range`,
- this is not a `SNMALLOC_CHECK_CLIENT` build, and
- (as a consequence of the above) `SNMALLOC_META_PROTECTED` is not `#define`-d.

Since this is the first allocation, all the internal caches will be empty, and so we will hit all the slow paths.
For simplicity, we gloss over much of the "lazy initialization" that would actually be implied by a first allocation.
1. `LocalAlloc::small_alloc` finds that it cannot satisfy the request because its `LocalCache` lacks a free list for this size class.
   The request is delegated, unchanged, to `CoreAllocator::small_alloc`.

2. The `CoreAllocator` has no active slab for this sizeclass, so `CoreAllocator::small_alloc_slow` delegates to `BackendAllocator::alloc_chunk`.
   At this point, the allocation request is enlarged to one or a few chunks (a small counting-number multiple of `MIN_CHUNK_SIZE`, which is typically 16KiB); see `sizeclass_to_slab_size`.

3. `BackendAllocator::alloc_chunk` now splits the allocation request in two, allocating both the chunk's metadata structure (of size `PAGEMAP_METADATA_STRUCT_SIZE`) and the chunk itself (a multiple of `MIN_CHUNK_SIZE`).
   Because the two exercise similar bits of machinery, we track them in parallel in prose despite their sequential nature.
4. The `BackendAllocator` has a chain of "range" types that it uses to manage address space.
   By default (and in the case we are considering), that chain begins with a per-thread "small buddy allocator range".

   1. For the metadata allocation, the size is (well) below `MIN_CHUNK_SIZE`, and so this allocator, which by supposition is empty, attempts to `refill` itself from its parent.
      This results in a request for a `MIN_CHUNK_SIZE` chunk from the parent allocator.

   2. For the chunk allocation, the size is `MIN_CHUNK_SIZE` or larger, so this allocator immediately forwards the request to its parent.

5. The next range allocator in the chain is a per-thread *large* buddy allocator that refills in 2 MiB granules.
   (2 MiB is chosen because it is a typical superpage size.)
   At this point, both requests are for at least one and no more than a few times `MIN_CHUNK_SIZE` bytes.

   1. The first request will `refill` this empty allocator by making a request for 2 MiB to its parent.

   2. The second request will stop here, as the allocator will no longer be empty.
6. The chain continues with a `CommitRange`, which simply forwards all allocation requests and (upon unwinding) ensures that the address space is mapped.

7. The chain now transitions from thread-local to global; the `GlobalRange` simply serves to acquire a lock around the rest of the chain.

8. The next entry in the chain is a `StatsRange`, which serves to accumulate statistics.
   We ignore this stage and continue onwards.

9. The next entry in the chain is another *large* buddy allocator, which refills at 16 MiB but can hold regions of any size up to the entire address space.
   The first request triggers a `refill`, continuing along the chain as a 16 MiB request.
   (Recall that the second allocation will have been handled at an earlier point on the chain.)
10. The penultimate entry in the chain is a `PagemapRegisterRange`, which always forwards allocations along the chain.

11. At long last, we arrive at the last entry in the chain, a `PalRange`.
    This delegates the actual allocation, of 16 MiB, to either the `reserve_aligned` or `reserve` method of the Platform Abstraction Layer (PAL).

12. Having wound the chain onto our stack, we now unwind!
    The `PagemapRegisterRange` ensures that the Pagemap entries for allocations passing through it are mapped and returns the allocation unaltered.

13. The global large buddy allocator splits the 16 MiB refill into 8, 4, and 2 MiB regions, which it retains, and returns the remaining 2 MiB back along the chain.

14. The `StatsRange` makes its observations, the `GlobalRange` unlocks the global component of the chain, and the `CommitRange` ensures that the allocation is mapped.
    Aside from these side effects, these stages propagate the allocation along the chain unaltered.

15. We now arrive back at the thread-local large buddy allocator, which takes its 2 MiB refill and breaks it down into powers of two, down to the requested `MIN_CHUNK_SIZE`.
    The second allocation (of the chunk) will either return one of these intermediate chunks or break one down again.

16. For the first (metadata) allocation, the thread-local *small* buddy allocator breaks the `MIN_CHUNK_SIZE` allocation down into powers of two, down to `PAGEMAP_METADATA_STRUCT_SIZE`, and returns one of that size.
    The second allocation will have been forwarded and so is not additionally handled here.
Exciting, no?
## What Can I Learn from the Pagemap?

### Decoding a MetaEntry

The centerpiece of `snmalloc`'s metadata is its `PageMap`, which associates each "chunk" of the address space (~16KiB; see `MIN_CHUNK_BITS`) with a `MetaEntry`.
A `MetaEntry` is a pair of pointers, suggestively named `meta` and `remote_and_sizeclass`.
In more detail, `MetaEntry`s are better represented by Sigma and Pi types, all packed into two pointer-sized words in ways that preserve pointer provenance on CHERI.

To begin decoding, a bit (`REMOTE_BACKEND_MARKER`) in `remote_and_sizeclass` distinguishes chunks owned by frontend and backend allocators.
For chunks owned by the *frontend* (`REMOTE_BACKEND_MARKER` not asserted),

1. The `remote_and_sizeclass` field is a product of

   1. A `RemoteAllocator*` indicating the `LocalAlloc` that owns the region of memory.

   2. A "full sizeclass" value (itself a tagged sum type between large and small sizeclasses).

2. The `meta` pointer is a bit-stuffed pair of

   1. A pointer to a larger metadata structure, with type dependent on the role of this chunk.

   2. A bit (`META_BOUNDARY_BIT`) that serves to limit chunk coalescing on platforms where that may not be possible, such as CHERI.

See `src/backend/metatypes.h` and `src/mem/metaslab.h`.
For chunks owned by a *backend* (`REMOTE_BACKEND_MARKER` asserted), there are again multiple possibilities.

For chunks owned by a *small buddy allocator*, the remainder of the `MetaEntry` is zero.
That is, it appears to have small sizeclass 0 and an implausible `RemoteAllocator*`.

For chunks owned by a *large buddy allocator*, the `MetaEntry` is instead a node in a red-black tree of all such chunks.
Its contents can be decoded as follows:

1. The `meta` field's `META_BOUNDARY_BIT` is preserved, with the same meaning as in the frontend case, above.

2. `meta` (resp. `remote_and_sizeclass`) includes a pointer to the left (resp. right) *chunk* of address space.
   (The corresponding child *node* in this tree is found by taking the *address* of this chunk and looking up the `MetaEntry` in the Pagemap.
   This trick of pointing at the child's chunk rather than at the child `MetaEntry` is particularly useful on CHERI:
   it allows us to capture the authority to the chunk without needing another pointer and costs just a shift and add.)

3. The `meta` field's `LargeBuddyRep::RED_BIT` is used to carry the red/black color of this node.

See `src/backend/largebuddyrange.h`.
### Encoding a MetaEntry

We can also consider the process of generating a `MetaEntry` for a chunk of the address space given its state.
The following cases apply:
1. The address is not associated with `snmalloc`:
   Here, the `MetaEntry`, if it is mapped, is all zeros, and so it...
   * has `REMOTE_BACKEND_MARKER` clear in `remote_and_sizeclass`.
   * appears to be owned by a frontend `RemoteAllocator` at address 0 (probably, but not certainly, `nullptr`).
   * has "small" sizeclass 0, which has size 0.
   * has no associated metadata structure.
2. The address is part of a free chunk in a backend's Large Buddy Allocator:
   The `MetaEntry`...
   * has `REMOTE_BACKEND_MARKER` asserted in `remote_and_sizeclass`.
   * has "small" sizeclass 0, which has size 0.
   * has the remainder of its structure interpreted as a Large Buddy Allocator red-black tree node.
   * has no associated metadata structure.
3. The address is part of a free chunk inside a backend's Small Buddy Allocator:
   Here, the `MetaEntry` is zero aside from the asserted `REMOTE_BACKEND_MARKER` bit, and so it...
   * has "small" sizeclass 0, which has size 0.
   * has no associated metadata structure.
4. The address is part of a live large allocation (spanning one or more 16KiB chunks):
   Here, the `MetaEntry`...
   * has `REMOTE_BACKEND_MARKER` clear in `remote_and_sizeclass`.
   * has a *large* sizeclass value.
   * has an associated `RemoteAllocator*` and a `Metaslab*` metadata structure
     (holding just the original chunk pointer in its `MetaCommon` substructure;
     it is configured to always trigger the deallocation slow path, skipping the logic used when a chunk is in use as a slab).
5. The address, whether or not it is presently within an allocated object, is part of an active slab:
   Here, the `MetaEntry`...
   * encodes the *small* sizeclass of all objects in the slab.
   * has a `RemoteAllocator*` referencing the owning `LocalAlloc`'s message queue.
   * points to the slab's `Metaslab` structure, containing additional metadata (e.g., its free list).
