Commit 31a43a4: add NvmExpressDxe readme
# NvmExpressDxe

## Overview

NvmExpressDxe is a UEFI driver that manages NVM Express (NVMe) non-volatile memory subsystems
connected over PCI. It follows the UEFI Driver Model and the NVM Express specification to
discover, initialize, and provide block-level access to NVMe storage devices during the UEFI
boot phase.
## Module Details

| Field | Value |
|---|---|
| **Module Type** | UEFI_DRIVER |
| **INF GUID** | `5BE3BDF4-53CF-46a3-A6A9-73C34A6E5EE3` |
| **Entry Point** | `NvmExpressDriverEntry` |
| **Unload** | `NvmExpressUnload` |
| **Architectures** | IA32, X64, EBC |
## What This Module Does

### Driver Binding

The driver implements the standard UEFI Driver Binding Protocol (`Supported`, `Start`, `Stop`)
to attach to PCI devices with class code Mass Storage / NVM (0x01/0x08) and NVMHCI programming
interface (0x02).
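The `Supported` check reduces to a three-byte class-code comparison after reading PCI config space. A minimal host-runnable sketch of that decision (the constant and function names here are illustrative, not the driver's actual identifiers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCI class-code bytes identifying an NVMe controller (illustrative names). */
#define PCI_CLASS_MASS_STORAGE  0x01  /* Base class: mass storage      */
#define PCI_SUBCLASS_NVM        0x08  /* Subclass: non-volatile memory */
#define PCI_IF_NVMHCI           0x02  /* Programming interface: NVMHCI */

/* Mirrors the decision DriverBindingSupported makes after reading the
   class-code bytes through EFI_PCI_IO_PROTOCOL.                       */
bool IsNvmeController (uint8_t BaseClass, uint8_t SubClass, uint8_t ProgIf)
{
  return (BaseClass == PCI_CLASS_MASS_STORAGE) &&
         (SubClass  == PCI_SUBCLASS_NVM) &&
         (ProgIf    == PCI_IF_NVMHCI);
}
```

Only handles that pass this predicate proceed to `Start`; everything else is rejected with `EFI_UNSUPPORTED`.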
### Controller Initialization

During `Start`, the driver:

1. Opens the `EFI_PCI_IO_PROTOCOL` on the controller handle.
2. Reads the NVMe Controller Capabilities register (`CAP`) and validates NVM command set support.
3. Allocates DMA-accessible buffers for the admin submission/completion queues.
4. Disables the controller, programs the Admin Queue Attributes (`AQA`), Admin Submission Queue
   Base Address (`ASQ`), and Admin Completion Queue Base Address (`ACQ`), then re-enables the
   controller.
5. Sends Identify Controller to retrieve controller metadata (serial number, model, capabilities).
6. Uses the Set Features command (Number of Queues) to negotiate I/O queue pairs with the
   controller.
7. Allocates DMA-accessible buffers for the I/O submission/completion queues and creates the I/O
   queue pairs via Create I/O Completion Queue and Create I/O Submission Queue admin commands.
8. Enumerates NVMe namespaces and creates a child handle for each discovered namespace.
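Step 4's `AQA` programming packs both admin queue depths into one 32-bit register. A host-runnable sketch of the encoding, following the field layout in the NVM Express specification (the helper name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* AQA (Admin Queue Attributes, register offset 0x24), per the NVMe spec:
   bits 11:0  ASQS - admin submission queue size, 0's based
   bits 27:16 ACQS - admin completion queue size, 0's based            */
uint32_t BuildAqa (uint32_t SqEntries, uint32_t CqEntries)
{
  uint32_t Asqs = (SqEntries - 1) & 0xFFF;  /* convert to 0's based */
  uint32_t Acqs = (CqEntries - 1) & 0xFFF;
  return (Acqs << 16) | Asqs;
}
```

The `ASQ` and `ACQ` registers then receive the page-aligned physical base addresses of the two buffers allocated in step 3.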
### Protocols Produced (per controller)

| Protocol | Purpose |
|---|---|
| `EFI_NVM_EXPRESS_PASS_THRU_PROTOCOL` | Raw NVMe command passthrough for admin and I/O commands. Installed on the controller handle. |
| `EFI_DRIVER_SUPPORTED_EFI_VERSION_PROTOCOL` | Declares the EFI specification version the driver supports. Installed on the driver image handle at entry point. |
### Protocols Produced (per namespace)

| Protocol | Purpose |
|---|---|
| `EFI_BLOCK_IO_PROTOCOL` | Synchronous block read/write/flush/reset operations. |
| `EFI_BLOCK_IO2_PROTOCOL` | Asynchronous (non-blocking) block I/O operations. Only installed when the controller allocates more than one I/O queue pair. |
| `EFI_DISK_INFO_PROTOCOL` | Exposes NVMe Identify Namespace data for disk information queries. |
| `EFI_STORAGE_SECURITY_COMMAND_PROTOCOL` | Security Send/Receive commands (if the controller supports OACS bit 0). |
| `MEDIA_SANITIZE_PROTOCOL` | Media Clear, Purge, and Format operations mapped to NVMe Format NVM and Sanitize admin commands per NIST SP 800-88 guidelines. |
### Protocols Consumed

| Protocol | Purpose |
|---|---|
| `EFI_PCI_IO_PROTOCOL` | PCI BAR memory access, DMA buffer allocation, and bus master mapping. |
| `EFI_DEVICE_PATH_PROTOCOL` | Device path construction for namespace child handles. |
| `EFI_RESET_NOTIFICATION_PROTOCOL` | Registers a shutdown callback to gracefully shut down all NVMe controllers before platform reset. |
### Asynchronous I/O

The driver uses a periodic timer event (`NVME_HC_ASYNC_TIMER`, 1 ms) to poll the async I/O
completion queue and process completed asynchronous requests. The `BlockIo2` protocol is only
installed when the controller has allocated at least two I/O queue pairs (one for blocking I/O,
one for async).
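When the timer callback polls the completion queue, it identifies new entries by their phase tag: the controller inverts the tag each time it wraps the ring, so the host flips its expected phase on every wrap. A simplified host-runnable model of that bookkeeping (type and function names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A CQ entry is "new" when its phase tag matches the phase the host
   currently expects.                                                 */
typedef struct {
  uint16_t Head;           /* next CQ slot to inspect                 */
  uint16_t Entries;        /* queue depth                             */
  uint8_t  ExpectedPhase;  /* phase-tag value marking unconsumed CQEs */
} CqState;

bool CqeIsNew (const CqState *Cq, uint8_t PhaseTag)
{
  return PhaseTag == Cq->ExpectedPhase;
}

void CqAdvance (CqState *Cq)
{
  Cq->Head++;
  if (Cq->Head == Cq->Entries) { /* wrapped: controller flips phase */
    Cq->Head          = 0;
    Cq->ExpectedPhase ^= 1;
  }
}
```

After consuming entries, the driver writes the new head index to the completion queue's doorbell register so the controller can reuse the freed slots.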
### Controller Reset

On command timeout, the driver performs a full controller reset: disable, re-program the admin
queues, re-enable, re-identify, re-negotiate the queue count, and re-create the I/O queues, all
while preserving the existing DMA buffer allocations.
### Shutdown Notification

The driver registers with `EFI_RESET_NOTIFICATION_PROTOCOL` to issue NVMe shutdown notifications
(CC.SHN) to all managed controllers before a platform reset, ensuring data integrity.
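The shutdown handshake itself is a two-register exchange: write the shutdown notification into `CC.SHN`, then poll `CSTS.SHST` until the controller reports completion. A host-runnable sketch of the bit manipulation involved (field positions follow the NVMe specification; the macro and function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define CC_SHN_SHIFT   14                     /* CC bits 15:14 - shutdown notification */
#define CC_SHN_MASK    (0x3u << CC_SHN_SHIFT)
#define CC_SHN_NORMAL  0x1u                   /* 01b = normal shutdown                 */

#define CSTS_SHST_SHIFT    2                  /* CSTS bits 3:2 - shutdown status       */
#define CSTS_SHST_MASK     (0x3u << CSTS_SHST_SHIFT)
#define CSTS_SHST_COMPLETE 0x2u               /* 10b = shutdown processing complete    */

/* Set CC.SHN to "normal shutdown" without disturbing the other CC fields. */
uint32_t CcRequestShutdown (uint32_t Cc)
{
  return (Cc & ~CC_SHN_MASK) | (CC_SHN_NORMAL << CC_SHN_SHIFT);
}

/* True once CSTS reports that shutdown processing has finished. */
int ShutdownComplete (uint32_t Csts)
{
  return ((Csts & CSTS_SHST_MASK) >> CSTS_SHST_SHIFT) == CSTS_SHST_COMPLETE;
}
```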
## Source Files

| File | Description |
|---|---|
| `NvmExpress.c` | Driver entry point, driver binding, namespace enumeration, queue cleanup. |
| `NvmExpress.h` | Main header: data structures, constants, macros, function declarations. |
| `NvmExpressHci.c` | HCI register access, controller init/reset, admin and I/O queue creation. |
| `NvmExpressHci.h` | HCI function declarations. |
| `NvmExpressPassthru.c` | NVM Express PassThru protocol implementation (blocking and async). |
| `NvmExpressBlockIo.c` | BlockIo and BlockIo2 protocol implementations. |
| `NvmExpressBlockIo.h` | BlockIo/BlockIo2 function declarations. |
| `NvmExpressDiskInfo.c` | DiskInfo protocol implementation. |
| `NvmExpressDiskInfo.h` | DiskInfo function declarations. |
| `NvmExpressMediaSanitize.c` | Media Sanitize protocol: Clear, Purge, Format via NVMe commands. |
| `NvmExpressMediaSanitize.h` | Media Sanitize function declarations and types. |
| `ComponentName.c` | Component Name and Component Name2 protocol implementations. |
| `UnitTest/MediaSanitizeUnitTest.c` | Host-based unit tests for the Media Sanitize functionality. |
---

## MU_CHANGE Summary

This section documents all Microsoft (Project Mu) changes made to the upstream EDK2 NvmExpressDxe
driver. Each change is tagged in the source with `// MU_CHANGE` comments.
### 1. Allocate IO Queue Buffer

**Tag:** `MU_CHANGE - Allocate IO Queue Buffer`

**Files:** `NvmExpress.h`, `NvmExpress.c`, `NvmExpressHci.h`, `NvmExpressHci.c`, `NvmExpressBlockIo.c`, `NvmExpressPassthru.c`

**What changed:**

The upstream driver allocates a single flat 6-page DMA buffer at `DriverBindingStart` time and
carves fixed 4 KiB regions out of it for all six queues (admin SQ, admin CQ, I/O SQ #1, I/O CQ #1,
I/O SQ #2, I/O CQ #2). This MU change replaces that approach with a split allocation model:

- **Admin queues** are allocated separately, sized from the actual admin queue entry size and
  count derived from controller capabilities (via the `NVME_SQ_SIZE_IN_PAGES` /
  `NVME_CQ_SIZE_IN_PAGES` macros).
- **I/O queues** are allocated in a separate DMA buffer (`IoQueueBuffer` / `IoQueueBufferPciAddr`)
  whose size is computed dynamically from the negotiated number of I/O queue pairs and their
  entry sizes.
- A new structure, `NVME_QUEUE_SIZE_DATA` (with `NumberOfEntries` and `EntrySize` fields), is
  added to track per-queue sizing metadata in the controller private data.
- Queue buffer page counts are computed using the `NVME_SQ_SIZE_IN_PAGES` and
  `NVME_CQ_SIZE_IN_PAGES` macros, which use the actual entry size (as a power of 2) rather than
  assuming fixed 4 KiB pages.
- New functions are introduced:
  - `NvmeControllerInitAdminQueues()`: initializes admin queue buffer pointers and programs ASQ/ACQ.
  - `NvmeControllerInitIoQueues()`: initializes I/O queue buffer pointers and creates the I/O CQ/SQ.
  - `NvmExpressDriverCleanUpQueues()`: unmaps and frees both the admin and I/O queue DMA buffers.
  - `NvmeControllerReset()`: performs a full controller reset reusing the existing buffer allocations.
  - `ReadNvmeAdminQueueAttributes()`: reads the AQA register for validation during reset.
- `NvmeEnableController()` now accepts `IoSqEs` and `IoCqEs` parameters to program the correct
  queue entry sizes into `CC.IOSQES` and `CC.IOCQES`.
- Page-mask operations on queue base addresses are removed, since buffers allocated via
  `AllocatePages` are already page-aligned.

**Why it's needed:**

The fixed 6-page allocation is insufficient once queue depths grow beyond the upstream defaults
(for example, the 255-entry alternative queue size). The dynamic allocation supports variable
queue entry counts and entry sizes, allows the admin and I/O queues to be managed independently,
and enables proper cleanup and reset without leaking DMA memory. Separating the I/O queue buffer
also lets the driver scale the allocation to however many queue pairs the controller actually
grants.
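The sizing logic above reduces to "entries times entry size, rounded up to whole pages". A host-runnable approximation of that arithmetic (the real `NVME_SQ_SIZE_IN_PAGES` / `NVME_CQ_SIZE_IN_PAGES` macros live in `NvmExpress.h`; the definitions below are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SIZE   4096u
#define EFI_PAGE_SHIFT  12

/* Round a byte count up to whole 4 KiB pages, like EFI_SIZE_TO_PAGES. */
#define SIZE_TO_PAGES(Bytes)  (((uint64_t)(Bytes) + EFI_PAGE_SIZE - 1) >> EFI_PAGE_SHIFT)

/* Approximation of the queue-sizing macros: EntryShift is the log2 of
   the entry size (6 for 64-byte SQ entries, 4 for 16-byte CQ entries). */
#define QUEUE_SIZE_IN_PAGES(Entries, EntryShift) \
  SIZE_TO_PAGES ((uint64_t)(Entries) << (EntryShift))
```

With 255-entry queues, a submission queue needs 255 x 64 = 16320 bytes (4 pages) while a completion queue fits in a single page, which is exactly the kind of asymmetry the fixed one-page-per-queue carving could not express.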
---

### 2. Request Number of Queues from Controller

**Tag:** `MU_CHANGE - Request Number of Queues from Controller`

**Files:** `NvmExpress.h`, `NvmExpress.c`, `NvmExpressHci.c`

**What changed:**

- A new function, `NvmeSetFeaturesNumberOfQueues()`, is added. It sends the NVMe Set Features
  command (Feature ID: Number of Queues) to request the desired number of I/O queue pairs from
  the controller. The controller may allocate fewer pairs than requested; the driver stores the
  actual granted count in `Private->NumberOfIoQueuePairs`.
- The maximum number of queues the driver requests is defined by `NVME_MAX_QUEUES` (3 total:
  1 admin + 2 I/O), meaning the driver requests up to 2 I/O queue pairs.
- The `NVME_SUPPORT_BLOCKIO2()` macro checks whether the controller allocated more than 1 I/O
  queue pair. If not, the `BlockIo2` protocol and the async timer event are **not** installed.
- `NvmeCreateIoCompletionQueue()` and `NvmeCreateIoSubmissionQueue()` loop from index 1 to
  `NumberOfIoQueuePairs` instead of using hardcoded indices.
- `BlockIo2` protocol installation/uninstallation in `EnumerateNvmeDevNamespace()` and
  `UnregisterNvmeNamespace()` is made conditional on `NVME_SUPPORT_BLOCKIO2()`.

**Why it's needed:**

The upstream driver assumes every controller supports exactly two I/O queue pairs. Some NVMe
controllers (especially embedded or resource-constrained ones) may support only a single I/O
queue pair. By querying the controller via Set Features and degrading gracefully (skipping
`BlockIo2` when only one queue pair is available), the driver avoids failures on controllers that
cannot satisfy a two-queue-pair request and correctly reflects the controller's actual
capabilities.
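The negotiation rides in Dword 11 of the Set Features command, with both queue counts encoded as 0's-based values, and the controller echoes the granted counts in Dword 0 of the completion entry. A host-runnable sketch of the encoding and decoding (field layout per the NVMe specification; the helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define NVME_FID_NUMBER_OF_QUEUES  0x07u  /* Set Features: Number of Queues */

/* Set Features (FID 07h) Dword 11, per the NVMe spec:
   bits 15:0  NSQR - I/O submission queues requested, 0's based
   bits 31:16 NCQR - I/O completion queues requested, 0's based  */
uint32_t BuildNumQueuesCdw11 (uint16_t IoQueuePairs)
{
  uint32_t ZeroBased = (uint32_t)IoQueuePairs - 1;
  return (ZeroBased << 16) | ZeroBased;
}

/* Completion Dword 0 echoes the granted counts in the same layout;
   the usable pair count is the smaller of the two grants.         */
uint16_t GrantedQueuePairs (uint32_t CompletionDw0)
{
  uint16_t Nsqa = (uint16_t)(CompletionDw0 & 0xFFFF) + 1;
  uint16_t Ncqa = (uint16_t)(CompletionDw0 >> 16) + 1;
  return (Nsqa < Ncqa) ? Nsqa : Ncqa;
}
```

Requesting 2 pairs therefore sends `0x00010001`, and a controller that can only honor one pair answers with `0x00000000`, which is the case where `NVME_SUPPORT_BLOCKIO2()` evaluates false.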
---

### 3. Support Alternative Hardware Queue Sizes in NVME Driver

**Tag:** `MU_CHANGE - Support alternative hardware queue sizes in NVME driver`

**Files:** `NvmExpress.h`, `NvmExpress.c`, `NvmExpressDxe.inf`, `NvmExpressHci.c`, `NvmExpressPassthru.c`

**What changed:**

- A Boolean PCD, `PcdSupportAlternativeQueueSize`, is consumed. When `TRUE`, the driver uses a
  maximum queue size of 255 entries (`NVME_ALTERNATIVE_MAX_QUEUE_SIZE`) instead of the default
  sizes (1 for synchronous I/O, 63/255 for async).
- Queue creation (`NvmeCreateIoCompletionQueue`, `NvmeCreateIoSubmissionQueue`) uses
  `MIN(NVME_ALTERNATIVE_MAX_QUEUE_SIZE, Cap.Mqes)` for all queues when the PCD is enabled.
- The admin queue sizes (`AQA.ASQS`, `AQA.ACQS`) also use the alternative size when the PCD is set.
- Passthrough command handling (`NvmExpressPassThru`) switches the queue head/tail pointer
  arithmetic to modular wrap-around (instead of an XOR toggle) when the alternative queue size is
  active, supporting queue depths greater than 2.
- The async task list processor (`ProcessAsyncTaskList`) likewise uses the alternative queue size
  for completion queue head management.

**Why it's needed:**

Some NVMe hardware implementations require a minimum queue depth greater than 1 (e.g., 255
entries). The upstream driver defaults to queue sizes of 1 for synchronous I/O and uses XOR-based
head/tail toggling, which only works for 2-entry queues (slots 0 and 1). When hardware requires
deeper queues, this feature enables proper modular arithmetic for queue management and allocates
appropriately sized buffers. The behavior is gated by a PCD so that platforms which don't need it
retain the original behavior.
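The difference in head/tail arithmetic is easiest to see side by side: the XOR toggle can only ever visit slots 0 and 1, while the modular form walks the full ring at any depth. A host-runnable comparison (the function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Upstream style: correct only when the queue has exactly 2 slots,
   since x ^ 1 alternates between 0 and 1 forever.                  */
uint16_t AdvanceXor (uint16_t Index)
{
  return Index ^ 1;
}

/* Alternative-queue-size style: wraps correctly at any queue depth. */
uint16_t AdvanceMod (uint16_t Index, uint16_t Entries)
{
  return (uint16_t)((Index + 1) % Entries);
}
```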
---

### 4. NVMe Namespace Filtering

**Tag:** `MU_CHANGE - NVMe namespace filtering`

**Files:** `NvmExpress.h`, `NvmExpress.c`, `NvmExpressDxe.inf`

**What changed:**

- A Boolean PCD, `PcdNvmeNamespaceFilter`, is consumed. When `TRUE`, namespace discovery is
  limited to the first namespace (NSID 1).
- A constant `NVME_FIRST_NSID` (0x00000001) is defined.
- `DiscoverAllNamespaces()` takes a new `FilteringEnabled` parameter. When set, the loop breaks
  after enumerating the first namespace instead of iterating through all namespaces.
- In the `RemainingDevicePath` path, if filtering is enabled and the requested `NamespaceId` is
  not `NVME_FIRST_NSID`, the namespace is skipped.

**Why it's needed:**

In some platform or test configurations it is desirable to restrict which NVMe namespaces are
exposed to the UEFI environment. For example, a system with a multi-namespace NVMe device may
want only the boot namespace (typically NSID 1) available during boot, to reduce enumeration
time, limit the attack surface, or avoid exposing non-boot partitions. The PCD-controlled filter
provides that capability without requiring changes to the driver source.
---

### 5. Use the Mqes Value from the Cap Register

**Tag:** `MU_CHANGE - Use the Mqes value from the Cap register`

**Files:** `NvmExpressHci.c`

**What changed:**

- When creating the I/O completion and submission queues, the queue size is clamped to
  `MIN(requested_size, Cap.Mqes)` instead of using the requested size directly.

**Why it's needed:**

The NVMe `CAP.MQES` field reports the maximum queue entries the controller supports. If the
driver requests a queue larger than the controller supports, the behavior is undefined or the
command fails. By clamping queue sizes to `MQES`, the driver respects the controller's hardware
limits and avoids creating oversized queues.
---

### 6. Correct Cap Parameter Modifier

**Tag:** `MU_CHANGE - Correct Cap parameter modifier`

**Files:** `NvmExpressHci.c`

**What changed:**

- The `ReadNvmeControllerCapabilities()` function signature is corrected so that the `Cap`
  parameter uses the `OUT` modifier instead of `IN`, reflecting that the function writes to (not
  reads from) this parameter.

**Why it's needed:**

This is a pure correctness fix. The `Cap` parameter is an output of
`ReadNvmeControllerCapabilities()`: the function reads the hardware register and writes the
result into the caller's buffer. Marking it `IN` was semantically incorrect and could mislead
static analysis tools or code reviewers.
---

### 7. Improve NVMe Controller Init Robustness

**Tag:** `MU_CHANGE - Improve NVMe controller init robustness`

**Files:** `NvmExpressHci.c`

**What changed:**

- At the start of `NvmeControllerInit()` (and `NvmeControllerReset()`), the driver reads the PCI
  Vendor ID and Device ID. If either returns `0xFFFF` (`NVME_INVALID_VID_DID`), the function
  returns `EFI_DEVICE_ERROR` immediately.
- The assertion on `Cap.Mpsmin` is replaced with a conditional check that returns
  `EFI_DEVICE_ERROR`, so the driver fails gracefully instead of asserting if the controller
  reports an unsupported minimum page size.

**Why it's needed:**

If an NVMe controller has been surprise-removed (hot-unplugged), is behind a failed PCIe link, or
is otherwise inaccessible, PCI config reads return all ones (`0xFFFF`). Without this check, the
driver would proceed to access invalid MMIO space, potentially causing system hangs or crashes.
The `Mpsmin` change prevents a hard assert in production builds if the controller reports an
unexpected minimum memory page size, instead returning a clean error.
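The presence check is deliberately cheap: one PCI config-space read before any MMIO access. A host-runnable sketch of the gate (the constant name matches the text above; the function name and example IDs are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NVME_INVALID_VID_DID  0xFFFFu  /* all ones: device absent or unreachable */

/* PCI config reads to a missing or surprise-removed device return all
   ones, so either ID reading 0xFFFF means "do not touch MMIO".        */
bool NvmeDevicePresent (uint16_t VendorId, uint16_t DeviceId)
{
  return (VendorId != NVME_INVALID_VID_DID) &&
         (DeviceId != NVME_INVALID_VID_DID);
}
```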
---

### 8. Remove Page Mask

**Tag:** `MU_CHANGE - Remove Page Mask` / `MU_CHANGE - Remove the page mask since the buffer is allocated using AllocatePages`

**Files:** `NvmExpressHci.c`

**What changed:**

- The page-alignment mask operations (`& ~(EFI_PAGE_SIZE - 1)`) on the admin submission queue
  (ASQ) and admin completion queue (ACQ) base addresses are removed.

**Why it's needed:**

Since the admin queue buffers are allocated using `PciIo->AllocateBuffer()` with
`AllocateAnyPages`, the returned addresses are already guaranteed to be page-aligned. Applying a
page mask is redundant, so it was removed for clarity. This is part of the broader Allocate IO
Queue Buffer refactoring.