Commit 52399f5 (1 parent: aa76a58)

TensorRT 10.16 OSS Release (#4729)

Signed-off-by: Kevin Chen <kevinch@nvidia.com>

186 files changed: 4689 additions and 2021 deletions


.clang-format

Lines changed: 24 additions & 2 deletions
```diff
@@ -19,8 +19,30 @@ AlwaysBreakTemplateDeclarations: true
 BasedOnStyle: None
 BinPackArguments: true
 BinPackParameters: true
+# Almost the same as Allman style, but explicitly disabling BeforeLambdaBody
+# for backwards compatibility with clang-format-10 Allman style.
+# See also https://reviews.llvm.org/D44609
+BreakBeforeBraces: Custom
+BraceWrapping:
+  AfterCaseLabel: true
+  AfterClass: true
+  AfterControlStatement: Always
+  AfterEnum: true
+  AfterFunction: true
+  AfterNamespace: true
+  AfterObjCDeclaration: true
+  AfterStruct: true
+  AfterUnion: true
+  AfterExternBlock: true
+  BeforeCatch: true
+  BeforeElse: true
+  BeforeLambdaBody: false
+  BeforeWhile: false
+  IndentBraces: false
+  SplitEmptyFunction: true
+  SplitEmptyRecord: true
+  SplitEmptyNamespace: true
 BreakBeforeBinaryOperators: All
-BreakBeforeBraces: Allman
 BreakBeforeTernaryOperators: true
 BreakConstructorInitializersBeforeComma: true
 ColumnLimit: 120
@@ -61,6 +83,7 @@ PenaltyExcessCharacter: 1000000
 PenaltyReturnTypeOnItsOwnLine: 60
 PointerAlignment: Left
 PointerBindsToType: false
+QualifierAlignment: Right
 ReflowComments: true
 SortIncludes: true
 SpaceAfterCStyleCast: true
@@ -77,4 +100,3 @@ Standard: Cpp11
 StatementMacros: [API_ENTRY_TRY,TRT_TRY]
 TabWidth: 4
 UseTab: Never
-...
```

CHANGELOG.md

Lines changed: 13 additions & 5 deletions
```diff
@@ -1,4 +1,15 @@
 # TensorRT OSS Release Changelog
+## 10.16 GA - 2026-3-24
+
+- General
+  - Default CUDA version updated to CUDA 13.2.
+
+- Samples
+  - Added sampleDistCollective sample to showcase multi-device execution in TensorRT.
+
+- Parsers
+  - Added kADJUST_FOR_DLA flag to adjust parsing behavior for ONNX models to be more amenable for DLA hardware execution.
+  - Added DistCollective operator support for multi-device execution in TensorRT.
 
 ## 10.15 GA - 2026-2-2
 
@@ -21,11 +32,8 @@
 - Improved error reporting for models with multiple subgraphs, such as `Loop` or `Scan` nodes.
 
 - Demo changes
-  - demoDiffusion:
-    - Stable Diffusion 1.5, 2.0 and 2.1 pipelines have been deprecated and removed.
-    - Added support for Wan2.2-T2V-A14B Text to Video pipeline
-
-
+  - demoDiffusion: Stable Diffusion 1.5, 2.0 and 2.1 pipelines have been deprecated and removed.
+  - Added support for Wan2.2-T2V-A14B Text to Video pipeline
 
 ## 10.14 GA - 2025-11-7
 - Sample changes
```

CMakeLists.txt

Lines changed: 2 additions & 3 deletions
```diff
@@ -67,7 +67,7 @@ endif()
 set(CMAKE_SKIP_BUILD_RPATH True)
 
 # CUDA targets
-set(DEFAULT_CUDA_VERSION 13.1.0)
+set(DEFAULT_CUDA_VERSION 13.2)
 set_ifndef(CUDA_VERSION ${DEFAULT_CUDA_VERSION})
 message(STATUS "CUDA version set to ${CUDA_VERSION}")
 
@@ -204,7 +204,6 @@ if(BUILD_SAFE_SAMPLES OR TRT_SAFETY_INFERENCE_ONLY)
 target_link_options(TRTSAFE::nvinfer_safe_shared INTERFACE LINKER:--unresolved-symbols=ignore-in-shared-libs)
 target_link_options(TRTSAFE::nvinfer_safe_debug INTERFACE LINKER:--unresolved-symbols=ignore-in-shared-libs)
 endif()
-
 # Enable unified builder safety features when building safety samples or in inference-only mode.
 add_compile_definitions(ENABLE_UNIFIED_BUILDER=1)
 endif()
@@ -252,7 +251,7 @@ if(TRT_SAFETY_INFERENCE_ONLY)
 endif()
 
 # C++17
-set(CMAKE_CXX_STANDARD 17)
+set(CMAKE_CXX_STANDARD 20)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
 set(CMAKE_CXX_EXTENSIONS OFF)
 
```

README.md

Lines changed: 30 additions & 28 deletions
````diff
@@ -43,14 +43,14 @@ To build the TensorRT-OSS components, you will first need the following software
 
 **TensorRT GA build**
 
-- TensorRT v10.15.1.29
+- TensorRT v10.16.0.72
   - Available from direct download links listed below
 
 **System Packages**
 
 - [CUDA](https://developer.nvidia.com/cuda-toolkit)
   - Recommended versions:
-    - cuda-13.1.0
+    - cuda-13.2.0
     - cuda-12.9.0
 - [CUDNN (optional)](https://developer.nvidia.com/cudnn)
   - cuDNN 8.9
@@ -63,6 +63,7 @@ To build the TensorRT-OSS components, you will first need the following software
 
 **Optional Packages**
 
+- [NCCL](https://developer.nvidia.com/nccl/nccl-download) >= v2.19, < v3.0 — only when building with multi-device support (`-DTRT_BUILD_ENABLE_MULTIDEVICE=ON`) for the `sampleDistCollective` sample.
 - Containerized build
   - [Docker](https://docs.docker.com/install/) >= 19.03
   - [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker)
@@ -97,24 +98,24 @@ To build the TensorRT-OSS components, you will first need the following software
 
 Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
 
-- [TensorRT 10.15.1.29 for CUDA 13.1, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.15.1/tars/TensorRT-10.15.1.29.Linux.x86_64-gnu.cuda-13.1.tar.gz)
-- [TensorRT 10.15.1.29 for CUDA 12.9, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.15.1/tars/TensorRT-10.15.1.29.Linux.x86_64-gnu.cuda-12.9.tar.gz)
-- [TensorRT 10.15.1.29 for CUDA 13.1, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.15.1/zip/TensorRT-10.15.1.29.Windows.win10.cuda-13.1.zip)
-- [TensorRT 10.15.1.29 for CUDA 12.9, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.15.1/zip/TensorRT-10.15.1.29.Windows.win10.cuda-12.9.zip)
+- [TensorRT 10.16.0.72 for CUDA 13.2, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.16.0/tars/TensorRT-10.16.0.72.Linux.x86_64-gnu.cuda-13.2.tar.gz)
+- [TensorRT 10.16.0.72 for CUDA 12.9, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.16.0/tars/TensorRT-10.16.0.72.Linux.x86_64-gnu.cuda-12.9.tar.gz)
+- [TensorRT 10.16.0.72 for CUDA 13.2, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.16.0/zip/TensorRT-10.16.0.72.Windows.win10.cuda-13.2.zip)
+- [TensorRT 10.16.0.72 for CUDA 12.9, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.16.0/zip/TensorRT-10.16.0.72.Windows.win10.cuda-12.9.zip)
 
-**Example: Ubuntu 22.04 on x86-64 with cuda-13.1**
+**Example: Ubuntu 22.04 on x86-64 with cuda-13.2**
 
 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-10.15.1.29.Linux.x86_64-gnu.cuda-13.1.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-10.15.1.29/lib
+tar -xvzf TensorRT-10.16.0.72.Linux.x86_64-gnu.cuda-13.2.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-10.16.0.72/lib
 ```
 
 **Example: Windows on x86-64 with cuda-12.9**
 
 ```powershell
-Expand-Archive -Path TensorRT-10.15.1.29.Windows.win10.cuda-12.9.zip
-$env:TRT_LIBPATH="$pwd\TensorRT-10.15.1.29\lib"
+Expand-Archive -Path TensorRT-10.16.0.72.Windows.win10.cuda-12.9.zip
+$env:TRT_LIBPATH="$pwd\TensorRT-10.16.0.72\lib"
 ```
 
 ## Setting Up The Build Environment
@@ -123,34 +124,34 @@ For Linux platforms, we recommend that you generate a docker container for build
 
 1. #### Generate the TensorRT-OSS build container.
 
-   **Example: Ubuntu 24.04 on x86-64 with cuda-13.1 (default)**
+   **Example: Ubuntu 24.04 on x86-64 with cuda-13.2 (default)**
 
    ```bash
-   ./docker/build.sh --file docker/ubuntu-24.04.Dockerfile --tag tensorrt-ubuntu24.04-cuda13.1
+   ./docker/build.sh --file docker/ubuntu-24.04.Dockerfile --tag tensorrt-ubuntu24.04-cuda13.2
   ```
 
-   **Example: Rockylinux8 on x86-64 with cuda-13.1**
+   **Example: Rockylinux8 on x86-64 with cuda-13.2**
 
   ```bash
-   ./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda13.1
+   ./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda13.2
  ```
 
-   **Example: Ubuntu 24.04 cross-compile for Jetson (aarch64) with cuda-13.1 (JetPack SDK)**
+   **Example: Ubuntu 24.04 cross-compile for Jetson (aarch64) with cuda-13.2 (JetPack SDK)**
 
  ```bash
-   ./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda13.1
+   ./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda13.2
  ```
 
-   **Example: Ubuntu 24.04 on aarch64 with cuda-13.1**
+   **Example: Ubuntu 24.04 on aarch64 with cuda-13.2**
 
  ```bash
-   ./docker/build.sh --file docker/ubuntu-24.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu24.04-cuda13.1
+   ./docker/build.sh --file docker/ubuntu-24.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu24.04-cuda13.2
  ```
 
 2. #### Launch the TensorRT-OSS build container.
    **Example: Ubuntu 24.04 build container**
  ```bash
-   ./docker/launch.sh --tag tensorrt-ubuntu24.04-cuda13.1 --gpus all
+   ./docker/launch.sh --tag tensorrt-ubuntu24.04-cuda13.2 --gpus all
  ```
    > NOTE:
    > <br> 1. Use the `--tag` corresponding to build container generated in Step 1.
@@ -163,7 +164,7 @@ For Linux platforms, we recommend that you generate a docker container for build
 
 - Generate Makefiles and build
 
-  **Example: Linux (x86-64) build with default cuda-13.1**
+  **Example: Linux (x86-64) build with default cuda-13.2**
 
  ```bash
  cd $TRT_OSSPATH
@@ -172,7 +173,7 @@ For Linux platforms, we recommend that you generate a docker container for build
  make -j$(nproc)
  ```
 
-  **Example: Linux (aarch64) build with default cuda-13.1**
+  **Example: Linux (aarch64) build with default cuda-13.2**
 
  ```bash
  cd $TRT_OSSPATH
@@ -181,7 +182,7 @@ For Linux platforms, we recommend that you generate a docker container for build
  make -j$(nproc)
  ```
 
-  **Example: Native build on Jetson Thor (aarch64) with cuda-13.1**
+  **Example: Native build on Jetson Thor (aarch64) with cuda-13.2**
 
  ```bash
  cd $TRT_OSSPATH
@@ -192,7 +193,7 @@ For Linux platforms, we recommend that you generate a docker container for build
 
  > NOTE: C compiler must be explicitly specified via CC= for native aarch64 builds of protobuf.
 
-  **Example: Ubuntu 24.04 Cross-Compile for Jetson Thor (aarch64) with cuda-13.1 (JetPack)**
+  **Example: Ubuntu 24.04 Cross-Compile for Jetson Thor (aarch64) with cuda-13.2 (JetPack)**
 
  ```bash
  cd $TRT_OSSPATH
@@ -201,7 +202,7 @@ For Linux platforms, we recommend that you generate a docker container for build
  make -j$(nproc)
  ```
 
-  **Example: Ubuntu 24.04 Cross-Compile for DriveOS (aarch64) with cuda-13.1**
+  **Example: Ubuntu 24.04 Cross-Compile for DriveOS (aarch64) with cuda-13.2**
 
  ```bash
  cd $TRT_OSSPATH
@@ -210,7 +211,7 @@ For Linux platforms, we recommend that you generate a docker container for build
  make -j$(nproc)
  ```
 
-  **Example: Native builds on Windows (x86) with cuda-13.1**
+  **Example: Native builds on Windows (x86) with cuda-13.2**
 
  ```bash
  cd $TRT_OSSPATH
@@ -220,7 +221,7 @@ For Linux platforms, we recommend that you generate a docker container for build
  msbuild TensorRT.sln /property:Configuration=Release -m:$env:NUMBER_OF_PROCESSORS
  ```
 
-> NOTE: The default CUDA version used by CMake is 13.1. To override this, for example to 12.9, append `-DCUDA_VERSION=12.9` to the cmake command.
+> NOTE: The default CUDA version used by CMake is 13.2. To override this, for example to 12.9, append `-DCUDA_VERSION=12.9` to the cmake command.
 
 - Required CMake build arguments are:
   - `TRT_LIB_DIR`: Path to the TensorRT installation directory containing libraries.
@@ -238,6 +239,7 @@ For Linux platforms, we recommend that you generate a docker container for build
   - `TRT_SAFETY_INFERENCE_ONLY`: Specify if only build the safety inference components, for example [`ON`] | `OFF`. If turned ON, all other components will be turned OFF except `BUILD_SAFE_SAMPLES`.
   - `GPU_ARCHS`: GPU (SM) architectures to target. By default we generate CUDA code for all major SMs. Specific SM versions can be specified here as a quoted space-separated list to reduce compilation time and binary size. Table of compute capabilities of NVIDIA GPUs can be found [here](https://developer.nvidia.com/cuda-gpus). Examples: - NVidia A100: `-DGPU_ARCHS="80"` - RTX 50 series: `-DGPU_ARCHS="120"` - Multiple SMs: `-DGPU_ARCHS="80 120"`
   - `TRT_PLATFORM_ID`: Bare-metal build (unlike containerized cross-compilation). Currently supported options: `x86_64` (default).
+  - `TRT_BUILD_ENABLE_MULTIDEVICE`: Enable the multi-device sample (`sampleDistCollective`). Use `-DTRT_BUILD_ENABLE_MULTIDEVICE=ON` to build it; requires [NCCL](https://developer.nvidia.com/nccl/nccl-download) >= v2.19, < v3.0.
 
 ## Building TensorRT DriveOS Samples
 
@@ -313,7 +315,7 @@ For Linux platforms, we recommend that you generate a docker container for build
 ```bash
 cd $TRT_OSSPATH
 mkdir -p build && cd build
-export CUDA_VERSION=13.1
+export CUDA_VERSION=13.2
 export CUDA=cuda-$CUDA_VERSION
 export CUDA_ROOT=/usr/local/cuda-safe-$CUDA_VERSION
 export QNX_BASE=/drive/toolchains/qnx_toolchain # Set to your QNX toolchain installation path
````

VERSION

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-10.15.1.29
+10.16.0.72
```
