Commit 866d7ba: Custom op export documentation (#162)

Authored by neginraoof, committed by prasanthpul
1 parent c76238b

Commit message: custom op docs; clean up references and files; formatting; fixes for the new API for custom domain; updated export APIs; text edits; updates based on feedback; updated headers; readme fixes; updates to CMakeLists.txt, custom_op.cc, custom_op.h, custom_op_test.cc, setup.py, and custom_group_norm.cpp

9 files changed: 524 additions & 0 deletions

PyTorchCustomOperator/README.md
## How to export a PyTorch model with a custom op to ONNX and run it in ONNX Runtime

This document describes the steps required to extend TorchScript with a custom operator, export the operator to ONNX format, and add the operator to ONNX Runtime for model inference.
Although the `torch` module provides a broad set of tensor operators, TorchScript lets users design and implement their own (C++ or CUDA) functions and register them as new operators.
To export such a custom operator to ONNX format, the custom op registration ONNX API lets users export a custom TorchScript operator using a combination of existing and/or new custom ONNX ops.
Once the operator is converted to ONNX format, users can implement and register it with ONNX Runtime for model inference. This document explains the details of this process end to end, along with an example.
### Required Steps

- [1](#step1) - Add the custom operator implementation in C++ and register it with TorchScript
- [2](#step2) - Export the custom operator to ONNX, using either:
  - a combination of existing ONNX ops, or
  - a custom ONNX operator
- [3](#step3) - Add the custom operator implementation and register it in ONNX Runtime (required only if using a custom ONNX op in step 2)
<a name="step1"></a>
### Implement the Custom Operator

For this step, you need PyTorch installed on your system. You can install the PyTorch nightly build from [here](https://pytorch.org/get-started/locally/).
If you have a custom operator that you need to register in TorchScript as a C++ extension, you need to implement the operator and build it with `setuptools`.
Start by implementing the operator. You can leverage ATen, PyTorch's high-performance C++ tensor library. Below is the example C++ code for the group norm operator:
```cpp
#include <cmath>
#include <torch/script.h>
#include "Eigen/Dense"

template <typename T>
using ConstEigenVectorArrayMap = Eigen::Map<const Eigen::Array<T, Eigen::Dynamic, 1>>;
template <typename T>
using EigenVectorArrayMap = Eigen::Map<Eigen::Array<T, Eigen::Dynamic, 1>>;

torch::Tensor custom_group_norm(torch::Tensor X, torch::Tensor num_groups, torch::Tensor scale, torch::Tensor bias, torch::Tensor eps) {
  float* X_data = X.data_ptr<float>();
  float* scale_data = scale.data_ptr<float>();
  float* bias_data = bias.data_ptr<float>();
  int num_groups_i = int(num_groups.data_ptr<float>()[0]);
  float epsilon_ = eps.data_ptr<float>()[0];
  torch::Tensor output = torch::zeros(X.sizes());
  float* out = output.data_ptr<float>();
  const int64_t N = X.size(0);
  const int64_t C = X.size(1) / num_groups_i;  // assume [N C*num_groups H W] per the spec

  // Number of elements in one (sample, group) slice.
  int64_t sample_size = 1;
  for (int64_t i = 2; i < X.dim(); ++i) {
    sample_size *= X.size(i);
  }
  sample_size *= C;

  for (int64_t i = 0; i < N * num_groups_i; ++i) {
    ConstEigenVectorArrayMap<float> Xi(X_data + sample_size * i, sample_size);
    const float Xi_mean = Xi.mean();
    const float squared_norm = (Xi - Xi_mean).matrix().squaredNorm();
    const float inv_stdev = 1.0f / std::sqrt(squared_norm / sample_size + epsilon_);
    EigenVectorArrayMap<float> Yi(out + sample_size * i, sample_size);
    const float channel_scale = inv_stdev * scale_data[i % (C * num_groups_i)];
    const float channel_shift = bias_data[i % (C * num_groups_i)] - Xi_mean * channel_scale;
    Yi = Xi * channel_scale + channel_shift;
  }

  return output;
}
```
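To make the kernel's math concrete, here is a NumPy sketch that mirrors the C++ loop above. The function name, list-style `num_groups`, and scalar `eps` are our own illustrative choices, not part of the repository:

```python
import numpy as np

def group_norm_ref(X, num_groups, scale, bias, eps):
    """NumPy mirror of the C++ custom_group_norm kernel (illustrative only)."""
    g = int(num_groups[0])
    N = X.shape[0]
    C = X.shape[1] // g                        # assumed [N, C*g, H, W] layout
    sample_size = C * int(np.prod(X.shape[2:]))
    flat = X.reshape(-1).astype(np.float32)
    out = np.zeros_like(flat)
    cg = C * g
    for i in range(N * g):                     # one (sample, group) slice at a time
        Xi = flat[sample_size * i : sample_size * (i + 1)]
        mean = Xi.mean()
        inv_stdev = 1.0 / np.sqrt(((Xi - mean) ** 2).mean() + eps)
        ch_scale = inv_stdev * scale[i % cg]
        ch_shift = bias[i % cg] - mean * ch_scale
        out[sample_size * i : sample_size * (i + 1)] = Xi * ch_scale + ch_shift
    return out.reshape(X.shape)

# Same example shapes as in the export script later in this document:
X = np.random.randn(3, 2, 1, 2).astype(np.float32)
Y = group_norm_ref(X, [2.0], np.array([2.0, 1.0]), np.array([1.0, 0.0]), 0.0)
print(Y.shape)  # (3, 2, 1, 2)
```

With unit scale and zero bias, each (sample, group) slice of the output has mean 0 and variance 1, which is a quick way to sanity-check any reimplementation.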
In this example, we use the [Eigen](https://eigen.tuxfamily.org/dox/index.html) library. To install it, download and extract the Eigen header files; see the [getting started guide](https://eigen.tuxfamily.org/dox/GettingStarted.html).
<br />
Next, register this operator with the TorchScript compiler using `torch::RegisterOperators` in the same cpp file. The first argument is the operator's namespace and name, separated by `::`. The next argument is a reference to your function.
```cpp
static auto registry = torch::RegisterOperators("mynamespace::custom_group_norm", &custom_group_norm);
```
Once you have your C++ function, you can build it using `setuptools`. Create a `setup.py` script in the same directory as your C++ code. `cpp_extension.BuildExtension` takes care of the required compiler flags, such as the include paths and the flags needed for mixed C++/CUDA compilation.

For this example, we only provide the forward pass function needed for inference. You can implement the backward pass similarly if needed.
```python
from setuptools import setup
from torch.utils import cpp_extension

setup(name='custom_group_norm',
      ext_modules=[cpp_extension.CppExtension('custom_group_norm', ['custom_group_norm.cpp'],
                                              include_dirs=[<path_to_eigen_header_file>])],
      cmdclass={'build_ext': cpp_extension.BuildExtension})
```
Make sure to list the required header directories in `include_dirs`.

Now, run `python setup.py install` from your source directory to build and install your extension.
The shared object is generated under the `build` directory.
You can load it using:
`torch.ops.load_library("<path_to_object_file>")`
Then you can refer to your custom operator as:
`torch.ops.<namespace_name>.<operator_name>`
<a name="step2"></a>
### Export the Operator to ONNX

You can export your custom operator using existing ONNX ops, or you can create custom ONNX ops to use.
In both cases, you need to add a symbolic method for the exporter and register it using `torch.onnx.register_custom_op_symbolic`.
The first argument is the custom (TorchScript) namespace name and operator name, separated by `::`. You also need to pass a reference to the symbolic method and the ONNX opset version. Since the symbolic function can emit a combination of ONNX and custom operators, the ONNX opset version is required at registration time.
Hence, you can only register a single version of your custom opset for each ONNX opset version.
<br/>
You can add this script in a Python file under the source directory.
```python
from torch.onnx import register_custom_op_symbolic

def my_group_norm(g, input, num_groups, scale, bias, eps):
    return g.op("mydomain::mygroupnorm", input, num_groups, scale, bias, epsilon_f=eps)

register_custom_op_symbolic('mynamespace::custom_group_norm', my_group_norm, 9)
```
In the symbolic method, you implement the ONNX subgraph to use for exporting your custom op.
An ONNX opset consists of a domain name and a version number. If you are using existing ONNX operators (from the default ONNX domain), you don't need to add the domain name prefix.
In our example, we want to use an op from our custom opset. Therefore, we need to add the domain name as a prefix in the following format:
`"<domain_name>::<onnx_op>"`
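The naming convention can be illustrated with a small helper; this is purely illustrative and not part of any PyTorch or ONNX API:

```python
def split_qualified_op(qualified_name):
    """Split "<domain_name>::<onnx_op>" into (domain, op).

    Ops without a "::" prefix belong to the default ONNX domain,
    conventionally spelled as the empty string.
    """
    if "::" in qualified_name:
        domain, op = qualified_name.split("::", 1)
        return domain, op
    return "", qualified_name

print(split_qualified_op("mydomain::mygroupnorm"))  # ('mydomain', 'mygroupnorm')
print(split_qualified_op("Relu"))                   # ('', 'Relu')
```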

Now you can create a `torch.nn.Module` that uses your custom op, and export it to ONNX using `torch.onnx.export`. Make sure to specify input and output names at export, as this will help you later when implementing the ONNX Runtime kernel for this operator.
You can pass the custom opset versions in the `custom_opsets` dictionary when calling the export API. If not explicitly specified, the custom opset version defaults to 1.
```python
import torch

def export_custom_op():
    class CustomModel(torch.nn.Module):
        def forward(self, x, num_groups, scale, bias):
            # The op was registered under "mynamespace" in step 1.
            return torch.ops.mynamespace.custom_group_norm(x, num_groups, scale, bias, torch.tensor([0.]))

    X = torch.randn(3, 2, 1, 2)
    num_groups = torch.tensor([2.])
    scale = torch.tensor([2., 1.])
    bias = torch.tensor([1., 0.])
    inputs = (X, num_groups, scale, bias)

    f = './model.onnx'
    torch.onnx.export(CustomModel(), inputs, f,
                      opset_version=9,
                      input_names=["X", "num_groups", "scale", "bias"], output_names=["Y"],
                      custom_opsets={"mydomain": 2})
```
To use this custom ONNX operator for inference, we need to add our custom operator to an inference engine. If you are using existing ONNX ops only, you can skip this last step.
<a name="step3"></a>
### Implement the Operator in ONNX Runtime

The last step is to implement this op in ONNX Runtime and build it. For this step, you need ONNX Runtime installed on your system. You can install ONNX Runtime v1.0.0 using:
```
pip install onnxruntime
```
or find the NuGet package [here](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime/).
We illustrate how to add a new operator using ONNX Runtime's custom operator C API (the APIs are experimental for now).
First, you need to create a custom domain of type `Ort::CustomOpDomain`. This domain name is the same name provided in the symbolic method (step 2) when exporting the model.

```cpp
Ort::CustomOpDomain custom_op_domain("org.pytorch.mydomain");
```
Next, you need to define a custom op struct deriving from `Ort::CustomOpBase`, write its kernel implementation, and add it to your custom domain:
```cpp
struct Input {
  const char* name;
  std::vector<int64_t> dims;
  std::vector<float> values;
};

struct OrtTensorDimensions : std::vector<int64_t> {
  OrtTensorDimensions(Ort::CustomOpApi ort, const OrtValue* value) {
    OrtTensorTypeAndShapeInfo* info = ort.GetTensorTypeAndShape(value);
    std::vector<int64_t>::operator=(ort.GetTensorShape(info));
    ort.ReleaseTensorTypeAndShapeInfo(info);
  }
};

template <typename T>
struct GroupNormKernel {
 private:
  float epsilon_;
  Ort::CustomOpApi ort_;

 public:
  GroupNormKernel(Ort::CustomOpApi ort, const OrtKernelInfo* info) : ort_(ort) {
    epsilon_ = ort_.KernelInfoGetAttribute<float>(info, "epsilon");
  }

  void Compute(OrtKernelContext* context);
};

struct GroupNormCustomOp : Ort::CustomOpBase<GroupNormCustomOp, GroupNormKernel<float>> {
  void* CreateKernel(Ort::CustomOpApi api, const OrtKernelInfo* info) { return new GroupNormKernel<float>(api, info); };
  const char* GetName() const { return "testgroupnorm"; };

  size_t GetInputTypeCount() const { return 4; };
  ONNXTensorElementDataType GetInputType(size_t /*index*/) const { return ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT; };

  size_t GetOutputTypeCount() const { return 1; };
  ONNXTensorElementDataType GetOutputType(size_t /*index*/) const { return ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT; };
};
```
The `Compute` function is implemented [in the source file](https://github.com/neginraoof/CustomOperators/blob/master/CuctomOperator/ort_custom_op/custom_op.cc).
Once you have the custom kernel and schema, you can add them to the domain using the C API as below:
```cpp
GroupNormCustomOp custom_op;
custom_op_domain.Add(&custom_op);
```
In the repository, you can find our example group norm implementation along with a sample ONNX Runtime unit test to verify the expected output.
You can use CMake to build your custom operator with the required dependencies. Add a file named `CMakeLists.txt` under the same directory as your source files.

You can link the required libraries in your CMake file using `target_link_libraries`:
```
find_library(ONNXRUNTIME_LIBRARY onnxruntime HINTS <PATH_TO_YOUR_INSTALLATION_DIRECTORY>)
target_link_libraries(customop PUBLIC ${ONNXRUNTIME_LIBRARY})
```
And include the required headers using `include_directories`:
```
include_directories(<PATH_TO_EIGEN_HEADER_FILE>)
```

An example `CMakeLists.txt` file can be found [here](https://github.com/neginraoof/CustomOperators/blob/master/CuctomOperator/ort_custom_op/CMakeLists.txt).

Once you have the CMake file, create a `build` directory in the same location and `cd build`. Run `cmake ..` to configure the project, then build it with `make`.
Now that you have registered your operator, you should be able to run your model and test it. You can find the source code and test for a sample custom operator [here](https://github.com/neginraoof/CustomOperators/blob/master/CuctomOperator/ort_custom_op/custom_op_test.cc).
### References

1. [Extending TorchScript with Custom C++ Operators](https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html)
2. [ONNX Runtime: Adding a New Op](https://github.com/microsoft/onnxruntime/blob/master/docs/AddingCustomOp.md)
CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project (customop)
add_definitions(-std=c++11)


set(TEST_SOURCE custom_op_test.cc)
set(HEADER custom_op.h)
set(SOURCE custom_op.h)
add_executable(customop ${SOURCE} ${HEADER} ${TEST_SOURCE})

# Include path to header files for Custom Op
include_directories(<PATH_TO_EIGEN_DIR>)
include_directories(<PATH_TO_ONNXRUNTIME_INCLUDE_DIR>)

# Include path to header files for Custom Op Test
include_directories(<PATH_TO_ONNXRUNTIME_TEST_UTIL_INCLUDE_DIR>)

# Linking dependencies for Custom Op
find_library(ONNXRUNTIME_LIBRARY onnxruntime HINTS <PATH_TO_ONNXRUNTIME_LIB>)
target_link_libraries(customop PUBLIC ${ONNXRUNTIME_LIBRARY})
custom_op.cc
#include <iostream>
#include <cmath>
#include "Eigen/Dense"
#include "onnxruntime_cxx_api.h"

template <typename T>
using ConstEigenVectorArrayMap = Eigen::Map<const Eigen::Array<T, Eigen::Dynamic, 1>>;
template <typename T>
using EigenVectorArrayMap = Eigen::Map<Eigen::Array<T, Eigen::Dynamic, 1>>;

template <typename T>
void GroupNormKernel<T>::Compute(OrtKernelContext* context) {
  // Setup inputs
  const OrtValue* input_X = ort_.KernelContext_GetInput(context, 0);
  const T* X_data = reinterpret_cast<const T*>(ort_.GetTensorData<T>(input_X));
  const OrtValue* input_num_groups = ort_.KernelContext_GetInput(context, 1);
  const T* num_groups = reinterpret_cast<const T*>(ort_.GetTensorData<T>(input_num_groups));
  const OrtValue* input_scale = ort_.KernelContext_GetInput(context, 2);
  const T* scale_data = reinterpret_cast<const T*>(ort_.GetTensorData<T>(input_scale));
  const OrtValue* input_B = ort_.KernelContext_GetInput(context, 3);
  const T* B_data = reinterpret_cast<const T*>(ort_.GetTensorData<T>(input_B));

  // Setup output
  OrtTensorDimensions dimensions(ort_, input_X);
  OrtValue* output = ort_.KernelContext_GetOutput(context, 0, dimensions.data(), dimensions.size());
  float* out = ort_.GetTensorMutableData<float>(output);
  const int64_t N = dimensions[0];
  const int64_t C = dimensions[1] / num_groups[0];  // assume [N C*num_groups H W] per the spec

  OrtTensorTypeAndShapeInfo* output_info = ort_.GetTensorTypeAndShape(output);
  ort_.ReleaseTensorTypeAndShapeInfo(output_info);

  // Do computation
  int64_t sample_size = 1;
  for (size_t i = 2; i < dimensions.size(); ++i) {
    sample_size *= dimensions[i];
  }
  sample_size *= C;

  for (auto i = 0; i < N * num_groups[0]; ++i) {
    ConstEigenVectorArrayMap<float> Xi(X_data + sample_size * i, sample_size);
    const float Xi_mean = Xi.mean();
    const float squared_norm = (Xi - Xi_mean).matrix().squaredNorm();
    const float inv_stdev = 1.0f / std::sqrt(squared_norm / sample_size + epsilon_);
    EigenVectorArrayMap<float> Yi(out + sample_size * i, sample_size);
    const float channel_scale = inv_stdev * scale_data[i % (C * int(num_groups[0]))];
    const float channel_shift = B_data[i % (C * int(num_groups[0]))] - Xi_mean * channel_scale;
    Yi = Xi * channel_scale + channel_shift;
  }
}
custom_op.h
#include <iostream>
#include <vector>
#include "onnxruntime_cxx_api.h"

struct Input {
  const char* name;
  std::vector<int64_t> dims;
  std::vector<float> values;
};

struct OrtTensorDimensions : std::vector<int64_t> {
  OrtTensorDimensions(Ort::CustomOpApi ort, const OrtValue* value) {
    OrtTensorTypeAndShapeInfo* info = ort.GetTensorTypeAndShape(value);
    std::vector<int64_t>::operator=(ort.GetTensorShape(info));
    ort.ReleaseTensorTypeAndShapeInfo(info);
  }
};

template <typename T>
struct GroupNormKernel {
 private:
  float epsilon_;
  Ort::CustomOpApi ort_;

 public:
  GroupNormKernel(Ort::CustomOpApi ort, const OrtKernelInfo* info) : ort_(ort) {
    epsilon_ = ort_.KernelInfoGetAttribute<float>(info, "epsilon");
  }

  void Compute(OrtKernelContext* context);
};

struct GroupNormCustomOp : Ort::CustomOpBase<GroupNormCustomOp, GroupNormKernel<float>> {
  void* CreateKernel(Ort::CustomOpApi api, const OrtKernelInfo* info) { return new GroupNormKernel<float>(api, info); };
  const char* GetName() const { return "testgroupnorm"; };

  size_t GetInputTypeCount() const { return 4; };
  ONNXTensorElementDataType GetInputType(size_t /*index*/) const { return ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT; };

  size_t GetOutputTypeCount() const { return 1; };
  ONNXTensorElementDataType GetOutputType(size_t /*index*/) const { return ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT; };
};

#include "custom_op.cc"  // pulls in the template member definitions