
Deploy Semantic Segmentation Network (U-Net) with TensorRT (no upsampling support)

I am trying to deploy a trained U-Net with TensorRT. The model was trained using Keras (with Tensorflow as backend). The code is very similar to this one: https://github.com/zhixuhao/unet/blob/master/model.py

When I converted the model to UFF format, using some code like this:

import os
import uff

# idx, trt_fname and output_names are defined earlier in our script
uff_fname = os.path.join("./models/", "model_" + idx + ".uff")
uff_model = uff.from_tensorflow_frozen_model(
    frozen_file=os.path.join('./models', trt_fname),
    output_nodes=output_names,
    output_filename=uff_fname
)

I get the following warnings:

Warning: No conversion function registered for layer: ResizeNearestNeighbor yet.
Converting up_sampling2d_32_12/ResizeNearestNeighbor as custom op: ResizeNearestNeighbor
Warning: No conversion function registered for layer: DataFormatVecPermute yet.
Converting up_sampling2d_32_12/Shape-0-0-VecPermuteNCHWToNHWC-LayoutOptimizer as custom op: DataFormatVecPermute

I tried to avoid this by replacing the upsampling layer with bilinear-interpolation upsampling and with transposed convolution, but the converter threw similar errors. I checked https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html and it seems none of these operations are supported yet.

I am wondering if there is any workaround to this problem. Is there any other format/framework that TensorRT accepts and that supports upsampling? Or is it possible to replace it with some other supported operations?

I also saw somewhere that one can add custom operations to replace the unsupported ones for TensorRT, though I am not sure what that workflow would look like. It would also be really helpful if someone could point out an example of a custom layer.

Thank you in advance!

asked Jul 17 '19 by Yayuchen




2 Answers

The warnings appear because these operations are not yet supported by TensorRT, as you already mentioned. Unfortunately, there is no easy way around this: you either have to modify the graph (even after training) so that it uses only supported operations, or write these operations yourself as custom layers.

However, there is another way to run inference on other devices in C++: you can use TensorFlow and TensorRT together (TF-TRT). TensorRT analyzes the graph for ops that it supports, converts them to TensorRT nodes, and the rest of the graph is handled by TensorFlow as usual. More information here. This solution is much faster than rewriting the operations yourself. The only complicated part is building TensorFlow from source on your target device and generating the tensorflow_cc dynamic library. Recently there have been many guides and much support for TensorFlow ports to various architectures, e.g. ARM.
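
As a rough illustration, the TF-TRT conversion in Python looks something like this (TF 1.x; the file and node names here are placeholders, and on older TF versions the module lives under tensorflow.contrib.tensorrt instead):

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the frozen graph ('frozen_unet.pb' is a placeholder name).
graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_unet.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# TensorRT replaces the subgraphs it supports; unsupported ops such as
# ResizeNearestNeighbor keep running in TensorFlow.
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=['conv2d_24/Sigmoid'],     # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16')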

answered Sep 21 '22 by Xenon


Update 09/28/2019

Nvidia released TensorRT 6.0.1 about two weeks ago and added a new API called "IResizeLayer". This layer supports "Nearest" interpolation and can thus be used to implement upsampling. No need to use custom layers/plugins any more!
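
For reference, a minimal sketch using the TensorRT 6 Python API (the C++ API is analogous; 'network' and 'inp' are assumed to be an existing INetworkDefinition and an NCHW input tensor in an explicit-batch network):

import tensorrt as trt

# 2x nearest-neighbor upsampling with the new IResizeLayer
resize = network.add_resize(inp)
resize.resize_mode = trt.ResizeMode.NEAREST
resize.scales = [1.0, 1.0, 2.0, 2.0]  # N and C unchanged; H and W doubled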

Original answer:

Thanks for all the answers and suggestions posted here!

In the end, we implemented the network directly with the TensorRT C++ API and loaded the weights from the .h5 model file. We haven't had the time to profile and polish the solution yet, but the inference seems to work on the test images we fed in.

Here's the workflow we adopted:

Step 1: Code the upsampling layer.

In our U-Net model, all the upsampling layers have a scaling factor of (2, 2) and they all use ResizeNearestNeighbor interpolation. Essentially, the pixel value at (x, y) in the original tensor goes to four pixels in the new tensor: (2x, 2y), (2x+1, 2y), (2x, 2y+1) and (2x+1, 2y+1). This can easily be coded up as a CUDA kernel function; a NumPy reference of the same mapping is sketched below.
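
Here is a minimal NumPy sketch of that mapping (a reference for checking the CUDA kernel's output, not the kernel itself; the (C, H, W) layout matches TensorRT's channel-first convention):

import numpy as np

def upsample_nearest_2x(x):
    # x has shape (C, H, W); input pixel (x, y) is replicated to output
    # pixels (2x, 2y), (2x+1, 2y), (2x, 2y+1), (2x+1, 2y+1)
    return x.repeat(2, axis=1).repeat(2, axis=2)

x = np.arange(4, dtype=np.float32).reshape(1, 2, 2)
print(upsample_nearest_2x(x))  # every value becomes a 2x2 block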

Once we had the upsampling kernel, we needed to wrap it with the TensorRT API, specifically the IPluginV2Ext class. The developer reference describes which functions need to be implemented. I'd say enqueue() is the most important one, because the CUDA kernel gets executed there.

There are also examples in the TensorRT samples folder. For my version, these resources were helpful:

  • Github: Leaky Relu as custom layer
  • TensorRT-5.1.2.2/samples/sampleUffSSD
  • TensorRT-5.1.2.2/samples/sampleSSD

Step 2: Code the rest of the network using TensorRT API

The rest of the network should be quite straightforward: just call the various "addXxxLayer" functions on the TensorRT network definition.

One thing to keep in mind: depending on which version of TRT you are using, the way to add padding can differ. I think the newest version (5.1.5) allows developers to set padding parameters on the layer returned by addConvolution() so that the proper padding mode can be selected.

My model was trained using Keras, where the default padding mode is that the right and bottom get more padding if the total amount of padding is odd. Check this Stack Overflow link for details. There's a padding mode in 5.1.5 that represents this scheme.

If you are on an older version (5.1.2.2), you will need to add the padding as a separate layer before the convolution layer; the padding layer takes two parameters, pre-padding and post-padding (see the sketch after the next note).

Also, keep in mind that everything is NCHW in TensorRT.
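
A minimal sketch of the older-version approach through the TensorRT Python API (the C++ calls are analogous; 'network' and 'inp' are assumed to exist, and the channel counts, weights and padding values are placeholders):

import numpy as np
import tensorrt as trt

# Keras/TF 'same' padding for a stride-2 conv pads the extra row/column on
# the bottom/right, so pre-padding and post-padding differ.
pad = network.add_padding(inp, pre_padding=(0, 0), post_padding=(1, 1))

conv = network.add_convolution(
    input=pad.get_output(0),
    num_output_maps=64,                                # placeholder
    kernel_shape=(3, 3),
    kernel=np.zeros((64, 3, 3, 3), dtype=np.float32),  # placeholder weights
    bias=np.zeros(64, dtype=np.float32))
conv.stride = (2, 2)
# On 5.1.5+ one could presumably skip the padding layer and set
# conv.padding_mode = trt.PaddingMode.SAME_UPPER instead.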

Helpful sample:

  • TensorRT-5.1.2.2/samples/sampleMNISTAP

Step 3: Load the weights

TensorRT wants weights in the format [out_c, in_c, filter_h, filter_w], which is mentioned in an archived documentation page. Keras stores weights in the format [filter_h, filter_w, in_c, out_c].

We got a pure-weights file by calling model.save_weights('weight.h5') in Python. We then read the weights into NumPy arrays using h5py, performed the transposition and saved the transposed weights to a new file. We also figured out the Group and Dataset names with h5py; that information was used when loading the weights into the C++ code via the HDF5 C++ API. A sketch of the transposition step is below.
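
A rough sketch of that step (file names are placeholders; the actual Group/Dataset names can be listed by walking the HDF5 file):

import h5py

# Transpose Keras conv weights [H, W, in_c, out_c] into TensorRT's
# [out_c, in_c, H, W] layout and save them to a new HDF5 file.
with h5py.File('weight.h5', 'r') as src, h5py.File('weight_trt.h5', 'w') as dst:
    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            w = obj[()]
            if w.ndim == 4:                   # conv kernel
                w = w.transpose(3, 2, 0, 1)   # HWIO -> OIHW
            dst.create_dataset(name, data=w)  # biases etc. copied as-is
    src.visititems(visit)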

We compared the outputs layer by layer between the C++ code and the Python code. For our U-Net, all the activation maps are identical up to about the third block (after two poolings). After that, there is a tiny difference between pixel values; the absolute percentage error is around 10^-8, so we don't think it's that bad. We are still in the process of polishing the C++ implementation.

Again, thanks for all the suggestions and answers we got in this post. Hope our solution can be helpful as well!

answered Sep 21 '22 by Yayuchen