
Deploy Semantic Segmentation Network (U-Net) with TensorRT (no upsampling support)

I am trying to deploy a trained U-Net with TensorRT. The model was trained using Keras (with Tensorflow as backend). The code is very similar to this one: https://github.com/zhixuhao/unet/blob/master/model.py

When I converted the model to UFF format, using some code like this:

import os
import uff

# idx, trt_fname and output_names are defined earlier in our script
uff_fname = os.path.join("./models/", "model_" + idx + ".uff")
uff_model = uff.from_tensorflow_frozen_model(
    frozen_file=os.path.join('./models', trt_fname),
    output_nodes=output_names,
    output_filename=uff_fname
)

I get the following warnings:

Warning: No conversion function registered for layer: ResizeNearestNeighbor yet.
Converting up_sampling2d_32_12/ResizeNearestNeighbor as custom op: ResizeNearestNeighbor
Warning: No conversion function registered for layer: DataFormatVecPermute yet.
Converting up_sampling2d_32_12/Shape-0-0-VecPermuteNCHWToNHWC-LayoutOptimizer as custom op: DataFormatVecPermute

I tried to avoid this by replacing the upsampling layer with bilinear-interpolation upsampling and with transposed convolution, but the converter threw similar errors. I checked https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html and it seems none of these operations are supported yet.

I am wondering if there is any workaround to this problem. Is there any other format/framework that TensorRT accepts and that supports upsampling? Or is it possible to replace it with some other supported operations?

I also saw somewhere that one can add custom operations to replace the unsupported ones for TensorRT, though I am not sure what that workflow would look like. It would also be really helpful if someone could point out an example of a custom layer.

Thank you in advance!

asked Jul 17 '19 by Yayuchen




2 Answers

The warnings appear because these operations are not yet supported by TensorRT, as you already mentioned. Unfortunately, there is no easy way around this: you either have to modify the graph (even after training) so that it uses only supported operations, or write these operations yourself as custom layers.

However, there is another way to run inference on other devices in C++: you can use TensorFlow and TensorRT together (TF-TRT). TensorRT analyzes the graph for ops that it supports, converts them to TensorRT nodes, and the rest of the graph is handled by TensorFlow as usual. More information here. This solution is much faster than rewriting the operations yourself. The only complicated part is building TensorFlow from source on your target device and generating the tensorflow_cc dynamic library. Recently there have been many guides and much support for TensorFlow ports to various architectures, e.g. ARM.
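
As a rough illustration, the TF-TRT conversion in Python looks something like this (TF 1.x; the file and node names here are placeholders, and on older TF versions the module lives under tensorflow.contrib.tensorrt instead):

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the frozen graph ('frozen_unet.pb' is a placeholder name).
graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_unet.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# TensorRT replaces the subgraphs it supports; unsupported ops such as
# ResizeNearestNeighbor keep running in TensorFlow.
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=['conv2d_24/Sigmoid'],     # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16')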

answered Sep 21 '22 by Xenon


Update 09/28/2019

Nvidia released TensorRT 6.0.1 about two weeks ago and added a new API called "IResizeLayer". This layer supports "Nearest" interpolation and can thus be used to implement upsampling. No need to use custom layers/plugins any more!
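
For reference, a minimal sketch using the TensorRT 6 Python API (the C++ API is analogous; 'network' and 'inp' are assumed to be an existing INetworkDefinition and an NCHW input tensor in an explicit-batch network):

import tensorrt as trt

# 2x nearest-neighbor upsampling with the new IResizeLayer
resize = network.add_resize(inp)
resize.resize_mode = trt.ResizeMode.NEAREST
resize.scales = [1.0, 1.0, 2.0, 2.0]  # N and C unchanged; H and W doubled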

Original answer:

Thanks for all the answers and suggestions posted here!

In the end, we implemented the network directly with the TensorRT C++ API and loaded the weights from the .h5 model file. We haven't had the time to profile and polish the solution yet, but the inference seems to work on the test images we fed in.

Here's the workflow we adopted:

Step 1: Code the upsampling layer.

In our U-Net model, all the upsampling layers have a scaling factor of (2, 2) and they all use ResizeNearestNeighbor interpolation. Essentially, the pixel value at (x, y) in the original tensor goes to four pixels in the new tensor: (2x, 2y), (2x+1, 2y), (2x, 2y+1) and (2x+1, 2y+1). This can easily be coded up as a CUDA kernel function; a NumPy reference of the same mapping is sketched below.
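
Here is a minimal NumPy sketch of that mapping (a reference for checking the CUDA kernel's output, not the kernel itself; the (C, H, W) layout matches TensorRT's channel-first convention):

import numpy as np

def upsample_nearest_2x(x):
    # x has shape (C, H, W); input pixel (x, y) is replicated to output
    # pixels (2x, 2y), (2x+1, 2y), (2x, 2y+1), (2x+1, 2y+1)
    return x.repeat(2, axis=1).repeat(2, axis=2)

x = np.arange(4, dtype=np.float32).reshape(1, 2, 2)
print(upsample_nearest_2x(x))  # every value becomes a 2x2 block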

Once we had the upsampling kernel, we needed to wrap it with the TensorRT API, specifically the IPluginV2Ext class. The developer reference describes which functions need to be implemented. I'd say enqueue() is the most important one, because the CUDA kernel gets executed there.

There are also examples in the TensorRT samples folder. For my version, these resources were helpful:

  • Github: Leaky Relu as custom layer
  • TensorRT-5.1.2.2/samples/sampleUffSSD
  • TensorRT-5.1.2.2/samples/sampleSSD

Step 2: Code the rest of the network using TensorRT API

The rest of the network should be quite straightforward: just call the various "addXxxLayer" functions on the TensorRT network definition.

One thing to keep in mind: depending on which version of TRT you are using, the way to add padding can differ. I think the newest version (5.1.5) allows developers to set padding parameters on the layer returned by addConvolution() so that the proper padding mode can be selected.

My model was trained using Keras, where the default padding mode is that the right and bottom get more padding if the total amount of padding is odd. Check this Stack Overflow link for details. There's a padding mode in 5.1.5 that represents this scheme.

If you are on an older version (5.1.2.2), you will need to add the padding as a separate layer before the convolution layer; the padding layer takes two parameters, pre-padding and post-padding (see the sketch after the next note).

Also, keep in mind that everything is NCHW in TensorRT.
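
A minimal sketch of the older-version approach through the TensorRT Python API (the C++ calls are analogous; 'network' and 'inp' are assumed to exist, and the channel counts, weights and padding values are placeholders):

import numpy as np
import tensorrt as trt

# Keras/TF 'same' padding for a stride-2 conv pads the extra row/column on
# the bottom/right, so pre-padding and post-padding differ.
pad = network.add_padding(inp, pre_padding=(0, 0), post_padding=(1, 1))

conv = network.add_convolution(
    input=pad.get_output(0),
    num_output_maps=64,                                # placeholder
    kernel_shape=(3, 3),
    kernel=np.zeros((64, 3, 3, 3), dtype=np.float32),  # placeholder weights
    bias=np.zeros(64, dtype=np.float32))
conv.stride = (2, 2)
# On 5.1.5+ one could presumably skip the padding layer and set
# conv.padding_mode = trt.PaddingMode.SAME_UPPER instead.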

Helpful sample:

  • TensorRT-5.1.2.2/samples/sampleMNISTAP

Step 3: Load the weights

TensorRT wants weights in the format [out_c, in_c, filter_h, filter_w], which is mentioned in an archived documentation page. Keras stores weights in the format [filter_h, filter_w, in_c, out_c].

We got a pure-weights file by calling model.save_weights('weight.h5') in Python. We then read the weights into NumPy arrays using h5py, performed the transposition and saved the transposed weights to a new file. We also figured out the Group and Dataset names with h5py; that information was used when loading the weights into the C++ code via the HDF5 C++ API. A sketch of the transposition step is below.
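
A rough sketch of that step (file names are placeholders; the actual Group/Dataset names can be listed by walking the HDF5 file):

import h5py

# Transpose Keras conv weights [H, W, in_c, out_c] into TensorRT's
# [out_c, in_c, H, W] layout and save them to a new HDF5 file.
with h5py.File('weight.h5', 'r') as src, h5py.File('weight_trt.h5', 'w') as dst:
    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            w = obj[()]
            if w.ndim == 4:                   # conv kernel
                w = w.transpose(3, 2, 0, 1)   # HWIO -> OIHW
            dst.create_dataset(name, data=w)  # biases etc. copied as-is
    src.visititems(visit)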

We compared the outputs layer by layer between the C++ code and the Python code. For our U-Net, all the activation maps are identical up to about the third block (after two poolings). After that, there is a tiny difference between pixel values; the absolute percentage error is around 10^-8, so we don't think it's that bad. We are still in the process of polishing the C++ implementation.

Again, thanks for all the suggestions and answers we got in this post. Hope our solution can be helpful as well!

answered Sep 21 '22 by Yayuchen