I'm running a Mask R-CNN model on an edge device (with an NVIDIA GTX 1080). I am currently using the Detectron2 Mask R-CNN implementation and I achieve an inference speed of around 5 FPS.
To speed this up I looked at other inference engines and model implementations, for example ONNX, but I was not able to gain any faster inference speed.
TensorRT looks very promising to me, but I did not find a ready "out-of-the-box" implementation for it.
Are there any other mature and fast inference engines, or other techniques, to speed up inference?
It's almost impossible to get a higher inference speed for Mask R-CNN on a GTX 1080. You may check detectron2 by Facebook AI Research.
Otherwise, I'd suggest using YOLACT (You Only Look At CoefficienTs), which can achieve real-time instance segmentation.
On the other hand, if you don't need instance segmentation, you can use YOLO, SSD, etc. for object detection; a rough sketch with a lightweight detector follows below.
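For illustration only, here is a minimal sketch (not part of the original answer) of running a lighter, box-only detector with torchvision's pretrained SSDlite320-MobileNetV3; the model choice and the dummy input are assumptions, not something the answer prescribes:

```python
# Sketch: a lightweight box-only detector as an alternative to Mask R-CNN
# when instance masks are not needed. The model choice (SSDlite320-MobileNetV3)
# is an assumption for illustration.
import torch
import torchvision

model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)
model.eval().cuda()

# Dummy 320x320 RGB frame scaled to [0, 1]; replace with a real image tensor.
frame = torch.rand(3, 320, 320).cuda()

with torch.no_grad():
    detections = model([frame])[0]  # dict with "boxes", "labels", "scores"

print(detections["boxes"].shape, detections["scores"][:5])
```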
OpenCV 4.5.0 with DNN_BACKEND_CUDA and DNN_TARGET_CUDA/DNN_TARGET_CUDA_FP16.
Mask R-CNN with a 1024 x 1024 input image:
Device | FPS
------------------ | -------
GTX 1080 Ti (FP32) | 29
RTX 2080 Ti (FP16) | 60
The measured FPS includes NMS but excludes other preprocessing and postprocessing; the network runs fully end-to-end on the GPU.
Benchmark code: https://gist.github.com/YashasSamaga/48bdb167303e10f4d07b754888ddbdcf
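As an illustration, here is a rough sketch (not the author's benchmark script; the model file names are placeholders) of selecting the CUDA backend and target in OpenCV's DNN module for the TensorFlow Mask R-CNN model:

```python
# Sketch: enabling OpenCV DNN's CUDA backend for the TensorFlow Mask R-CNN model.
# File names are placeholders; see the linked gist for the actual benchmark code.
import cv2

net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "mask_rcnn.pbtxt")

# Requires OpenCV built with CUDA support (WITH_CUDA=ON, OPENCV_DNN_CUDA=ON).
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
# DNN_TARGET_CUDA_FP16 helps on GPUs with fast FP16 (e.g. RTX cards);
# on Pascal cards like the GTX 1080, DNN_TARGET_CUDA (FP32) is usually faster.
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

image = cv2.imread("input.jpg")
blob = cv2.dnn.blobFromImage(image, size=(1024, 1024), swapRB=True, crop=False)
net.setInput(blob)

# The TensorFlow Mask R-CNN graph exposes boxes and masks as separate outputs.
boxes, masks = net.forward(["detection_out_final", "detection_masks"])
print(boxes.shape, masks.shape)
```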