My goal is to run a TensorFlow model in real time to control a vehicle from a learned model. Our vehicle system uses ROS (Robot Operating System) which is tied closely to OpenCV. So, I receive an OpenCV Mat containing the image of interest from ROS.
cv::Mat cameraImg;
I would like to create a Tensorflow Tensor directly from the data in this OpenCV matrix to avoid the expense of copying the matrix line-by-line. Using the answer to This Question I have managed to get the forward pass of the network working with the following code:
cameraImg.convertTo(cameraImg, CV_32FC3);
Tensor inputImg(DT_FLOAT, TensorShape({1,inputheight,inputwidth,3}));
auto inputImageMapped = inputImg.tensor<float, 4>();
auto start = std::chrono::system_clock::now();
//Copy all the data over
for (int y = 0; y < inputheight; ++y) {
const float* source_row = ((float*)cameraImg.data) + (y * inputwidth * 3);
for (int x = 0; x < inputwidth; ++x) {
const float* source_pixel = source_row + (x * 3);
inputImageMapped(0, y, x, 0) = source_pixel[2];
inputImageMapped(0, y, x, 1) = source_pixel[1];
inputImageMapped(0, y, x, 2) = source_pixel[0];
}
}
auto end = std::chrono::system_clock::now();
However, using this method the copy to the tensor takes between 80ms and 130ms, while the entire forward pass (for a 10-layer convolutional network) only takes 25ms.
Looking at the tensorflow documentation, it appears there is a Tensor constructor that takes an allocator. However, I have not been able to find any Tensorflow or Eigen documentation relating to this functionality or the Eigen Map class as it relates to Tensors.
Does anyone have any insight into how this code can be sped up, ideally by re-using my OpenCV memory?
EDIT: I have successfully implemented what @mrry suggested, and can re-use the memory allocated by OpenCV. I have opened github issue 8033 requesting this be added to the tensorflow source tree. My method isn't that pretty, but it works.
It is still very difficult to compile an external library and link it to the libtensorflow.so library. Potentially the tensorflow cmake library will help with this, I have not yet tried it.
I know that it is old thread but there is a zero copy solution to your problem using the existing C++ API: I updated your github issue with my solution. tensorflow/issues/8033
For the record I copy my solution here:
// allocate a Tensor
Tensor inputImg(DT_FLOAT, TensorShape({1,inputHeight,inputWidth,3}));
// get pointer to memory for that Tensor
float *p = inputImg.flat<float>().data();
// create a "fake" cv::Mat from it
cv::Mat cameraImg(inputHeight, inputWidth, CV_32FC3, p);
// use it here as a destination
cv::Mat imagePixels = ...; // get data from your video pipeline
imagePixels.convertTo(cameraImg, CV_32FC3);
Hope this helps
The TensorFlow C API (as opposed to the C++ API) exports the TF_NewTensor()
function, which allows you to create a tensor from a pointer and a length, and you can pass the resulting object to the TF_Run()
function.
Currently this is the only public API for creating a TensorFlow tensor from a pre-allocated buffer. There is no supported way to cast a TF_Tensor*
to a tensorflow::Tensor
but if you look at the implementation there is a private API with friend
access that can do this. If you experiment with this, and can show an appreciable speedup, we'd consider a feature request for adding this to the public API.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With