Displaying CUDA-processed images in WPF

I have a WPF application that acquires images from a camera, processes these images, and displays them. The processing part has become burdensome for the CPU, so I've looked at moving this processing to the GPU and running custom CUDA kernels against them. The basic process is as follows:

1) acquire image from camera
2) load image onto GPU
3) call CUDA kernel to process image
4) display processed image
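To see whether the four steps above can keep up with the camera, it can help to time each one and compare the total with the frame period. Below is a minimal C++ sketch. It assumes a strictly sequential pipeline with no overlap between frames. `FrameTimings`, `totalMs`, and `fitsFrameBudget` are hypothetical names, and the timings would come from your own measurements.

```cpp
// Hypothetical per-frame timings, in milliseconds, for the four steps.
struct FrameTimings {
    double acquireMs;  // 1) acquire image from camera
    double uploadMs;   // 2) load image onto GPU (host-to-device copy)
    double kernelMs;   // 3) CUDA kernel processing
    double displayMs;  // 4) display, including any copy back to the host
};

// Sum of all stage times for one frame.
double totalMs(const FrameTimings& t) {
    return t.acquireMs + t.uploadMs + t.kernelMs + t.displayMs;
}

// True if a strictly sequential pipeline keeps up with the camera:
// each frame must finish before the next one arrives (1000 / fps ms).
bool fitsFrameBudget(const FrameTimings& t, double cameraFps) {
    const double budgetMs = 1000.0 / cameraFps;
    return totalMs(t) <= budgetMs;
}
```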

A WPF-to-CUDA-to-display-control strategy is what I'm trying to figure out. It seems natural that once the image is loaded onto the GPU, it would not have to be copied back off the GPU in order to be displayed. I've read that this can be done with OpenGL, but do I really need to learn OpenGL and include it in my project in order to quickly display a CUDA-processed image?

I understand (I think) the issues involved in calling CUDA kernels from C#. My plan is either to build an unmanaged library around my CUDA calls and later wrap it for C#, or to pick one of the managed wrappers (managedCUDA, Cudafy, etc.). I worry about using one of the prebuilt wrappers because they all appear to be lightly supported... but maybe I have the wrong impression.
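For the unmanaged-library option, a common pattern is to expose a flat C interface that C# can call through P/Invoke. Below is a minimal sketch that assumes 8-bit grayscale frames. `ProcessImage` and `IMAGEPROC_API` are hypothetical names. A CPU invert stands in for the CUDA calls so the sketch builds without the CUDA toolkit.

```cpp
#include <cstddef>
#include <cstdint>

// Export an unmangled C symbol so C#'s P/Invoke can find it by name.
#if defined(_WIN32)
#define IMAGEPROC_API extern "C" __declspec(dllexport)
#else
#define IMAGEPROC_API extern "C" __attribute__((visibility("default")))
#endif

// Hypothetical entry point: process one 8-bit grayscale frame.
// Returns 0 on success, -1 on invalid arguments. Only plain pointers and
// integers cross the boundary, which keeps the C# marshaling simple.
IMAGEPROC_API int ProcessImage(const std::uint8_t* input, std::uint8_t* output,
                               int width, int height) {
    if (input == nullptr || output == nullptr || width <= 0 || height <= 0) {
        return -1;
    }
    const std::size_t count = static_cast<std::size_t>(width) * height;
    // Placeholder for the CUDA work (copy in, kernel launch, copy out):
    // a CPU invert so the example builds and runs without a GPU.
    for (std::size_t i = 0; i < count; ++i) {
        output[i] = static_cast<std::uint8_t>(255 - input[i]);
    }
    return 0;
}
```

On the C# side, a matching declaration might look like `[DllImport("imageproc")] static extern int ProcessImage(byte[] input, [Out] byte[] output, int width, int height);`. The library name `imageproc` is also hypothetical.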

Anyway, I'm feeling a bit overwhelmed after days of researching the possible options. Any advice would be greatly appreciated.

Asked Oct 01 '22 09:10 by Bryan Greenway



1 Answer

The process of taking the result of a CUDA computation and using it directly on the device for a graphics task is called "interop". There is OpenGL interop and there is DirectX interop. There are plenty of CUDA sample codes demonstrating how to interact with computed images this way.

To go directly from computed data on the device to the display, without a trip to the host, you will need to use one of these two APIs (OpenGL or DirectX).

You mentioned two of the managed interfaces I've heard of, so it seems like you're aware of the options there.

If the processing time is significant compared to (much larger than) the time taken to transfer the image from host to device, you might consider starting out by just transferring the image from host to device, processing it, and then transferring it back, where you can then use the same plumbing you have been using to display it. You can then decide if the additional effort for interop is worth it.
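A back-of-the-envelope check for the double-copy approach is to estimate each copy as image size divided by host-device bandwidth, then compare the round trip with the CPU time. Below is a minimal C++ sketch. The function names are hypothetical. The bandwidth figure is an assumption that you should replace with a measured value, for example from the CUDA `bandwidthTest` sample.

```cpp
#include <cstddef>

// Rough transfer-time estimate in ms: bytes / bandwidth.
// The bandwidth value is an assumption; measure it on your own system.
double transferMs(std::size_t bytes, double gigabytesPerSecond) {
    const double seconds = static_cast<double>(bytes) / (gigabytesPerSecond * 1e9);
    return seconds * 1000.0;
}

// Double-copy path: host -> device, kernel, device -> host.
double roundTripMs(std::size_t imageBytes, double bandwidthGBps, double kernelMs) {
    const double copyMs = transferMs(imageBytes, bandwidthGBps);
    return copyMs + kernelMs + copyMs;  // same image size in both directions
}

// The simple approach pays off when the whole round trip beats the CPU.
bool roundTripBeatsCpu(std::size_t imageBytes, double bandwidthGBps,
                       double kernelMs, double cpuMs) {
    return roundTripMs(imageBytes, bandwidthGBps, kernelMs) < cpuMs;
}
```

This model ignores overlapping copies with computation (for example, with CUDA streams), so it is a conservative first estimate.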

If you can profile your code to figure out how long the image processing takes on the host, and then prototype something on the device to find out how much faster it is, that will be instructive.
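For the host side of that comparison, timing can be as simple as wrapping the processing step in `std::chrono::steady_clock` and averaging over several frames. Below is a minimal sketch. `boxBlurRow3` is a hypothetical stand-in for your actual CPU processing, and `averageMs` is an illustrative helper.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU processing step: horizontal 3-tap box blur on an 8-bit
// grayscale image (first and last pixel of each row are copied unchanged).
std::vector<std::uint8_t> boxBlurRow3(const std::vector<std::uint8_t>& img,
                                      int width, int height) {
    std::vector<std::uint8_t> out(img);
    for (int y = 0; y < height; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            out[i] = static_cast<std::uint8_t>((img[i - 1] + img[i] + img[i + 1]) / 3);
        }
    }
    return out;
}

// Average wall-clock time of one call to `step`, in milliseconds.
template <typename F>
double averageMs(F&& step, int runs) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < runs; ++r) {
        step();
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / runs;
}
```

Assigning the result inside the lambda, as in `[&] { result = boxBlurRow3(frame, w, h); }`, helps keep the optimizer from discarding the work. On the device side, `cudaEventRecord` and `cudaEventElapsedTime` serve the same purpose for timing kernels and copies.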

You may find that the processing time is so long that you benefit even from the double-copy arrangement. Or you may find that the processing time on the host is so short (compared to just the cost of transferring the image to the device) that CUDA acceleration would not be useful.
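These two outcomes can be written as a rough break-even condition. The model below is a simplification. It assumes the same image size $B$ is copied in both directions over an effective host-device bandwidth $W$, and that copies do not overlap with computation. The symbols are introduced here for illustration, with $S$ denoting the kernel's speedup over the CPU.

```latex
\begin{aligned}
t_{\mathrm{GPU}} &= t_{\mathrm{H2D}} + t_{\mathrm{kernel}} + t_{\mathrm{D2H}},
  \qquad t_{\mathrm{H2D}} \approx t_{\mathrm{D2H}} \approx \frac{B}{W} \\
t_{\mathrm{GPU}} < t_{\mathrm{CPU}}
  &\iff t_{\mathrm{kernel}} < t_{\mathrm{CPU}} - \frac{2B}{W} \\
  &\iff S = \frac{t_{\mathrm{CPU}}}{t_{\mathrm{kernel}}}
  > \frac{t_{\mathrm{CPU}}}{t_{\mathrm{CPU}} - 2B/W},
  \qquad \text{provided } t_{\mathrm{CPU}} > \frac{2B}{W}
\end{aligned}
```

If $t_{\mathrm{CPU}} \le 2B/W$, no kernel speedup can make the double-copy path win. This is the "transfer cost dominates" case, where only interop (or keeping data resident on the device) could help.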

Answered Oct 05 '22 13:10 by Robert Crovella
