OpenCL FFT on both Nvidia and AMD hardware?

I'm working on a project that needs to make use of FFTs on both Nvidia and AMD graphics cards. I initially looked for a library that would work on both (thinking this would be the OpenCL way) but I wasn't having any luck.

Someone suggested to me that I would have to use each vendor's FFT implementation and write a wrapper that chose what to do based on the platform. I found AMD's implementation pretty easily, but I'm actually working with an Nvidia card in the meantime (and this is the more important one for my particular application).
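The wrapper idea above can be sketched as a small dispatch function. This is only a minimal sketch under stated assumptions, not a definitive implementation: the backend labels (`"cufft"`, `"opencl_fft"`) and both function names are hypothetical, and the vendor substring match is an assumption about what the platform vendor string typically contains. The optional helper assumes the `pyopencl` package is installed.

```python
def pick_fft_backend(vendor: str) -> str:
    """Map an OpenCL platform vendor string to a hypothetical backend label."""
    v = vendor.lower()
    if "nvidia" in v:
        # Assumption: an NVIDIA platform reports a vendor string containing "NVIDIA".
        return "cufft"
    # Everything else (AMD, Intel, ...) falls back to an OpenCL FFT library.
    return "opencl_fft"


def platform_vendors():
    """List the vendor strings of the installed OpenCL platforms (needs pyopencl)."""
    import pyopencl as cl  # imported lazily so the dispatch logic has no dependency
    return [p.vendor for p in cl.get_platforms()]
```

Keeping the decision in one pure function means the rest of the application never needs to know which FFT library sits behind the wrapper.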

The only Nvidia implementation I can find is CUFFT. Does anyone know how I can use the CUFFT library from OpenCL? The only way I can think of is to have some CUDA code alongside my OpenCL code. I've read that I can't just use OpenCL buffers as CUDA pointers (see "Trying to mix in OpenCL with CUDA in NVIDIA's SDK template"). Would I instead have to copy the buffers back to the host after running my OpenCL kernels and then copy them to the GPU again using the CUDA memory transfer routines? I don't really like this approach, as it seems to involve pointless memory transfers. I would much prefer to use CUFFT directly from OpenCL.

asked Jul 03 '12 04:07

Lorentz

2 Answers

NVIDIA has not done any work to support OpenCL libraries such as FFT. It also has not released the source for its CUDA libraries, so there is no way to run them through OpenCL.

AMD's FFT library is your best bet: it will run on any OpenCL-compliant device, including NVIDIA's GPUs. ArrayFire OpenCL leverages AMD's FFT library, and I've run it on Intel, NVIDIA, and AMD devices in our lab.

answered Oct 16 '22 13:10

Ben Stewart

In addition to Ben's AMD suggestion, you could also investigate the Apple FFT example code. However, their code runs only on GPU devices, as it checks the device type of the command queue you pass in.
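To illustrate the kind of check described, here is a minimal sketch of testing whether a command queue belongs to a GPU device. It is not Apple's actual code: the function names are hypothetical, the constants mirror the `CL_DEVICE_TYPE_*` bitfield values from the OpenCL headers, and `queue_is_gpu` assumes a `pyopencl.CommandQueue` is passed in.

```python
# Bitfield values as defined in the OpenCL headers (cl.h).
CL_DEVICE_TYPE_CPU = 1 << 1
CL_DEVICE_TYPE_GPU = 1 << 2


def is_gpu(device_type_bits: int) -> bool:
    """Return True if the GPU bit is set in a device-type bitfield."""
    return bool(device_type_bits & CL_DEVICE_TYPE_GPU)


def queue_is_gpu(queue) -> bool:
    """Check the device behind a pyopencl.CommandQueue (assumes pyopencl)."""
    return is_gpu(int(queue.device.type))
```

A library that gates on this check will refuse CPU queues even when the kernels themselves could run there, which is the limitation noted above.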

answered Oct 16 '22 12:10

matthias


Tags: cuda, gpgpu, nvidia, opencl
