I'm working with someone who has MATLAB code they want sped up. They are currently trying to convert all of it into CUDA so it can run on a GPU. I think it would be faster to use MATLAB's Parallel Computing Toolbox and run the code on a cluster running MATLAB Distributed Computing Server, which would let me spread the work across several worker nodes. Now, the Parallel Computing Toolbox also offers things like gpuArray. However, I'm confused as to how this would work. Are parfor (parallelization) and gpuArray (GPU programming) compatible with each other? Can I use both? Can the work be split across different worker nodes (parallelization) while also making use of whatever GPUs are available on each worker?
They think it's still worth exploring how long it would take to convert all of the MATLAB code to CUDA so it can run on a machine with multiple GPUs, but I think the right approach is to use the features already built into MATLAB.
Any help, advice, or direction would be really appreciated!
Thanks!
You can use MATLAB functions directly with the GPU. To transfer data to the GPU and create a gpuArray object, use the gpuArray function. To operate on gpuArray objects, use any gpuArray-enabled MATLAB function; MATLAB automatically runs those calculations on the GPU.
Parallel Computing Toolbox enables you to use NVIDIA® GPUs directly from MATLAB through gpuArray. More than 500 MATLAB functions run automatically on NVIDIA GPUs, including fft, element-wise operations, and several linear algebra operations such as lu and mldivide, also known as the backslash operator (\).
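As a minimal sketch of that workflow (the array sizes here are arbitrary, just for illustration):

    A = rand(4096);                        % ordinary array in host memory
    G = gpuArray(A);                       % transfer it to the GPU
    F = fft(G);                            % gpuArray-enabled, runs on the GPU
    x = G \ ones(4096, 1, 'gpuArray');     % mldivide (backslash) on the GPU
    result = gather(x);                    % copy only the result back to the CPU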
You can also use a parfor-loop without a parallel pool, for example to run magic with different matrix sizes; the loop simply runs in serial if you do not have Parallel Computing Toolbox.
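A sketch of that kind of loop (the sizes and the per-iteration work are just placeholders):

    sizes = 1000:1000:5000;
    results = zeros(1, numel(sizes));
    parfor k = 1:numel(sizes)
        M = magic(sizes(k));       % build a magic square of the k-th size
        results(k) = trace(M);     % any per-iteration computation goes here
    end

With a pool open, the iterations are distributed over the workers; without one, the same code just runs serially.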
If you are mainly interested in simulations, GPU processing is the perfect choice. However, if you want to analyse (big) data, go with parallelization. The reason is that GPU processing is only faster than CPU processing if you don't have to copy data back and forth. In the case of a simulation, you can generate most of the data on the GPU and only need to copy the result back. If you try to work with bigger data on the GPU, you will very often run into out-of-memory problems. Parallelization is great if you have big data structures and more than two cores in your CPU.
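And to the original question: yes, the two can be combined. Inside a parfor-loop, each worker can create gpuArray data on the GPU it has access to and copy back only the small result, which is exactly the "generate on the GPU, gather the result" pattern described above. A rough sketch, assuming every worker in the pool can see a GPU (the sizes and the per-iteration work are again placeholders):

    nTrials = 8;
    peaks = zeros(1, nTrials);
    parfor k = 1:nTrials
        X = rand(2000, 'gpuArray');           % data generated directly on the worker's GPU
        Y = fft(X);                           % gpuArray-enabled work stays on the GPU
        peaks(k) = gather(max(abs(Y(:))));    % only a scalar is copied back to the host
    end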