Easiest way to use GPU for parallel for loop

I currently have a parallel for loop similar to this:

int testValues[16]={5,2,2,10,4,4,2,100,5,2,4,3,29,4,1,52};
parallel_for (1, 100, 1, [&](int i){ 
    int var4;
    int values[16]={-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
    /* ...nested for loops */
        for (var4=0; var4<16; var4++) {
            if (values[var4] != testValues[var4]) break;
        }
    /* ...end nested loops */
});

I have optimised as much as I can, to the point that the only thing left to do is add more resources.

I am interested in utilising the GPU to help process the task in parallel. I have read that embarrassingly parallel tasks like this can make use of a modern GPU quite effectively.

Using any language, what is the easiest way to use the GPU for a simple parallel for loop like this?

I know nothing about GPU architectures or native GPU code.

asked Apr 10 '12 by Flash

2 Answers

As Li-aung Yip said in the comments, the simplest way to use a GPU is with something like Matlab, which supports array operations and (more or less) automatically moves them to the GPU. But for that to work you need to rewrite your code as pure matrix-based operations.
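As a minimal sketch of what "pure matrix-based operations" means here (using NumPy rather than Matlab, and a hypothetical stand-in for the nested loops that generate candidate arrays): instead of comparing one `values` array at a time in an inner loop, you stack many candidate rows and compare them all against `testValues` in a single array operation. Whole-array operations like this are exactly what an array library can offload to the GPU.

```python
import numpy as np

test_values = np.array([5, 2, 2, 10, 4, 4, 2, 100, 5, 2, 4, 3, 29, 4, 1, 52])

# Hypothetical stand-in for the nested loops: 100 candidate rows of 16
# values each; here row 42 is arbitrarily set to match test_values.
candidates = np.full((100, 16), -1)
candidates[42] = test_values

# One vectorized comparison replaces the per-element inner for loop:
# candidates == test_values broadcasts row-wise, and all(..., axis=1)
# is True only for rows that match in every position.
matches = np.all(candidates == test_values, axis=1)
print(np.flatnonzero(matches))  # indices of matching rows
```

The point is not the NumPy syntax itself but the shape of the computation: once the work is expressed as operations over whole arrays, a tool like Matlab's GPU support (or a GPU array library) can parallelize it for you without you writing any GPU code.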

Otherwise, most GPU use still requires coding in CUDA or OpenCL (you would need to use OpenCL with an AMD card). Even if you use a wrapper for your favourite language, the actual code that runs on the GPU is still usually written in OpenCL (which looks vaguely like C), so this requires a fair amount of learning/effort. You can start by downloading OpenCL from AMD and reading through the docs...

Both those options require learning new ideas, I suspect. What you really want, I think, is a high-level but still traditional-looking language targeted at the GPU. Unfortunately, they don't seem to exist much yet. The only example I can think of is Theano - you might try that. Even there, you still need to learn Python/NumPy, and I am not sure how solid the Theano implementation is, but it may be the least painful way forwards (in that it allows a "traditional" approach - using matrices is in many ways easier, but some people seem to find that very hard to grasp, conceptually).

PS: It's not clear to me that a GPU will actually help with your problem, by the way.

answered Sep 27 '22 by andrew cooke

You might want to check out ArrayFire.

http://www.accelereyes.com/products/arrayfire

If you use OpenCL, you need to download separate implementations from the different device vendors: Intel, AMD, and Nvidia.

answered Sep 27 '22 by MVTC