I was using a naive prime generated function. This code takes about, 5.25 seconds to generate 10k prime numbers (device_primes[0] holds the number primes already found, the remaining position the prime numbers found).
_global__ void getPrimes(int *device_primes,int n)
{
int c = 0;
int thread_id = blockIdx.x * blockDim.x + threadIdx.x;
int num = thread_id+2;
if (thread_id == 0) device_primes[0] = 1;
__syncthreads();
while(device_primes[0] < n)
{
for (c = 2; c <= num - 1; c++)
{
if (num % c == 0) //not prime
{
break;
}
}
if (c == num) //prime
{
int pos = atomicAdd(&device_primes[0],1);
device_primes[pos] = num;
}
num += blockDim.x * gridDim.x; // Next number for this thread
}
}
I was just starting to optimize the code, and i made the follow modification, instead of :
for (c = 2; c <= num - 1; c++)
{
if (num % c == 0) //not prime
break;
}
if (c == num) {...}
i have now :
int prime = 1;
...
for (c = 2; c <= num - 1 && prime; c++)
{
if (num % c == 0) prime = 0; // not prime
}
if (prime) {...} // if prime
Now i can generate 10k in 0.707s. I was just wondering why such speed up with a this simple modification, is break that bad?
As Tony suggested, divergent code execution can cause major slow downs in gpu code, forcing some code to run in serial rather than parallel. In the slow version of the code above, threads that hit the break diverge from code that continues.
The cuda c programming guide is a good resource for gpu programming techniques. Here is what it says about control flow:
Any flow control instruction (if, switch, do, for, while) can significantly impact the effective instruction throughput by causing threads of the same warp to diverge (i.e., to follow different execution paths). If this happens, the different executions paths have to be serialized, increasing the total number of instructions executed for this warp. When all the different execution paths have completed, the threads converge back to the same execution path.
Newer nvidia hardware and cuda versions can handle some branching a little better than older versions, but it's still best to avoid branching whenever possible.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With