
Parallel speedup with OpenMP

I have two scenarios for measuring metrics like computation time and parallel speedup (sequential_time/parallel_time).

Scenario 1:

Sequential time measurement:

startTime = omp_get_wtime();  
for loop computation  
endTime = omp_get_wtime();  
seq_time = endTime - startTime;

Parallel time measurement:

startTime = omp_get_wtime();  
#pragma omp parallel for reduction(+:pi) private(i)  
for (blah blah) {  
    computation;  
}  
endTime = omp_get_wtime();  
paralleltime = endTime - startTime;

speedup = seq_time/paralleltime;

Scenario 2:

Sequential time measurement:

for loop {  
    startTime = omp_get_wtime();  
    computation;  
    endTime = omp_get_wtime();  
    seq_time += endTime - startTime;  
}

Parallel time measurement:

#pragma omp parallel for reduction(+:pi, paralleltime) private(i, startTime, endTime)  
for (blah blah) {  
    startTime = omp_get_wtime();  
    computation;  
    endTime = omp_get_wtime();  
    paralleltime += endTime - startTime;  
}

speedup = seq_time/paralleltime;

I know that Scenario 2 is NOT the best production code, but I think it measures the actual theoretical performance by OVERLOOKING the overhead involved in OpenMP spawning and managing (context-switching) several threads, so it should give us a linear speedup. Scenario 1, by contrast, includes the overhead involved in spawning and managing threads.

My doubt is this: with Scenario 1, I get a speedup that starts out linear but tapers off as the number of iterations grows. With Scenario 2, I get a fully linear speedup irrespective of the number of iterations. I was told that in reality Scenario 1 should give me a linear speedup irrespective of the number of iterations, but I think it will not, because of the high overhead of thread management. Can someone please explain to me why I am wrong?

Thanks! And sorry about the rather long post.

asked Oct 28 '11 by Neo



1 Answer

There are many situations where Scenario 2 won't give you linear speedup either: false sharing between threads (or, for that matter, true sharing of shared variables that get modified), memory bandwidth contention, etc. The sub-linear speedup is generally real, not a measurement artifact.

More generally, once you get to the point where you're putting timers inside for loops, you're collecting timing information at a finer grain than timers like this can really measure. You may well want to disentangle the thread-management overhead from the actual work being done, for a variety of reasons; but here you're trying to do that by inserting N extra calls to omp_get_wtime(), plus the arithmetic and the reduction operation, all of which have non-negligible overhead of their own.

If you really want accurate timing of how much time is being spent on the computation line, you really want to use something like sampling rather than manual instrumentation (we talk a little bit about the distinction here). Using gprof, Scalasca, or Open|SpeedShop (all free software), or a commercial package such as Intel's VTune, will give you information about how much time is being spent on that line, often even per thread, with much lower overhead.

answered Sep 20 '22 by Jonathan Dursi