
Parallel speedup with OpenMP

I have two scenarios for measuring metrics like computation time and parallel speedup (sequential_time/parallel_time).

Scenario 1:

Sequential time measurement:

startTime = omp_get_wtime();  
for loop computation  
endTime = omp_get_wtime();  
seq_time = endTime - startTime;

Parallel time measurement:

startTime = omp_get_wtime();  
#pragma omp parallel for reduction(+:pi) private(i)  
for (blah blah) {  
    computation;  
}  
endTime = omp_get_wtime();  
paralleltime = endTime - startTime;

speedup = seq_time/paralleltime;

Scenario 2:

Sequential time measurement:

for loop {  
    startTime = omp_get_wtime();  
    computation;  
    endTime = omp_get_wtime();  
    seq_time += endTime - startTime;  
}

Parallel time measurement:

#pragma omp parallel for reduction(+:pi, paralleltime) private(i, startTime, endTime)  
for (blah blah) {  
    startTime = omp_get_wtime();  
    computation;  
    endTime = omp_get_wtime();  
    paralleltime += endTime - startTime;  
}

speedup = seq_time/paralleltime;

I know that Scenario 2 is NOT the best production code, but I think it measures the actual theoretical performance by OVERLOOKING the overhead involved in OpenMP spawning and managing (context-switching) several threads, so it should give us a linear speedup. Scenario 1, by contrast, includes the overhead involved in spawning and managing threads.

My doubt is this: with Scenario 1, I get a speedup that starts out linear but tapers off as the number of iterations grows. With Scenario 2, I get a fully linear speedup irrespective of the number of iterations. I was told that in reality Scenario 1 should give me a linear speedup irrespective of the number of iterations, but I think it will not, because of the high overhead of thread management. Can someone please explain to me why I am wrong?

Thanks! And sorry about the rather long post.

asked Oct 28 '11 by Neo



1 Answer

There are many situations where Scenario 2 won't give you linear speedup either: false sharing between threads (or, for that matter, true sharing of shared variables that get modified), memory bandwidth contention, etc. The sub-linear speedup is generally real, not a measurement artifact.

More generally, once you get to the point where you're putting timers inside for loops, you're collecting timing information at a finer grain than timers like this can really measure. You may well want to disentangle the thread-management overhead from the actual work being done, for a variety of reasons; but here you're trying to do that by inserting N extra calls to omp_get_wtime(), plus the arithmetic and the reduction operation, all of which have non-negligible overhead of their own.

If you really want accurate timing of how much time is being spent on the computation line, you really want to use something like sampling rather than manual instrumentation (we talk a little bit about the distinction here). Using gprof, Scalasca, or Open|SpeedShop (all free software), or a commercial package such as Intel's VTune, will give you information about how much time is being spent on that line, often even per thread, with much lower overhead.

answered Sep 20 '22 by Jonathan Dursi