OpenMP For - group loops for cache optimization

Question

I'm working to adapt a program to use OpenMP. I have a group of nested for loops. The outermost for loop is a y-axis loop that goes down an image. I would like to run multiple parallel threads on the loop, but I'm having trouble making it fast.

Currently when I run 8 threads it runs like:

thread 0 -> row 0,8,16...
thread 1 -> row 1,9,17...
thread 2 -> row 2,10,18...
thread 3 -> row 3,11,19...

I would like it to run in blocks, so that thread 0 does the first 1/8 of the rows. What is the best way to do this?

Current code:

...
int y_percent = data_size_Y/8;
int thread = 0;

#pragma omp parallel for num_threads(8) firstprivate(vecs, bufferedOut,data_size_X, data_size_Y, kern_cent_X, kern_cent_Y, sum)
for(int y = y_percent*omp_get_thread_num(); y < (omp_get_thread_num()+1)*y_percent; y++){ // the y coordinate of the output location we're focusing on

codehathi · Accepted Answer

You can use the schedule clause on the pragma to specify the chunk size each thread should process. In the example below, I use the static scheduling method with a chunk size, which is the number of contiguous iterations in each chunk. With 32 iterations, 4 threads, and a chunk size of 8, each thread gets exactly one chunk (thread 0 gets iterations 0-7, thread 1 gets 8-15, and so on); if there were more chunks than threads, the chunks would be handed out round-robin in thread order. If you omit the chunk size, schedule(static) divides the iterations into roughly equal contiguous chunks, at most one per thread, which is the block layout you are asking for.

If you aren't concerned with which thread gets which chunk (e.g. you don't care whether thread 0 gets the first chunk), you can replace static with dynamic. With dynamic, chunks are assigned to threads as they become free instead of being preassigned at the start, which is useful for load balancing when some iterations take longer than others. For more information on the scheduling methods, check out the following:

Wikipedia article - Scheduling Clauses
LLNL docs - DO/for Directive

Example:

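#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int main(void) {
  int i;
  int iterations = 32;
  int num_threads = 4;

  /* static, 8: chunks of 8 contiguous iterations, assigned in thread order */
#pragma omp parallel for schedule(static, 8) num_threads(num_threads)
  for(i=0; i<iterations; i++) {
    /* output lines from different threads may interleave */
    printf("thread %d: %d\n", omp_get_thread_num(), i);
  }

  return 0;
}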
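
For the dynamic case mentioned above, here is a minimal sketch with a made-up workload in which later rows take longer than earlier ones. The function name uneven_rows and the chunk size of 4 are assumptions chosen for illustration; they do not come from the question. Which thread handles which chunk can differ from run to run, but the results do not.

```c
/* Hypothetical workload: row y costs y + 1 inner iterations,
   so later rows are slower than earlier ones. */
static void uneven_rows(int rows, long *out) {
    /* dynamic, 4: a thread that finishes its chunk grabs the next
       unassigned chunk of 4 rows, which evens out the load. */
#pragma omp parallel for schedule(dynamic, 4)
    for (int y = 0; y < rows; y++) {
        long acc = 0; /* declared inside the loop, so private per iteration */
        for (int x = 0; x <= y; x++) {
            acc += x;
        }
        out[y] = acc; /* 0 + 1 + ... + y */
    }
}
```

Built with OpenMP enabled (for example gcc -fopenmp), the rows are spread over the available threads; without it, the pragma is ignored and the loop runs serially with the same results.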

kangshiyin · Answer

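You could simply drop the manual loop bounds and let OpenMP split the rows. With a static schedule and no chunk size, which is the default in common implementations such as GCC, each thread gets one contiguous block of rows: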

#pragma omp parallel for num_threads(8)
for(int y = 0; y < data_size_Y; y++) {
    ....
}
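
One way to check the row-to-thread mapping is to record which thread ran each row. The sketch below is only an illustration: the helper names record_row_owners and count_owner_changes are hypothetical, and schedule(static) is written out explicitly rather than relying on the implementation's default.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the sketch still builds without OpenMP (runs serially). */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: stores in owner[y] the thread that processed row y. */
static void record_row_owners(int rows, int threads, int *owner) {
    (void)threads; /* unused when built without OpenMP */
    /* schedule(static) with no chunk size: roughly equal contiguous
       blocks, at most one block per thread. */
#pragma omp parallel for schedule(static) num_threads(threads)
    for (int y = 0; y < rows; y++) {
        owner[y] = omp_get_thread_num();
    }
}

/* Counts how often the owning thread changes between adjacent rows. */
static int count_owner_changes(const int *owner, int rows) {
    int changes = 0;
    for (int y = 1; y < rows; y++) {
        if (owner[y] != owner[y - 1]) {
            changes++;
        }
    }
    return changes;
}
```

After record_row_owners(64, 8, owner), the owner array should contain runs of equal thread numbers rather than the interleaved 0,1,2,... pattern from the question, and count_owner_changes should report at most 7 changes.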

In general, I think the long firstprivate list is unnecessary. Depending on exactly how you use those variables, most of them can probably be shared.
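
As a rough sketch of what that can look like, the function below leaves its inputs shared and declares the accumulator inside the loop body. The function row_sums and its parameters are hypothetical stand-ins, since the question does not show how vecs, bufferedOut, or sum are actually used.

```c
/* Hypothetical example: sums each row of a width x height image into out[y]. */
static void row_sums(const float *image, int width, int height, float *out) {
    /* image, width, height, and out are shared by default. They are only
       read, or written at distinct indices, so no firstprivate copies
       are needed. */
#pragma omp parallel for schedule(static)
    for (int y = 0; y < height; y++) {
        float sum = 0.0f; /* declared inside the loop, so private per iteration */
        for (int x = 0; x < width; x++) {
            sum += image[y * width + x];
        }
        out[y] = sum;
    }
}
```

Variables that are written by every iteration, such as an accumulator, are the ones that need to be private; declaring them inside the loop body is usually the simplest way to get that.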

Tags: c, multithreading, openmp

Marcus