OpenMP and cores/threads

My CPU is a Core i3 330M with 2 cores and 4 threads. When I run cat /proc/cpuinfo in my terminal, it reports 4 CPUs. The OpenMP function omp_get_num_procs() also returns 4.
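For reference, here is a minimal sketch of how those numbers can be read from the OpenMP runtime. The names `OmpInfo`, `query_openmp` and `print_openmp_info` are made up for illustration. The `#ifdef _OPENMP` guard is an assumption so that the file still builds when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type, not part of OpenMP.
struct OmpInfo {
    int logical_processors;  // processors visible to the runtime, hyper-threads included
    int max_threads;         // upper bound on the team size of the next parallel region
};

OmpInfo query_openmp()
{
#ifdef _OPENMP
    OmpInfo info{omp_get_num_procs(), omp_get_max_threads()};
#else
    OmpInfo info{1, 1};  // built without OpenMP: pragmas are ignored, one thread runs
#endif
    assert(info.logical_processors >= 1 && info.max_threads >= 1);
    return info;
}

void print_openmp_info()
{
    const OmpInfo info = query_openmp();
    std::printf("logical processors: %d, max threads: %d\n",
                info.logical_processors, info.max_threads);
}
```

With GCC or Clang this needs the `-fopenmp` flag. Both values count logical processors, so on a 2-core CPU with Hyper-Threading they would normally both be 4 unless `OMP_NUM_THREADS` is set.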

I have a simple C++ vector class: a fixed-size array of doubles that does not use expression templates. I have carefully parallelized all the methods of my class and I get the "expected" speedup.

The question is: can I guess the expected speedup in such a simple case? For instance, if I add two vectors without parallelized for-loops I get some time (using the shell time command). Now if I use OpenMP, should I get a time divided by 2 or 4, according to the number of cores/threads? I emphasize that I am only asking for this particular simple problem, where there is no interdependence in the data and everything is linear (vector addition).

Here is some code:

Vector Vector::operator+(const Vector& rhs) const
{
    assert(m_size == rhs.m_size);
    Vector result(m_size);
    #pragma omp parallel for schedule(static)
    for (unsigned int i = 0; i < m_size; i++)
        result.m_data[i] = m_data[i] + rhs.m_data[i];

    return result;
}

I have already read this post: OpenMP thread mapping to physical cores.

I hope that somebody can tell me more about how OpenMP gets the work done in this simple case. I should say that I am a beginner in parallel computing.

Thanks!

Asked by Benjamin, Feb 15 '12 11:02


1 Answer

EDIT: Now that some code has been added.

In that particular example, there is very little computation and lots of memory access. So the performance will depend heavily on:

  • The size of the vector.
  • How you are timing it. (Do you have an outer loop for timing purposes?)
  • Whether the data is already in cache.

For larger vector sizes, you will likely find that performance is limited by your memory bandwidth, in which case parallelism is not going to help much. For smaller sizes, the overhead of threading will dominate. If you're getting the "expected" speedup, you're probably somewhere in between, where the result is optimal.
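One way to see where a given machine sits between those two regimes is to time the same addition at several sizes, with and without threads. The following is only a rough sketch: `time_vector_add` is a made-up helper, it uses `std::vector<double>` rather than the `Vector` class from the question, and it reports the best of several repeats, which means small inputs are measured with the data already in cache.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: best-of-`repeats` wall-clock time, in seconds, of one
// element-wise addition of two vectors of length n.
double time_vector_add(std::size_t n, int repeats, bool parallel)
{
    assert(repeats > 0);
    (void)parallel;  // unused when the file is built without OpenMP
    if (n == 0) return 0.0;

    std::vector<double> a(n, 1.0), b(n, 2.0), result(n, 0.0);
    const long long count = static_cast<long long>(n);  // signed loop index for OpenMP
    double best = -1.0;

    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        // The if clause runs the loop on a single thread when `parallel` is false.
        #pragma omp parallel for schedule(static) if(parallel)
        for (long long i = 0; i < count; ++i)
            result[i] = a[i] + b[i];
        const auto stop = std::chrono::steady_clock::now();

        const double elapsed = std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || elapsed < best) best = elapsed;
    }

    // Read one element so the additions cannot be discarded as dead code.
    return result[n - 1] == 3.0 ? best : -1.0;
}
```

Build with OpenMP enabled (for example `-fopenmp` with GCC or Clang). Then compare `time_vector_add(n, 5, false)` with `time_vector_add(n, 5, true)` for `n` ranging from a few thousand to tens of millions. The ratio of the two is the measured speedup at that size. Timing inside the program also avoids counting process startup and allocation, which the shell `time` command includes.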

I refuse to give hard numbers because, in general, "guessing" performance, especially in multi-threaded applications, is a lost cause unless you have prior test results or intimate knowledge of both the program and the system it's running on.

Just as a simple example taken from my answer here: How to get 100% CPU usage from a C program

On a Core i7 920 @ 3.5 GHz (4 cores, 8 threads):

If I run with 4 threads, the result is:

This machine calculated all 78498 prime numbers under 1000000 in 39.3498 seconds

If I run with 4 threads and explicitly (using Task Manager) pin the threads on 4 distinct physical cores, the result is:

This machine calculated all 78498 prime numbers under 1000000 in 30.4429 seconds

So this shows how unpredictable it is for even a very simple and embarrassingly parallel application. Applications involving heavy memory usage and synchronization get a lot uglier...

Answered by Mysticial, Sep 30 '22 02:09