
C++ OpenMP with shared_ptr

Here is a minimal example of what bothers me:

#include <iostream>
#include <memory>
#include <omp.h>

class A{
    public:
        A(){std::cout<<this<<std::endl;}
};

int main(){
#pragma omp parallel for 
    for(unsigned int i=0;i<4;i++){
        std::shared_ptr<A> sim(std::make_shared<A>());
    }
    for(unsigned int i=0;i<4;i++){
        std::shared_ptr<A> sim(std::make_shared<A>());
    }
}

If I run that code a few times, I may get this kind of result:

0xea3308
0xea32d8
0xea3338
0x7f39f80008c8
0xea3338
0xea3338
0xea3338
0xea3338

I noticed that the last four outputs always have the same number of characters (8). But sometimes (not always) one or more of the first four outputs has more characters (14). It looks as if using OpenMP changes the "nature" of the pointer (this is my naive understanding). Is this behaviour normal? Should I expect some strange behaviour?

EDIT

Here is a live test that shows the same problem in a slightly more complicated version of the code.

PinkFloyd asked Sep 28 '22 05:09


1 Answer

This behaviour is entirely reasonable. Let's see what's happening.

Serial loop

In every iteration, one A is created on the heap and then destroyed again at the end of the iteration. These operations are ordered like so:

  1. construction
  2. destruction
  3. construction
  4. destruction
  5. ... (and so on)

Since the A objects are created on the heap, every construction goes through the memory allocator. When the allocator gets a request for new memory, as in step 3, it will in many cases first look at recently freed memory. It sees that the last operation (step 2) freed a chunk of exactly the right size, so it hands out that chunk again. This repeats in each iteration, so the serial loop will commonly (but not necessarily) give you the same address over and over again.

Parallel loop

Now let's think about the parallel loop. Since there is no synchronization, the ordering of the memory allocations and deallocations across threads is not determined, so they can be interleaved in whatever way you can imagine. The memory allocator will therefore in general not be able to use the same trick as before and always hand out the same piece of memory. One possible ordering is that all four A objects get constructed before any of them is destroyed, something like this:

  1. construction
  2. construction
  3. construction
  4. construction
  5. destruction
  6. destruction
  7. destruction
  8. destruction

The memory allocator will therefore have to serve up 4 brand new pieces of memory before it can get some back and start recycling.

The behaviour of the stack-based version is slightly more deterministic, but can depend on compiler optimizations. In the serial version, the stack pointer is adjusted every time the object is created or destroyed. Since nothing else happens in between, the object keeps getting created in the same location.

For the parallel version, every thread has its own stack in a shared memory system. Therefore each thread will create its objects in a different memory location, and no "recycling" between threads is possible.

The behaviour you're seeing is in no way strange, nor is it guaranteed. It depends on the number of physical cores you have, how many threads are run, and how many iterations you use: in general, on runtime conditions.

Bottom line: everything is fine, you shouldn't read too much into it.

jepio answered Oct 10 '22 03:10