I am guessing that even reading from shared data in OpenMP incurs some parallel overhead: depending on the processor architecture (if different cores have their own caches), it may be necessary to refresh the cache to ensure that no other CPU has modified the data before reading.
Am I right in thinking this?
If so, is there a way to tell OpenMP (on the Intel compiler, FWIW) that some of the shared data is constant, so such cache refreshing isn't necessary?
If the answer is C++ const, is there an easy way to turn non-const data into const data, without actually reallocating the memory, once the program has passed a certain point at runtime?
UPDATE
Ah, OK. I now remember where I got my impression that const
was a good thing in this context: http://www.akkadia.org/drepper/cpumemory.pdf , section 6.4.1. It's to do with false sharing, where read-only variables that share cache lines with read-write variables incur the penalty of the cache line being marked exclusive by the read-write variable. The linked document recommends, for example with gcc, marking those variables with __attribute__((section("something.else"))) to ensure they get stored elsewhere.
As it happens, this is not relevant in my own situation: large arrays and STL containers of data in which the read/write granularity spans many cache lines, and which are allocated from different memory pools in any case. So these will naturally end up on different cache lines. No problem!
const does not imply that the memory is constant. It implies that your handle to that memory cannot write to it (which is subtly different):
int i = 3;
int const& j = i;
i = 4;
std::cout << j << "\n";
Will print 4, even though when j was bound to i the value was 3.
Therefore, const can only tell you that you should not modify the underlying data (and the compiler will enforce this, up to a point). It says nothing about the data itself, except when applied directly to the data:
char const array[] = "Some value";
Here the storage itself is const, the value is immutable, and the compiler is free to place it in read-only memory.
I was also looking at this topic and found a general performance recommendation from Oracle:
If a SHARED variable in a parallel region is read by the threads executing the region, but not written to by any of the threads, then specify that variable to be FIRSTPRIVATE instead of SHARED. This avoids accessing the variable by dereferencing a pointer, and avoids cache conflicts.
Oracle OpenMP API User's Guide - Chapter 7 - Performance Considerations