Maybe this is a stupid question.
On this site I read that
The valarray specification allows for libraries to implement it with several efficiency optimizations, such as parallelization of certain operations
What is the current state of parallelization of std::valarray
on different platforms and compilers (GCC, VS2010/2013, clang)?
I am especially interested in implementations that use the standard threading support from C++11.
UPD: And if some compilers don't support this feature, what is the best way to apply some function to the elements of a container in several threads? Obviously, a naive solution with std::thread would be short and work well,
but maybe a better solution exists?
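For reference, the naive approach could look roughly like the sketch below. The helper name parallel_apply and its parameters are made up for illustration, it assumes the container is a std::vector (other than std::vector<bool>), and it does nothing clever about exceptions or load balancing:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of any library): applies f to every
// element of data in place, one contiguous chunk per thread.
template <typename T, typename F>
void parallel_apply(std::vector<T>& data, F f, unsigned num_threads)
{
    assert(num_threads > 0);
    const std::size_t n = data.size();
    // Chunk size rounded up so that all elements are covered.
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        // Each thread touches only its own index range, so no locking is needed.
        workers.emplace_back([&data, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                data[i] = f(data[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```

A call such as parallel_apply(v, [](double x) { return 2 * x; }, std::thread::hardware_concurrency()) would then double every element; note that hardware_concurrency() may return 0, which this sketch does not handle beyond the assert.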
Intel appears to have done some work on this.
For the other ones: I don't think so. cppreference says that
Some C++ standard library implementations use expression templates to implement efficient operations on std::valarray (e.g. GNU libstdc++ and LLVM libc++). Only rarely are valarrays optimized any further, as in e.g. Intel Parallel Studio.
I also did not find any documentation stating that libc++ or libstdc++ did anything fancy in this regard, and usually no one hides cool features. :)
Regarding MSVC: I once encountered code using std::valarray
that compiled but did not link because Microsoft "forgot" to implement some methods. This is of course no proof, but to me it does not sound like anything cool happened there either. I also could not find any documentation of special features there.
For one, we can use the parallel mode to make libstdc++ parallelize the following algorithms with OpenMP where it deems that useful:
std::accumulate
std::adjacent_difference
std::inner_product
std::partial_sum
std::adjacent_find
std::count
std::count_if
std::equal
std::find
std::find_if
std::find_first_of
std::for_each
std::generate
std::generate_n
std::lexicographical_compare
std::mismatch
std::search
std::search_n
std::transform
std::replace
std::replace_if
std::max_element
std::merge
std::min_element
std::nth_element
std::partial_sort
std::partition
std::random_shuffle
std::set_union
std::set_intersection
std::set_symmetric_difference
std::set_difference
std::sort
std::stable_sort
std::unique_copy
To do so, simply define _GLIBCXX_PARALLEL
during compilation. I feel like this covers a good chunk of what one would like to do with arrays of numbers. Of course, there is a caveat:
Note that the _GLIBCXX_PARALLEL define may change the sizes and behavior of standard class templates such as std::search, and therefore one can only link code compiled with parallel mode and code compiled without parallel mode if no instantiation of a container is passed between the two translation units. Parallel mode functionality has distinct linkage, and cannot be confused with normal mode symbols.
(from here.)
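As a rough sketch of what using the parallel mode looks like: the code itself stays ordinary standard-library code, and only the build flags change. The function name sum_of_squares is just an example of mine. With GCC and libstdc++ one would compile with something like g++ -O2 -fopenmp -D_GLIBCXX_PARALLEL; without those flags, or with another standard library, the same code simply runs sequentially.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Example function (name chosen for illustration): squares every element
// in place and returns the sum of the squares.
// When built with libstdc++ and -D_GLIBCXX_PARALLEL -fopenmp, these two
// calls may be dispatched to OpenMP-based versions if the library decides
// the input is large enough; otherwise they run sequentially.
inline long long sum_of_squares(std::vector<long long>& v)
{
    std::transform(v.begin(), v.end(), v.begin(),
                   [](long long x) { return x * x; });
    return std::accumulate(v.begin(), v.end(), 0LL);
}
```

Integer addition is used here on purpose: a parallel accumulate may combine partial sums in a different order, which is harmless for integers but can change rounding for floating-point values.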
Another tool that can help you to parallelize is the Intel Advisor. This is more advanced and can also handle your loops I believe (never used it myself), but of course this is proprietary software.
For linear algebra operations, you can also look for a good parallel LAPACK implementation.