
std::valarray and parallelization

This may be a naive question.

On this site I read that

The valarray specification allows for libraries to implement it with several efficiency optimizations, such as parallelization of certain operations

What is the current state of std::valarray parallelization on different platforms and compilers (GCC, VS2010/2013, clang), especially now that C++11 provides standard threading support?

UPD: And if some compilers don't support this feature, what is the best way to apply a function to the elements of a container in several threads? Obviously, a naive solution with std::thread would be short and work well, but maybe a better solution exists?
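For reference, the naive std::thread approach mentioned above could look roughly like the following. This is only a minimal sketch: `parallel_apply` is a hypothetical helper name, not a standard library function, and it assumes that `f` is safe to call concurrently on different elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the standard library): applies `f` in place
// to every element of `data`, splitting the range into contiguous chunks and
// handing each chunk to its own thread.
template <typename T, typename F>
void parallel_apply(std::vector<T>& data, F f, unsigned num_threads)
{
    assert(num_threads > 0);
    const std::size_t n = data.size();
    // Chunk size rounded up, so at most `num_threads` threads are started.
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        // Each thread works on a disjoint index range, so no locking is needed.
        workers.emplace_back([&data, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                data[i] = f(data[i]);
        });
    }
    // Wait for all chunks to finish before returning.
    for (auto& w : workers)
        w.join();
}
```

A call such as `parallel_apply(v, [](int x) { return 2 * x; }, 4);` would double every element of `v` using up to four threads. With GCC or clang on Linux this typically needs the `-pthread` flag.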

Aleksey Lobanov asked May 03 '15 20:05



1 Answer

Intel appears to have done some work on this.

For the others, I don't think so. cppreference says:

Some C++ standard library implementations use expression templates to implement efficient operations on std::valarray (e.g. GNU libstdc++ and LLVM libc++). Only rarely are valarrays optimized any further, as in e.g. Intel Parallel Studio.

I also did not find any documentation stating that libc++ or libstdc++ did anything fancy in this regard, and usually no one hides cool features. :)
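To illustrate what the expression templates mentioned in the quote are for, here is a small sketch of an element-wise valarray expression. `axpy` is a hypothetical helper name chosen for this example. Whether the intermediate `a * x` is materialized as a temporary depends on the library implementation, and none of this implies multithreading.

```cpp
#include <cassert>
#include <valarray>

// Hypothetical helper: computes a * x + y element-wise.
std::valarray<double> axpy(double a,
                           const std::valarray<double>& x,
                           const std::valarray<double>& y)
{
    assert(x.size() == y.size());
    // With an expression-template implementation, `a * x + y` may be a
    // lightweight proxy object rather than a valarray. Naming the result type
    // explicitly (instead of using `auto`) forces it to be evaluated here.
    std::valarray<double> result = a * x + y;
    return result;
}
```

For example, with `x = {1, 2, 3}` and `y = {10, 20, 30}`, `axpy(2.0, x, y)` yields `{12, 24, 36}`.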

As for MSVC: I once encountered code using std::valarray that compiled but did not link because Microsoft "forgot" to implement some methods. This is of course no proof, but to me it does not suggest that anything cool happened there either. I also could not find any documentation of special features there.

So what can we do instead?

For one, we can use the parallel mode to make libstdc++ parallelize the following algorithms with OpenMP where it deems that useful:

std::accumulate    
std::adjacent_difference    
std::inner_product    
std::partial_sum    
std::adjacent_find    
std::count    
std::count_if    
std::equal    
std::find    
std::find_if    
std::find_first_of    
std::for_each    
std::generate    
std::generate_n    
std::lexicographical_compare    
std::mismatch    
std::search    
std::search_n    
std::transform    
std::replace    
std::replace_if    
std::max_element    
std::merge    
std::min_element    
std::nth_element    
std::partial_sort    
std::partition    
std::random_shuffle    
std::set_union    
std::set_intersection    
std::set_symmetric_difference    
std::set_difference    
std::sort    
std::stable_sort    
std::unique_copy

To do so, simply define _GLIBCXX_PARALLEL during compilation (and enable OpenMP with -fopenmp). I feel like this covers a good chunk of what one would like to do with arrays of numbers. Of course, there is a caveat:

Note that the _GLIBCXX_PARALLEL define may change the sizes and behavior of standard class templates such as std::search, and therefore one can only link code compiled with parallel mode and code compiled without parallel mode if no instantiation of a container is passed between the two translation units. Parallel mode functionality has distinct linkage, and cannot be confused with normal mode symbols.

(from the libstdc++ parallel mode documentation.)
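As a rough sketch of how this could be used with a valarray: the code below is ordinary standard C++, and `elementwise_sqrt` is a hypothetical helper name. Compiled normally, the std::transform call runs sequentially. Compiled with GCC using `-fopenmp -D_GLIBCXX_PARALLEL`, libstdc++ may dispatch the same call to its parallel implementation when it considers the input large enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <valarray>

// Hypothetical helper: stores sqrt(in[i]) in out[i] for every element.
// Both valarrays must have the same size.
void elementwise_sqrt(const std::valarray<double>& in, std::valarray<double>& out)
{
    assert(in.size() == out.size());
    // std::begin / std::end overloads for valarray are declared in <valarray>.
    // std::transform is one of the algorithms covered by parallel mode.
    std::transform(std::begin(in), std::end(in), std::begin(out),
                   [](double x) { return std::sqrt(x); });
}
```

Because the source is unchanged between the two builds, it is easy to measure whether parallel mode actually helps for a given workload before committing to it.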

Another tool that can help you parallelize is Intel Advisor. It is more advanced and, I believe, can also handle your loops (I have never used it myself), but of course it is proprietary software.

For linear algebra operations, you can also look for a good parallel LAPACK implementation.

Baum mit Augen answered Nov 16 '22 23:11
