Can gcc make my code parallel?

I was wondering whether gcc has an optimization that can make single-threaded code like the example below execute in parallel. If not, why not? If so, what kinds of optimizations are possible?

#include <iostream>

int main(int argc, char *argv[])
{
    int array[10];
    for(int i = 0; i < 10; ++ i){
        array[i] = 0;
    }
    for(int i = 0; i < 10; ++ i){
        array[i] += 2;
    }
    return 0;
}

Added:

Thanks for the OpenMP links. As useful as I think OpenMP is, my question is about compiling the same code without having to rewrite anything. So basically I want to know:

Is it possible to make code parallel (at least in some cases) without rewriting it?
If yes, which cases can be handled? If not, why not?

asked Oct 17 '16 13:10

Liberus

2 Answers

The compiler can try to parallelise your code automatically, but by default it won't do so by creating threads. It may instead use vector (SIMD) instructions (SSE or AVX on an x86 CPU, for example) to operate on multiple elements at a time, where it can detect that using those instructions is possible (for example, when you perform the same operation on consecutive elements of a suitably aligned data structure). You can help the compiler by telling it which instruction-set extensions your CPU supports (-mavx or -msse4.2, for example).
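As a rough illustration of the kind of loop the vectoriser tends to handle well, here is a minimal sketch. The function name add_two is made up for this example; it simply repeats the second loop from the question on a buffer of arbitrary length.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the question): adds 2 to every element of a
// contiguous int buffer. Each iteration performs the same operation on the
// next element and does not depend on any other iteration, so the compiler
// is free to process several elements per instruction.
void add_two(int* data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] += 2;
    }
}
```

Compiling this with something like g++ -O3 -mavx2 -fopt-info-vec -c add_two.cpp should make gcc report whether it vectorised the loop; the exact wording of the report varies between versions.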

You can also use these instructions directly through intrinsics, but that requires a non-trivial amount of work from the programmer. There are also libraries that already do this (see the vector class library on Agner Fog's blog).

You can get the compiler to parallelise across multiple threads by using OpenMP (see an OpenMP introduction), which is more a matter of instructing the compiler to parallelise than of the compiler parallelising by itself.
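For reference, a minimal sketch of what that OpenMP hint might look like on the loop from the question. The function name add_two_omp is hypothetical, and the example assumes the code is built with -fopenmp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the "+= 2" loop from the question, with an OpenMP
// pragma asking the compiler to split the iterations across threads.
void add_two_omp(int* data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    // A signed loop index keeps the loop acceptable to older OpenMP versions.
    const long count = static_cast<long>(n);
    #pragma omp parallel for
    for (long i = 0; i < count; ++i) {
        data[i] += 2;
    }
}
```

Without -fopenmp the pragma is ignored and the loop simply runs on one thread, so the result is the same either way. For an array of 10 elements the thread start-up cost will almost certainly outweigh any gain; the pragma only pays off on much larger loops.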

answered Oct 03 '22 22:10

RobClucas

Yes, gcc with -ftree-parallelize-loops=4 will attempt to auto-parallelize with 4 threads, for example.
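A minimal sketch of code this option could apply to, with no pragmas and no source changes. The function name scale_in_place is made up for illustration, and whether gcc actually parallelizes this particular loop depends on its cost model and version, so treat it as something to experiment with.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: ordinary single-threaded code. Every iteration
// touches a different element and reads nothing written by another
// iteration, which is the independence an auto-parallelizer has to prove.
void scale_in_place(double* data, std::size_t n, double factor)
{
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

It could be compiled with, for example, g++ -O2 -ftree-parallelize-loops=4 -c scale.cpp. For small trip counts gcc may well decide that threads are not worth it and leave the loop serial.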

I don't know how well gcc does at auto-parallelization, but it is something that compiler developers have been working on for years. As other answers point out, giving the compiler some guidance with OpenMP pragmas can give better results, for example by letting the compiler know that the order of operations doesn't matter even when reordering may slightly change the result. That is common for floating point, because floating-point math is not associative.
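To make the associativity point concrete, here is a small sketch. The function names are invented for this example, and it assumes ordinary IEEE 754 double arithmetic without flags such as -ffast-math.

```cpp
#include <cassert>

// Adds three doubles left to right: (a + b) + c.
double sum_left(double a, double b, double c) { return (a + b) + c; }

// Adds the same three doubles with the other grouping: a + (b + c).
double sum_right(double a, double b, double c) { return a + (b + c); }

// Illustrative check: with values of very different magnitude, the two
// groupings round differently, so the results are not equal.
void check_order_matters()
{
    const double big = 1e16;
    // (big + -big) + 1.0 keeps the 1.0; big + (-big + 1.0) loses it,
    // because -big + 1.0 rounds back to -big.
    assert(sum_left(big, -big, 1.0) != sum_right(big, -big, 1.0));
}
```

A compiler that splits a sum across threads or vector lanes is effectively changing the grouping like this, which is why it needs permission (an OpenMP reduction clause, or a relaxed floating-point option) before doing it.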

Also, only parallelizing loops marked with #pragma omp means that only the really important loops get this treatment. -ftree-parallelize-loops probably benefits from PGO (profile-guided optimization) to know which loops are actually hot and worth parallelizing and/or vectorizing.

Auto-parallelization is somewhat related to finding the kind of parallelism that SIMD can take advantage of when auto-vectorizing loops, which is enabled by default at -O3 in gcc and at -O2 in clang (gcc 12 and later also enable it at -O2, with a more conservative cost model).

answered Oct 04 '22 00:10

Peter Cordes
