Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can gcc make my code parallel?

I was wondering if there is an optimization in gcc that can make some single-threaded code like the example below execute in parallel. If no, why? If yes, what kind of optimizations are possible?

#include <iostream>

int main(int argc, char *argv[])
{
    int array[10];
    for(int i = 0; i < 10; ++ i){
        array[i] = 0;
    }
    for(int i = 0; i < 10; ++ i){
        array[i] += 2;
    }
    return 0;
}

Added:

Thanks for OpenMP links, and as much as I think it's useful, my question is related to compiling same code without the need to rewrite smth. So basically I want to know if:

  1. Making code parallel(at least in some cases) without rewriting it is possible?
  2. If yes, what cases can be handled? If not, why?
like image 442
Liberus Avatar asked Oct 17 '16 13:10

Liberus


People also ask

How do you code parallel?

The general way to parallelize any operation is to take a particular function that should be run multiple times and make it run parallelly in different processors. To do this, you initialize a Pool with n number of processors and pass the function you want to parallelize to one of Pool s parallization methods.

Does C++ support parallel programming?

Parallel Programming in Visual C++ Visual C++ provides the following technologies to help you create multi-threaded and parallel programs that take advantage of multiple cores and use the GPU for general purpose programming.

Can you compile C++ code with GCC?

GNU Compiler Collection (GCC): a compiler suite that supports many languages, such as C/C++ and Objective-C/C++. GNU Make: an automation tool for compiling and building applications. GNU Binutils: a suite of binary utility tools, including linker and assembler. GNU Debugger (GDB).


2 Answers

The compiler can try to automatically parallelise your code, but it wont do it by creating threads. It may use vectorised instructions (intel intrinsics for an intel CPU, for example) to operate on multiple elements at a time, where it can detect that using those instructions is possible (for example when you perform the same operation multiple times on consecutive elements of a correctly aligned data structure). You can help the compiler by telling it which intrinsic instruction set your CPU supports (-mavx, -msse4.2 ... for example).

You can also use these instructions directly, but it requires a non-trivial amount of work for the programmer. There are also libraries which do this already (see the vector class here Agner Fog's blog).

You can get the compiler to auto-parallelise using multiple threads by using OpenMP (OpenMP introducion), which is more instructing the compiler to auto-parallelise, than the compiler auto-parallelising by itself.

like image 190
RobClucas Avatar answered Oct 03 '22 22:10

RobClucas


Yes, gcc with -ftree-parallelize-loops=4 will attempt to auto-parallelize with 4 threads, for example.

I don't know how well gcc does at auto-parallelization, but it is something that compiler developers have been working on for years. As other answers point out, giving the compiler some guidance with OpenMP pragmas can give better results. (e.g. by letting the compiler know that it doesn't matter what order something happens in, even when that may slightly change the result, which is common for floating point. Floating point math is not associative.)

And also, only doing auto-parallelization for #pragma omp loops means only the really important loops get this treatment. -ftree-parallelize-loops probably benefits from PGO (profile-guided optimization) to know which loops are actually hot and worth parallelizing and/or vectorizing.

It's somewhat related to finding the kind of parallelism that SIMD can take advantage of, for auto-vectorizing loops. (Which is enabled by default at -O3 in gcc, and at -O2 in clang).

like image 32
Peter Cordes Avatar answered Oct 04 '22 00:10

Peter Cordes