Having had a look at previous questions 1, 2 , I was wondering if I can force the compiler to perform a constant folding for the following code which prints prime numbers. <pre class="prettyprint"><code>#include <iostream> using namespace std; inline bool is_prime(int n) { if(n<2) return false; for(int i=2;i*i<=n;i++) if(n%i==0) return false; return true; } int main() { for(int i=0;i<20;i++) if(is_prime(i)) cout<<i<<endl; return 0; } </code></pre> And I build it via: <pre class="prettyprint"><code>g++ -O3 -S main.cpp -o main.asm </code></pre> As the result are a few: <pre class="prettyprint"><code>2,3,5,7,11,13,17,19 </code></pre> I would like to force compiler look at the code similar to <pre class="prettyprint"><code>for(int x:{2,3,5,7,11,13,17,19}) cout<<x<<endl; </code></pre> or <pre class="prettyprint"><code>cout<< 2 <<endl; cout<< 3 <<endl; cout<< 5 <<endl; cout<< 7 <<endl; cout<< 11 <<endl; cout<< 13 <<endl; cout<< 17 <<endl; cout<< 19 <<endl; </code></pre> But reading the assembly shows that none happens. I even used <code>__builtin_expect</code> but it didn't work. Is there any way to force the compiler optimizer to read the for loop and use the advantage that the output data is known? I would like to do it without using template meta-programming. PS. My real aim is just testing the compiler rather than an efficient method to calculate prime numbers. I just want to show off to my friends how much C++ compiler is powerful. <hr> If separation of <code>is_prime</code> is matter of concern, I put everything inside the main and no difference was observed: <pre class="prettyprint"><code>#include <iostream> using namespace std; int main() { for(int n=2;n<20;n++) { bool is_prime=true; for(int i=2;i*i<=n;i++) if(n%i==0) { is_prime=false; break; } if(is_prime) cout<<n<<endl; } return 0; } </code></pre> <hr> There is even a further example that remains no excuse for the compiler: <pre class="prettyprint"><code>#include <iostream> #include <vector> using namespace std; int prime_after6000() { int n=6000; do { bool is_prime=true; for(int i=2;i*i<=n;i++) if(n%i==0) { is_prime=false; break; } if(is_prime) return n; n++; }while(true); } int main() { cout<<prime_after6000()<<endl; return 0; } </code></pre> assembly: <pre class="prettyprint"><code>... main: .LFB1907: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 movl $6000, %esi ;;;;;;;;;;;;;;;;;;;; bad .L18: testb $1, %sil je .L15 movl $2, %ecx jmp .L16 .p2align 4,,10 .p2align 3 .L17: movl %esi, %eax cltd idivl %ecx testl %edx, %edx je .L15 .L16: addl $1, %ecx movl %ecx, %eax imull %ecx, %eax cmpl %esi, %eax jle .L17 movl $_ZSt4cout, %edi call _ZNSolsEi movq %rax, %rdi call _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ xorl %eax, %eax addq $8, %rsp .cfi_remember_state .cfi_def_cfa_offset 8 ret .p2align 4,,10 .p2align 3 .L15: .cfi_restore_state addl $1, %esi jmp .L18 .cfi_endproc .LFE1907: .size main, .-main .p2align 4,,15 .type _GLOBAL__sub_I__Z15prime_after6000v, @function _GLOBAL__sub_I__Z15prime_after6000v: ... </code></pre>

There's a fundamental misunderstanding of compilers here. Let's examine the program you've written very carefully and think about what you're expecting from the compiler to do for you. The main characteristic of the program is that it doesn't take any input, but it emits output by writing to <code>cout</code>. Keep in mind that the <code>is_prime</code> function is not a compiler intrinsic; the compiler treats it as just another function. This is important and I'll come back to it later. Now how would the compiler transform the program the way you described? How can it do something like that? That is, how can the compiler transform those two nested loops to a simple sequence of statements that write integers to <code>cout</code>? The only way it can possibly do that is by executing the program to figure out all the values that needs to be written to <code>cout</code>. That doesn't make any sense, does it? Let's look at the big picture here and consider all programs (or sequences of statements) that have the same characteristic; those that do not take any input but emit output. The question would become: Why doesn't the compiler execute the source code and just emit code that writes the output values? Because of the following reasons: <ul> <li>What if the program takes too much time to execute? What if there was a bug in it that makes it run for an unexpected amount of time? There is even no guarantee that the program will ever halt. What's the compiler supposed to do exactly?</li> <li>Ultimately, the purpose of the compiler is not to execute the source code, but to emit a functionally equivalent native code that is potentially optimized. After all, if the program does not take any input (like your code), you could just, as easily, compile the code and run it once to see the output. The code has to be executed anyway, by the compiler or by running the executable binary, and it will take the same amount of time either way. In both cases, the code has to be compiled and executed. So such optimization adds no real value. However, this is in contrast to templates, which are intended to be reduced to regular code by the compiler at compile-time. In addition, interpretation would be a better execution model for such programs. You don't want to bother with compiling code? Go ahead and use a Python interpreter or whatever interpreter.</li> <li>Implementing such optimization can be difficult or impossible in general. What if the mechanism used to emit output had side-effects that can change future output values? The compiler does not exactly know what happens when you write to <code>cout</code>. The standard output stream can be redirected to something weird. So programs that don't take any input aren't necessarily easier for the compiler to optimize.</li> </ul> That said, simple pieces of code that can be evaluated in a very limited amount of time are indeed evaluated by the compiler. This optimization is called constant folding. Pieces of code that don't have any impact on the state of the program can be eliminated without executing them. For example, if you removed <code>cout<<i<<endl;</code>, the compiler will just optimize away the rest of the code. This is called dead code elimination. Compilers do these optimizations because they can be done by the compiler in a systematic way and because they are very effective on real codebases. But what happens if the <code>is_prime</code> function was a compiler intrinsic? In this case, the compiler would probably have a built-in table of commonly used prime numbers and a very fast implementation of primality testing. You can then expect from the compiler to unroll the <code>loop</code> in the main function several times, maybe even fully, containing only output statements, essentially performing the transformation you're looking for.

C++ constant folding a loop for prime numbers

Q: What is the algorithm for prime numbers?

Algorithm to Find Prime NumberSTEP 1: Define a recursive function that accepts an integer num. STEP 2: Initialize a variable ”i” to 2. STEP 3: If num is equal to 0 or 1, then RETURN false. STEP 4: If num is equal to “i”, then RETURN true.

Q: How do you check a number is prime or not in CPP?

If the number entered by the user is perfectly divisible by i , then is_prime is set to false and the number will not be a prime number. But if the input number is not perfectly divisible by i throughout the entirety of the loop, then it means that the input number is only divisible by 1 and that number itself.

Tags:

c++

compiler-optimization

g++

constantfolding

Having had a look at previous questions 1, 2 , I was wondering if I can force the compiler to perform a constant folding for the following code which prints prime numbers.

#include <iostream>

using namespace std;

inline bool is_prime(int n)
{
    if(n<2)
        return false;
    for(int i=2;i*i<=n;i++)
        if(n%i==0)
            return false;
    return true;
}

int main()
{
    for(int i=0;i<20;i++)
        if(is_prime(i))
            cout<<i<<endl;
    return 0;
}

And I build it via:

g++ -O3 -S main.cpp -o main.asm

As the result are a few:

2,3,5,7,11,13,17,19

I would like to force compiler look at the code similar to

for(int x:{2,3,5,7,11,13,17,19})
    cout<<x<<endl;

cout<<  2 <<endl;
cout<<  3 <<endl;
cout<<  5 <<endl;
cout<<  7 <<endl;
cout<< 11 <<endl;
cout<< 13 <<endl;
cout<< 17 <<endl;
cout<< 19 <<endl;

But reading the assembly shows that none happens.

I even used __builtin_expect but it didn't work.

Is there any way to force the compiler optimizer to read the for loop and use the advantage that the output data is known?

I would like to do it without using template meta-programming.

PS. My real aim is just testing the compiler rather than an efficient method to calculate prime numbers. I just want to show off to my friends how much C++ compiler is powerful.

If separation of is_prime is matter of concern, I put everything inside the main and no difference was observed:

#include <iostream>

using namespace std;

int main()
{
    for(int n=2;n<20;n++)
    {
        bool is_prime=true;
        for(int i=2;i*i<=n;i++)
            if(n%i==0)
            {
                is_prime=false;
                break;
            }
        if(is_prime)
            cout<<n<<endl;
    }
    return 0;
}

There is even a further example that remains no excuse for the compiler:

#include <iostream>
#include <vector>

using namespace std;

int prime_after6000()
{
    int n=6000;
    do
    {
        bool is_prime=true;
        for(int i=2;i*i<=n;i++)
            if(n%i==0)
            {
                is_prime=false;
                break;
            }
        if(is_prime)
            return n;
        n++;
    }while(true);
}

int main()
{
    cout<<prime_after6000()<<endl;
    return 0;
}

assembly:

...
main:
.LFB1907:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $6000, %esi   ;;;;;;;;;;;;;;;;;;;; bad
.L18:
    testb   $1, %sil
    je  .L15
    movl    $2, %ecx
    jmp .L16
    .p2align 4,,10
    .p2align 3
.L17:
    movl    %esi, %eax
    cltd
    idivl   %ecx
    testl   %edx, %edx
    je  .L15
.L16:
    addl    $1, %ecx
    movl    %ecx, %eax
    imull   %ecx, %eax
    cmpl    %esi, %eax
    jle .L17
    movl    $_ZSt4cout, %edi
    call    _ZNSolsEi
    movq    %rax, %rdi
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
    xorl    %eax, %eax
    addq    $8, %rsp
    .cfi_remember_state
    .cfi_def_cfa_offset 8
    ret
    .p2align 4,,10
    .p2align 3
.L15:
    .cfi_restore_state
    addl    $1, %esi
    jmp .L18
    .cfi_endproc
.LFE1907:
    .size   main, .-main
    .p2align 4,,15
    .type   _GLOBAL__sub_I__Z15prime_after6000v, @function
_GLOBAL__sub_I__Z15prime_after6000v:
...

474

asked Feb 17 '18 08:02

ar2015

1 Answers

There's a fundamental misunderstanding of compilers here. Let's examine the program you've written very carefully and think about what you're expecting from the compiler to do for you.

The main characteristic of the program is that it doesn't take any input, but it emits output by writing to cout. Keep in mind that the is_prime function is not a compiler intrinsic; the compiler treats it as just another function. This is important and I'll come back to it later.

Now how would the compiler transform the program the way you described? How can it do something like that? That is, how can the compiler transform those two nested loops to a simple sequence of statements that write integers to cout? The only way it can possibly do that is by executing the program to figure out all the values that needs to be written to cout.

That doesn't make any sense, does it? Let's look at the big picture here and consider all programs (or sequences of statements) that have the same characteristic; those that do not take any input but emit output. The question would become: Why doesn't the compiler execute the source code and just emit code that writes the output values? Because of the following reasons:

What if the program takes too much time to execute? What if there was a bug in it that makes it run for an unexpected amount of time? There is even no guarantee that the program will ever halt. What's the compiler supposed to do exactly?
Ultimately, the purpose of the compiler is not to execute the source code, but to emit a functionally equivalent native code that is potentially optimized. After all, if the program does not take any input (like your code), you could just, as easily, compile the code and run it once to see the output. The code has to be executed anyway, by the compiler or by running the executable binary, and it will take the same amount of time either way. In both cases, the code has to be compiled and executed. So such optimization adds no real value. However, this is in contrast to templates, which are intended to be reduced to regular code by the compiler at compile-time. In addition, interpretation would be a better execution model for such programs. You don't want to bother with compiling code? Go ahead and use a Python interpreter or whatever interpreter.
Implementing such optimization can be difficult or impossible in general. What if the mechanism used to emit output had side-effects that can change future output values? The compiler does not exactly know what happens when you write to cout. The standard output stream can be redirected to something weird. So programs that don't take any input aren't necessarily easier for the compiler to optimize.

That said, simple pieces of code that can be evaluated in a very limited amount of time are indeed evaluated by the compiler. This optimization is called constant folding. Pieces of code that don't have any impact on the state of the program can be eliminated without executing them. For example, if you removed cout<<i<<endl;, the compiler will just optimize away the rest of the code. This is called dead code elimination. Compilers do these optimizations because they can be done by the compiler in a systematic way and because they are very effective on real codebases.

But what happens if the is_prime function was a compiler intrinsic? In this case, the compiler would probably have a built-in table of commonly used prime numbers and a very fast implementation of primality testing. You can then expect from the compiler to unroll the loop in the main function several times, maybe even fully, containing only output statements, essentially performing the transformation you're looking for.

answered Oct 20 '22 23:10

Hadi Brais

Related questions
                            
                                NuGet package with a dependency on Visual C++ 2013 Runtime
                            
                                Are the copy constructor and copy assignment of std::runtime_error noexcept?
                            
                                std::list<std::unique_ptr>: empty initializer list vs. default constructor
                            
                                Declaring global const objects in a header file
                            
                                Is there a way to determine if a date/time does not exist?
                            
                                Qt Installer Framework: How to sign the maintenancetool.exe
                            
                                Can C++ attributes be used to replace OpenMP pragmas?
                            
                                How are external symbols resolved?
                            
                                Block-level copying of data between streambuffers
                            
                                Perplexing non-trailing parameter pack behaviour
                            
                                Debugging c++ built with a makefile in Visual Studio Code
                            
                                Debugging in Visual Studio without Pdb files (C++ Access Voilation)
                            
                                How to allow clang-format to format empty classes/structs on one line?
                            
                                Should a using command issue a warning when using a reserved identifier?
                            
                                How to determine whether a C/C++ compiler is installed on Windows? [closed]
                            
                                How to alias a nested template class with variadic parameter packs
                            
                                In new code, why would you use `int` instead of `int_fast16_t` or `int_fast32_t` for a counting variable?
                            
                                c++ new and delete overloading
                            
                                Where should the definition of an explicit specialization of a class template be placed in C++?
                            
                                propagate carry bit between two ASM GCC inline block

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With