
Unused function changes performance

While trying to estimate the performance difference between push_back and std::inserter, I ran into a very strange performance issue.

Consider the following code:

#include <iterator>  // std::insert_iterator
#include <vector>
using container = std::vector<int>;
const int size  = 1000000;
const int count = 1000;

#ifdef MYOWNFLAG
void foo(std::insert_iterator<container> ist)
{
    for(int i=0; i<size; ++i)
        *ist++ = i;
}
#endif

void bar(container& cnt)
{
    for(int i=0; i<size; ++i)
        cnt.push_back(i);
}
int main()
{
    container cnt;
    for (int i=0; i<count; ++i)
    {
        cnt.clear();
        bar(cnt);
    }
    return 0;
}

In this case, no matter whether MYOWNFLAG is defined or not, the function foo is never called. However, the value of this flag has an impact on performance:

$ g++ -g -pipe -march=native -pedantic -std=c++11 -W -Wall -Wextra -Werror -O3 -o bin/inserter src/inserter.cc && time ./bin/inserter
./bin/inserter  4,73s user 0,00s system 100% cpu 4,728 total

$ g++ -g -pipe -march=native -pedantic -std=c++11 -W -Wall -Wextra -Werror -O3 -o bin/inserter src/inserter.cc -DMYOWNFLAG && time ./bin/inserter
./bin/inserter  2,09s user 0,00s system 99% cpu 2,094 total

Note that if I change the prototype of foo to take a std::back_insert_iterator, I get the same performance as when the flag is not set.

What is going on with the compiler's optimisations?

EDIT

I use gcc 4.9.2 20150304 (prerelease)

Reproduced

  • reproduced by stefan on ideone
  • reproduced by me on another machine with gcc 4.9.2
  • not reproduced by me on another machine with gcc 4.6.3 and flag -std=c++0x
Amxx asked Mar 19 '15 20:03


1 Answer

First I will show you a trick that achieves the same speedup without the dummy function. Then I will explain why the dummy function works. The trick:

The original, slow build (note that my machine is about twice as fast as yours):

g++ -g -pipe -march=native -pedantic -std=c++11 -W -Wall -Wextra -Werror -O3 -o bin/inserter src/inserter.cc --param inline-unit-growth=200 && time ./bin/inserter
real    0m2.197s
user    0m2.200s
sys     0m0.000s

Now the trick (your define is still inactive):

g++ -g -pipe -march=native -pedantic -std=c++11 -W -Wall -Wextra -Werror -O3 -o bin/inserter src/inserter.cc --param inline-min-speedup=2 && time ./bin/inserter
real    0m1.114s
user    0m1.100s
sys     0m0.010s

Note: the only difference is the strange-looking argument --param inline-min-speedup=2.

Now I will briefly outline the investigation:

  1. What is the difference between the fast and the slow version? In the slow version there is an inefficient call to emplace_back_aux inside bar(), which is inlined, as if by magic, when your foo is compiled in. So we may conclude that bar is very hot and inlining is crucial here. Most probably the whole issue is about inlining.

  2. Now let's look at the inlining dumps with the option -fdump-ipa-inline-details. You will see different time/size considerations. They are hard to read and I don't want to paste all the details here. The general result of studying this information: GCC thinks that the growth in module size (in percent) is not worth the estimated speedup.

  3. What to do? Two possibilities:

    3.1. Either increase the module size and the overall speedup estimate with unused code such as foo, which uses the right types (like insert_iterator) to call emplace_back, so that the ratio grows and hits the inlining limit. Note that this way is very unstable: everything may fall apart in other compiler versions with improved inlining algorithms, and you also need to be really lucky to guess code that works.

    3.2. Or move the inlining limit. What I told GCC with the parameter above is "please consider even big functions with a smaller speedup for inlining".

That's it. GCC has a lot of other parameters, and there are other tricks you can do with them.

Konstantin Vladimirov answered Sep 20 '22 17:09
