Why isn't G++ compiler treating these two functions the same way?

Tags:

performance

I have an array A with zeros and ones. I want to find the sum of all numbers in A. I want to test two functions:

First function

void test1(int curIndex){     if(curIndex == size) return;     test1(curIndex+1);     s+=A[curIndex]; }

Second function

void test2(int curIndex){     if(curIndex == size) return;     s+=A[curIndex];     test2(curIndex+1); }

I used the PAPI library to count the number of instructions, here is the entire experiment:

#include <iostream> #include <fstream> #include "Statistics.h"  using namespace std;   int size; int *A; int s;  void test3(int curIndex){     if(curIndex == size) return;     test3(curIndex+1);     s+=A[curIndex]; }  int main(int argc, char* argv[]){      size = atoi(argv[1]);     if(argc!=2){         cout<<"type ./executable size{odd integer}"<<endl;         return 1;     }     if(size%2!=1){         cout<<"size must be an odd number"<<endl;         return 1;     }     A = new int[size];     int i;     for(i=0;i<size;i++){         if(i%2==0){             A[i] = false;         }         else{             A[i] = true;         }     }      Statistics stat(1);     stat.start();     test3(0);     stat.stop();     stat.printWithHelp();     cout<<s<<endl;      return 0; }

Here is the Statistics.h file:

#ifndef TRIPLETPARSER_STATISTICS_H #define TRIPLETPARSER_STATISTICS_H  #include <time.h> #include <unistd.h> #include <fstream> #include <papi.h> #include <iostream> #include <iostream>  #define BILLION  1000000000LL  using namespace std;  class Statistics {  private:     timespec s, e;     /*     PAPI_BR_CN  Conditional branch instructions     PAPI_BR_INS     Branch instructions     PAPI_BR_MSP     Conditional branch instructions mispredicted     PAPI_BR_NTK     Conditional branch instructions not taken     PAPI_BR_PRC     Conditional branch instructions correctly predicted     PAPI_BR_TKN     Conditional branch instructions taken     PAPI_BR_UCN     Unconditional branch instructions     PAPI_BRU_IDL    Cycles branch units are idle     PAPI_BTAC_M     Branch target address cache misses      PAPI_TLB_DM     Data translation lookaside buffer misses     */     int events[10];  // , PAPI_L2_TCA,PAPI_L3_TCM,PAPI_L3_TCA PAPI_BR_CN, PAPI_BR_PRC}; //type of events we are interested in     int num_hwcntrs;  //total amount of events stored in 'events' array     long long values[10];     long long counters[10];      void handle_error(int err){         std::cerr << "PAPI error: " << err << std::endl;     }  public:     Statistics(int papi){         for(size_t i = 0; i< 10; i++)             counters[i]=0.0;          switch(papi){             case 0:                 num_hwcntrs = 0;                 break;             case 1:                 num_hwcntrs = 6;                 events[0] = PAPI_L2_TCA;                 events[1] = PAPI_L3_TCA;                 events[2] = PAPI_L3_TCM;                 events[3] = PAPI_TOT_INS;                 events[4] = PAPI_BR_INS;                 events[5] = PAPI_BR_MSP;                 break;         }      }      void start(){          for(size_t i = 0; i< 10; i++)             counters[i]=0.0;          if (num_hwcntrs != 0 && PAPI_start_counters(events, num_hwcntrs) != PAPI_OK)             handle_error(1);     }       void start(float ratio){          if (num_hwcntrs != 0 && PAPI_start_counters(events, num_hwcntrs) != PAPI_OK)             handle_error(1);     }       void stop(){         if (num_hwcntrs != 0 && PAPI_stop_counters(values, num_hwcntrs) != PAPI_OK)             handle_error(1);         update();     }      void stop(float ratio){         if (num_hwcntrs != 0 && PAPI_stop_counters(values, num_hwcntrs) != PAPI_OK)             handle_error(1);         update();     }       void update(){         for(size_t i = 0; i < num_hwcntrs; i++)             counters[i] += values[i];     }      void print(){         for(int i=0;i<num_hwcntrs;i++)             std::cout << counters[i] << "\t";         //cout<<"L2 cache miss ratio: "<<counters[1]/(double)counters[0]<<endl;         //cout<<"L3 cache miss ratio: "<<counters[3]/(double)counters[2]<<endl;     }      void printWithHelp(){         cout<<"L2 accesses: "<<counters[0]<<endl;         cout<<"L2 miss/access ratio: "<<(double)counters[1]/counters[0]<<endl;         cout<<"L3 accesses: "<<counters[1]<<endl;         cout<<"L3 misses: "<<counters[2]<<endl;         cout<<"L3 miss/access ratio: "<<(double)counters[2]/counters[1]<<endl;         cout<<"Instructions: "<<counters[3]<<endl;         cout<<"Branches: "<<counters[4]<<endl;         cout<<"Branch mispredictions: "<<counters[5]<<endl;         cout<<"Branch miss/predict ratio: "<<(double)counters[5]/counters[4]<<endl;     }      void print(float avg_ratio){         for (int i = 0; i<num_hwcntrs; i++)             std::cout << (double)(avg_ratio*counters[i]) << "\t";     }  };  #endif //TRIPLETPARSER_STATISTICS_H

This is the output that I get from the first function when the size of A is 111,111

L2 accesses: 24126 L2 miss/access ratio: 0.131559 L3 accesses: 3174 L3 misses: 587 L3 miss/access ratio: 0.18494 Instructions: 1022776 Branches: 178113 Branch mispredictions: 6976 Branch miss/predict ratio: 0.0391661

This is the output that I get from the second function when the size of A is 111,111

L2 accesses: 7090 L2 miss/access ratio: 0.163752 L3 accesses: 1161 L3 misses: 507 L3 miss/access ratio: 0.436693 Instructions: 555860 Branches: 111189 Branch mispredictions: 25 Branch miss/predict ratio: 0.000224842

Why the difference in the results? The instructions reduce by half, the branch mispredictions are almost eliminated. What is happening here?

592

asked Sep 19 '16 12:09

1 Answers

Your second function is tail-recursive. That means the compiler can optimize it to something like:

void test2(int curIndex){   while(true)   {     if(curIndex == size) return;     s+=A[curIndex];     curIndex = curIndex + 1;   } }

That reduces the number of instructions significantly. It also reduces the number of stack frames needed to (at most) one. As a result, it uses a lot less memory, which results in the reduction of cache misses.

The compiler is not able to make this optimization for the first function.

UPDATE: Some people have asked why the compiler cannot make that optimization on the first function.

Let's start by observing the function is not tail-recursive. A function is tail recursive if the very last thing that happens is a recursive call to the same function, followed by returning the result of that recursive call (if any).

Clearly, that is not the case for the first function, s+=A[curIndex]; is executed after the recursive call.

So then people have asked why the compiler cannot turn the first function into the second one.

The question is "why does G++ not have this feature?" The answer to that question is always the same. Features are unimplemented by default; G++ does not have that feature because no one designed, implemented and shipped the feature to customers.

That should be the end of it, but of course people will want to know why nobody designed, implemented and tested this feature. Well, maybe nobody thought of doing it. But more importantly, the feature would be far from trivial.

First of all, the compiler would have to understand that

test1(curIndex+1); s+=A[curIndex];

and

s+=A[curIndex]; test1(curIndex+1);

are equivalent. That is a non trivial observation, given that from a mechanical point of view they are not equivalent! Indeed, the first one effectively loops from the end of the array to the start, whereas the second one loops from the start to the end. Is that the same? It yields the same result when A is an int* (and s in an int), but it doesn't in other cases (e.g. when A is a double* and s is a double). Do we expect a compiler to be that smart?

So here we have a potential feature with a high cost to implement. But the cost may be worth it, if the benefit is high. Is the benefit high? I would guess that this happens very little in real code, i.e. developers are likely to write the second form anyway. So there you have it: an expensive feature with little benefit. IMHO, compiler developers are wise to spend their valuable time on more useful features.

178

answered Oct 06 '22 02:10

Kris Vandermotten

Related questions
                            
                                Performance Difference Between C and C++ Style File IO
                            
                                Modern C++ idiom for allocating / deallocating an I/O buffer
                            
                                How to declare constexpr C string?
                            
                                What happens if I write less than 12 bytes to a 12 byte buffer?
                            
                                Mixing debug and release library/binary - bad practice?
                            
                                How to include the stdafx.h from the root directory?
                            
                                most difficult-to-find errors in c++ [closed]
                            
                                Visual Studio Code: Take Input From User
                            
                                Passing const shared_ptr<T>& versus just shared_ptr<T> as parameter
                            
                                Equivalent of IllegalArgumentException of Java in C++
                            
                                How fast are lambda functions in GCC
                            
                                Why AccessViolationException cannot be caught by .NET4.0
                            
                                when to use JNIEXPORT and JNICALL in Android NDK?
                            
                                Why doesn't C++ support strongly typed ellipsis?
                            
                                When the dereference operator (*) is overloaded, is the usage of *this affected?
                            
                                Why can't templates be declared in a function?
                            
                                Fast fixed point pow, log, exp and sqrt
                            
                                Can I have gdb break on read/write from an address? [duplicate]
                            
                                Running mstest from command line against Visual Studio 2012 native C++ tests
                            
                                Where are pointers in C++ stored, on the stack or in the heap?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why isn't G++ compiler treating these two functions the same way?

Tags:

c++

performance

jsguy

People also ask

1 Answers

Kris Vandermotten

Recent Activity

Donate For Us