 

This pointer and performance penalty

Tags:

c++

void do_something() {....}

struct dummy
{
   // even if I don't write this-> myself, the compiler supplies it for me
   void call_do_something() { this->do_something(); }
   void do_something() {....}
};

As far as I know, every class or struct in C++ implicitly uses the this pointer whenever you access a data member or a member function of the class. Does this bring a performance penalty in C++?

What I mean is

int main()
{
  do_something(); //don't need this pointer
  dummy().call_do_something(); //assume inlining is perfect

  return 0;
}

call_do_something needs a this pointer to call the member function, but the C-like do_something doesn't. Does the this pointer bring some performance penalty compared to the C-like function?

I don't intend to do any micro-optimization, since it costs me a lot of time and rarely brings good results; I always follow the rule of "measure, don't think". I only want to know, out of curiosity, whether the this pointer brings a performance penalty.

StereoMatching asked Nov 30 '12 05:11


2 Answers

It depends on the situation, but usually, if you've got optimizations turned on, it shouldn't be any more expensive than the C version. The only time you really "pay" for this and other features is when you're using inheritance and virtual functions. Other than that, the compiler is smart enough not to waste time on this in a function that doesn't use it. Consider the following:

#include <iostream>

void globalDoStuff()
{
    std::cout << "Hello world!\n";
}

struct Dummy
{
    void doStuff() { callGlobalDoStuff(); }
    void callGlobalDoStuff() { globalDoStuff(); }
};

int main()
{
    globalDoStuff();

    Dummy d;
    d.doStuff();
}

Compiled with GCC at optimization level -O3, I get the following disassembly (cutting the extra junk and just showing main()):

_main:
0000000100000dd0    pushq   %rbp
0000000100000dd1    movq    %rsp,%rbp
0000000100000dd4    pushq   %r14
0000000100000dd6    pushq   %rbx
0000000100000dd7    movq    0x0000025a(%rip),%rbx
0000000100000dde    leaq    0x000000d1(%rip),%r14
0000000100000de5    movq    %rbx,%rdi
0000000100000de8    movq    %r14,%rsi
0000000100000deb    callq   0x100000e62 # bypasses globalDoStuff() and just prints "Hello world!\n"
0000000100000df0    movq    %rbx,%rdi
0000000100000df3    movq    %r14,%rsi
0000000100000df6    callq   0x100000e62 # bypasses globalDoStuff() and just prints "Hello world!\n"
0000000100000dfb    xorl    %eax,%eax
0000000100000dfd    popq    %rbx
0000000100000dfe    popq    %r14
0000000100000e00    popq    %rbp
0000000100000e01    ret

Notice that it completely optimized away both the Dummy and globalDoStuff(), and just replaced them with the body of globalDoStuff(). globalDoStuff() is never even called, and no Dummy is ever constructed. Instead, the compiler/optimizer replaces that code with two direct calls to the stream output routine (ordinary library function calls, not system calls) that print "Hello world!\n". The lesson is that the compiler and optimizer are pretty dang smart, and in general you won't pay for what you don't need.

On the other hand, imagine you have a member function that manipulates a member variable of Dummy. You might think this has a penalty compared to a C function, right? Probably not, because the C function needs a pointer to an object to modify, which, when you think about it, is exactly what the this pointer is to begin with.

So in general you won't pay extra for this in C++ compared to C. A virtual function call may carry a (small) penalty, because the program has to look up the proper function to call at run time, but that's not the case we're considering here.

If you don't turn on optimizations in your compiler, then yeah, sure, there might be a penalty involved, but... why would you compare non-optimized code?

Cornstalks answered Oct 17 '22 19:10


#include <iostream>
#include <stdint.h>
#include <limits.h>
struct Dummy {
  uint32_t counter;
  Dummy(): counter(0) {}
  void do_something() {
    counter++;
  }
};

uint32_t counter = 0;

void do_something() { counter++; }

int main(int argc, char **argv) {
  Dummy dummy;
  if (argc == 1) {
    for (int i = 0; i < INT_MAX - 1; i++) {
      for (int j = 0; j < 1; j++) {
        do_something();
      }   
    }   
  } else {
    for (int i = 0; i < INT_MAX - 1; i++) {
      for (int j = 0; j < 1; j++) {
        dummy.do_something();
      }   
    }   
    counter = dummy.counter;
  }
  std::cout << counter << std::endl;
  return 0;
}

Average of 10 runs on gcc version 4.3.5 (Debian 4.3.5-4), 64-bit, compiled without any flags (so at GCC's default optimization level, -O0):

with global counter: 0m15.062s

with dummy object: 0m21.259s

If I modify the code as Lyth suggested, so that both versions increment the same global counter:

#include <iostream>
#include <stdint.h>
#include <limits.h>

uint32_t counter = 0;

struct Dummy {
  void do_something() {
    counter++;
  }
};


void do_something() { counter++; }

int main(int argc, char **argv) {
  Dummy dummy;
  if (argc == 1) {
    for (int i = 0; i < INT_MAX; i++) {
        do_something();
    }   
  } else {
    for (int i = 0; i < INT_MAX; i++) {
        dummy.do_something();
    }   
  }
  std::cout << counter << std::endl;
  return 0;
}

Then, strangely, the version with the dummy object is no slower:

with global counter: 0m12.062s

with dummy object: 0m11.860s

perreal answered Oct 17 '22 20:10