Does every C++ member function take `this` as an input implicitly?

Tags:

c++

performance

language-lawyer

this

member-functions

When we create a member function for a class in C++, it has an implicit extra argument that is a pointer to the calling object, referred to as this.

Is this true for any member function, even one that does not use the this pointer? For example, given the class

class foo
{
private:
    int bar;
public:
    int get_one()
    {
      return 1;  // Not using `this`
    }
    int get_bar()
    {
        return this->bar;  // Using `this`
    }
};

Would both the functions (get_one and get_bar) take this as an implicit parameter, even though only one of them actually uses it?
It seems like a bit of a waste to do so.

Note: I understand the correct thing to do would be to make get_one() static, and that the answer may be dependent on the implementation, but I'm just curious.

asked Jan 15 '17 23:01

rtpax

3 Answers

Would both of the functions (get_one and get_bar) take this as an implicit parameter even though only get_bar uses it?

Yes (unless the compiler optimizes it away, which still doesn't mean you can call the function without a valid object).

It seems like a bit of a waste to do so

Then why is it a member if it doesn't use any member data? Sometimes, the correct approach is making it a free function in the same namespace.
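As a rough illustration of the alternatives mentioned here (and the static option from the question's note), the sketch below puts a non-static member, a static member, and a free function side by side. The names get_one_static, get_one_free and check_get_one_variants are hypothetical, and bar is given a default initializer only so the example is well-defined.

```cpp
#include <cassert>

class foo
{
private:
    int bar = 0;  // default initializer added for this sketch
public:
    // Non-static member: callers must supply an object, even though it is unused.
    int get_one() { return 1; }

    // Non-static member that really does need the object.
    int get_bar() { return this->bar; }

    // Static member: no implicit object parameter, callable without an instance.
    static int get_one_static() { return 1; }
};

// Free function living next to the class (hypothetical name).
int get_one_free() { return 1; }

// Hypothetical helper: only the first two calls need an object to exist.
void check_get_one_variants()
{
    foo f;
    assert(f.get_one() == 1);
    assert(f.get_bar() == 0);
    assert(foo::get_one_static() == 1);
    assert(get_one_free() == 1);
}
```

Whether the static member or the free function is the better fit is a design choice; both simply avoid requiring an object at the call site.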

answered Oct 18 '22 08:10

StoryTeller - Unslander Monica

...class in c++, as I understand it, it has an implicit extra argument that is a pointer to the calling object

It's important to note that C++ started as C with objects.

To that end, the this pointer isn't something that is magically present within a member function; rather, the member function, once compiled, needs a way to know which object this refers to, hence the notion of an implicit this pointer to the calling object being passed in.

To put it another way, let's take your C++ class and make it a C version:

C++

#include <stdio.h>

class foo
{
    private:
        int bar;
    public:
        int get_one()
        {
            return 1;
        }
        
        int get_bar()
        {
            return this->bar;
        }
    
        int get_foo(int i)
        {
            return this->bar + i;
        }
};

int main(int argc, char** argv)
{
    foo f = foo();  // value-initialize so that bar is 0 before it is read
    printf("%d\n", f.get_one());
    printf("%d\n", f.get_bar());
    printf("%d\n", f.get_foo(10));
    return 0;
}

C

#include <stdio.h>

typedef struct foo
{
    int bar;
} foo;

int foo_get_one(foo *this)
{
    return 1;
}

int foo_get_bar(foo *this)
{
    return this->bar;
}

int foo_get_foo(int i, foo *this)
{
    return this->bar + i;
}

int main(int argc, char** argv)
{
    foo f = {0};  /* zero bar before it is read */
    printf("%d\n", foo_get_one(&f));
    printf("%d\n", foo_get_bar(&f));
    printf("%d\n", foo_get_foo(10, &f));
    return 0;
}

When the C++ program is compiled, the this pointer is "added" to the generated function as a hidden parameter so that it "knows" which object the member function was called on.

So foo::get_one might be compiled to the C equivalent of foo_get_one(foo *this), foo::get_bar to foo_get_bar(foo *this), and foo::get_foo(int) to foo_get_foo(int, foo *this), etc. (Where the hidden parameter goes is up to the ABI; common ones such as the Itanium C++ ABI pass this ahead of the explicit arguments.)
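The same idea can be observed from inside C++ without looking at generated code. The following is only a sketch: it checks that the address of a non-static member has a pointer-to-member type, and uses std::mem_fn to call the members with the object written out as an explicit argument, much like foo_get_bar(&f) in the C version. The constructor and the helper call_with_explicit_object are hypothetical additions for the example.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

class foo
{
private:
    int bar;
public:
    explicit foo(int b) : bar(b) {}  // hypothetical constructor so bar has a known value
    int get_bar() { return this->bar; }
    int get_foo(int i) { return this->bar + i; }
};

// The address of a non-static member function is a pointer-to-member,
// not a plain int (*)(int): the class is part of its type.
static_assert(std::is_same<decltype(&foo::get_foo), int (foo::*)(int)>::value,
              "get_foo needs a foo object to be called");

// Hypothetical helper: std::mem_fn wraps each member so that the object
// is passed as an explicit first argument.
int call_with_explicit_object(foo &f)
{
    auto get_bar = std::mem_fn(&foo::get_bar);
    auto get_foo = std::mem_fn(&foo::get_foo);
    return get_bar(f) + get_foo(f, 10);
}

// Hypothetical self-check: with bar == 5 the two calls yield 5 and 15.
void check_explicit_object()
{
    foo f(5);
    assert(call_with_explicit_object(f) == 20);
}
```

This says nothing about how a particular compiler passes the object; it only shows that the language itself treats the object as a required input to the call.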

Would both of the functions (get_one and get_bar) take this as an implicit parameter even though only get_bar uses it? It seems like a bit of a waste to do so.

This is up to the compiler. An optimizing compiler might eliminate the this pointer from a member function that never touches its object (to save a register or stack slot), but that is highly dependent on the code, how it's being compiled, and the target system.

More specifically, if the function is as simple as foo::get_one (merely returning 1), chances are the compiler will just put the constant 1 in place of the call to object->get_one(), eliminating the need for any references or pointers.

Hope that can help.

answered Oct 18 '22 08:10

txtechhelp

Semantically the this pointer is always available in a member function - as another user pointed out. That is, you could later change the function to use it without issue (and, in particular, without the need to recompile calling code in other translation units) or in the case of a virtual function, an overridden version in a subclass could use this even if the base implementation didn't.
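To make the virtual case concrete, here is a minimal sketch in which the base implementation ignores this while an override reads a member through it. The class names base and derived and the helper functions are hypothetical, chosen only for this example.

```cpp
#include <cassert>

class base
{
public:
    virtual ~base() {}
    // The base version never touches the object.
    virtual int get_value() { return 1; }
};

class derived : public base
{
private:
    int bar;
public:
    explicit derived(int b) : bar(b) {}
    // The override does use this, so the caller must have passed it.
    int get_value() override { return this->bar; }
};

// The caller cannot know which version will run, so it has to
// supply the object either way.
int call_get_value(base &b) { return b.get_value(); }

// Hypothetical self-check of both dispatch targets.
void check_virtual_dispatch()
{
    base b;
    derived d(42);
    assert(call_get_value(b) == 1);
    assert(call_get_value(d) == 42);
}
```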

So the remaining interesting question is what performance impact this imposes, if any. There may be a cost to the caller and/or the callee and the cost may be different when inlined and not inlined. We examine all the permutations below:

Inlined

In the inlined case, the compiler can see both the call site and the function implementation¹, so it presumably doesn't need to follow any particular calling convention, and the cost of the hidden this pointer should go away. Note also that in this case there is no real distinction between the "caller" code and the "callee" code, since they are combined and optimized together at the call site.

Let's use the following test code:

#include <stdio.h>

class foo
{
private:
    int bar;
public:
    int get_one_member()
    {
      return 1;  // Not using `this`
    }
};

int get_one_global() {
  return 2;
}

int main(int argc, char **) {
  foo f = foo();
  if(argc) {
    puts("a");
    return f.get_one_member();
  } else {
    puts("b");
    return get_one_global();
  }
}

Note that the two puts calls are just there to make the branches a bit more different - otherwise the compilers are smart enough to just use a conditional set/move, and so you can't even really separate the inlined bodies of the two functions.

All of gcc, icc and clang inline the two calls and generate code that is equivalent for both the member and non-member function, without any trace of the this pointer in the member case. Let's look at the clang code since it's the cleanest:

main:
 push   rax
 test   edi,edi
 je     400556 <main+0x16>
 # this is the member case
 mov    edi,0x4005f4
 call   400400 <puts@plt>
 mov    eax,0x1
 pop    rcx
 ret
 # this is the non-member case    
 mov    edi,0x4005f6
 call   400400 <puts@plt>
 mov    eax,0x2
 pop    rcx
 ret

Both paths generate the exact same series of 4 instructions leading up to the final ret - two instructions for the puts call, a single instruction to mov the return value of 1 or 2 into eax, and a pop rcx to clean up the stack². So the actual call took exactly one instruction in either case, and there was no this pointer manipulation or passing at all.

Out of line

In the out-of-line case, supporting the this pointer will actually have some real-but-generally-small costs, at least on the caller side.

We use a similar test program, but with the member functions declared out-of-line and with inlining of those functions disabled³:

class foo
{
private:
    int bar;
public:
    int __attribute__ ((noinline)) get_one_member();
};

int foo::get_one_member() 
{
   return 1;  // Not using `this`
}

int __attribute__ ((noinline)) get_one_global() {
  return 2;
}

int main(int argc, char **) {
  foo f = foo();
  return argc ? f.get_one_member() : get_one_global();
}

This test code is somewhat simpler than the last one because it doesn't need the puts call to distinguish the two branches.

Call Site

Let's look at the assembly that gcc⁴ generates for main (i.e., at the call sites for the functions):

main:
 test   edi,edi
 jne    400409 <main+0x9>
 # the global branch
 jmp    400530 <get_one_global()>
 # the member branch
 lea    rdi,[rsp-0x18]
 jmp    400520 <foo::get_one_member()>
 nop    WORD PTR cs:[rax+rax*1+0x0]
 nop    DWORD PTR [rax]

Here, both function calls are actually realized using jmp - this is a type of tail-call optimization since they are the last functions called in main, so the ret for the called function actually returns to the caller of main - but here the caller of the member function pays an extra price:

lea    rdi,[rsp-0x18]

That loads the this pointer (the address of f on the stack) into rdi, the register that receives the first argument, which for C++ member functions is this. So there is a (small) extra cost.

Function Body

Now while the call-site pays some cost to pass an (unused) this pointer, in this case at least, the actual function bodies are still equally efficient:

foo::get_one_member():
 mov    eax,0x1
 ret    

get_one_global():
 mov    eax,0x2
 ret

Both are composed of a single mov and a ret. So the function itself can simply ignore the this value since it isn't used.

This raises the question of whether this is true in general - will the function body of a member function that doesn't use this always be compiled as efficiently as an equivalent non-member function?

The short answer is no - at least for most modern ABIs that pass arguments in registers. The this pointer takes up a parameter register in the calling convention, so you'll hit the maximum number of register-passed arguments one parameter sooner when compiling a member function.

Take for example this function that simply adds its six int parameters together:

int add6(int a, int b, int c, int d, int e, int f) {
  return a + b + c + d + e + f;
}

When compiled as a member function on an x86-64 platform using the SysV ABI, one of the arguments has to be passed on the stack, because this occupies one of the six integer argument registers, resulting in code like this:

foo::add6_member(int, int, int, int, int, int):
 add    esi,edx
 mov    eax,DWORD PTR [rsp+0x8]
 add    ecx,esi
 add    ecx,r8d
 add    ecx,r9d
 add    eax,ecx
 ret

Note the read from the stack, mov eax,DWORD PTR [rsp+0x8], which will generally add a few cycles of latency⁵ and one instruction on gcc⁶ versus the non-member version, which has no memory reads:

add6_nonmember(int, int, int, int, int, int):
 add    edi,esi
 add    edx,edi
 add    ecx,edx
 add    ecx,r8d
 lea    eax,[rcx+r9*1]
 ret

Now you won't usually have six or more arguments to a function (especially very short, performance sensitive ones) - but this at least shows that even on the callee code-generation side, this hidden this pointer isn't always free.
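For anyone wanting to reproduce the comparison, the source behind the two listings would look roughly like the sketch below. The names add6_member and add6_nonmember come from the assembly above; the noinline attribute (a GCC/Clang extension, as in the earlier test program) and the helper check_add6 are assumptions for this example, and the exact instructions will vary by compiler and flags.

```cpp
#include <cassert>

class foo
{
public:
    // Six explicit int parameters plus the hidden this makes seven arguments.
    int __attribute__((noinline)) add6_member(int a, int b, int c, int d, int e, int f);
};

int foo::add6_member(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;  // this is never used
}

// Same body as a free function: only the six explicit parameters.
int __attribute__((noinline)) add6_nonmember(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

// Hypothetical self-check: both variants compute the same sum.
void check_add6()
{
    foo obj;
    assert(obj.add6_member(1, 2, 3, 4, 5, 6) == 21);
    assert(add6_nonmember(1, 2, 3, 4, 5, 6) == 21);
}
```

The two functions are identical at the source level; any difference in the output comes purely from the extra hidden parameter.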

Note also that while the examples used x86-64 codegen and the SysV ABI, the same basic principles would apply to any ABI that passes some arguments in registers.

¹ Note that this optimization only applies easily to effectively non-virtual functions - since only then can the compiler know the actual function implementation.

² I guess that's what it's for - this undoes the push rax at the top of the method so that rsp has the correct value on return, but I don't know why the push/pop pair needs to be in there in the first place. Other compilers use different strategies, such as sub rsp,8 and add rsp,8.

³ In practice, you aren't really going to disable inlining like this, but the failure to inline would happen just because the methods are in different compilation units. Because of the way godbolt works, I can't exactly do that, so disabling inlining has the same effect.

⁴ Oddly, I couldn't get clang to stop inlining either function, either with attribute noinline or with -fno-inline.

⁵ In fact, often a few cycles more than the usual L1-hit latency of 4 cycles on Intel, due to store-forwarding of the recently written value.

⁶ In principle, on x86 at least, the one-instruction penalty can be eliminated by using an add with a memory source operand, rather than a mov from memory with a subsequent reg-reg add, and in fact clang and icc do exactly that. I don't think one approach dominates though - the gcc approach with a separate mov is better able to move the load off the critical path - initiating it early and then using it only in the last instruction, while the icc approach adds 1 cycle to the critical path involving the mov and the clang approach seems the worst of all - stringing all the adds together into one long dependency chain on eax which ends with the memory read.

answered Oct 18 '22 06:10

BeeOnRope
