This was an interview question I was asked a few months back:
Which of the following functions will execute faster, Foo1 or Foo2?
void Foo1(SomeObjectArray** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        if (((*array) + i) != NULL)
        {
            ((*array) + i)->Operation1();
            ((*array) + i)->Operation2();
            ((*array) + i)->Operation3();
            ((*array) + i)->Operation4();
            ((*array) + i)->Operation5();
            ((*array) + i)->Operation6();
        }
    }
}
void Foo2(SomeObjectArray** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        if (((*array) + i) != NULL)
        {
            SomeObject& obj = *((*array) + i);
            obj.Operation1();
            obj.Operation2();
            obj.Operation3();
            obj.Operation4();
            obj.Operation5();
            obj.Operation6();
        }
    }
}
Note that this is from memory, so I don't remember the code exactly, but the general idea is the same. One function uses the pointer while the other uses a reference (it may have had a pointer to an array like in the above code, but I don't remember exactly). I said I wasn't sure and would have to profile the code to find out, but if I had to guess, Foo2 'may' be faster. They were not impressed...
This nagged me a couple of times here and there when I ran across code similar to this (or wrote it) and would wonder what I should do in such a case.
EDIT: I have changed the code slightly so that it is now checking for a NULL pointer.

I know that:

- It's much faster and more memory-efficient to copy a pointer than to copy many of the things a pointer is likely to point to. A reference is stored in as many bytes as are required to hold an address on the machine, which often makes references much smaller than the things they refer to.
- Dereferencing, when translated into machine code, can mean different things depending on what you do with the dereferenced object. Accessing a single member of a class through a pointer is typically cheap.
- In C++, reference variables are safer than pointers because a reference must be initialized and cannot be changed to refer to something else once it is initialized.
- In short: it is almost always more efficient to pass native types (int, float, double) by value than by reference, not only because a pointer is, in most cases, as big as or bigger than the native type, but also because it is much harder for the optimizer to optimize reference parameters than value parameters (a small illustration follows below).
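As a rough illustration of the last two points (all names below are made up for this illustration and do not come from the question):

#include <cstddef>

void reference_vs_pointer_semantics()
{
    int a = 1;
    int b = 2;

    int* p = NULL;   // a pointer may start out null and be reseated later
    p = &a;
    p = &b;

    int& r = a;      // a reference must be bound when it is created...
    r = *p;          // ...and assigning through it writes to a; r still refers to a
}

// For a native type, a by-value parameter can travel in a register, while a
// reference parameter is an address that may have to be dereferenced and that
// the optimizer must assume could alias other memory.
double scale_by_value(double x, double factor)
{
    return x * factor;
}

double scale_by_reference(const double& x, const double& factor)
{
    return x * factor;
}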
I think this is a pretty interesting question, and I saw a lot of speculation about what a compiler might do, but I wanted to take a closer look and find out for sure. So I took e.James' program and ran it through GCC to get the assembly. I should say that I don't really know assembly, so somebody correct me if I'm wrong, but I think we can reasonably infer what's going on. :)
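For context, here is a minimal sketch of the kind of test program assumed here. It is not e.James' actual code: the SomeObject definition, array size, and iteration counts are guesses, and it has to be compiled together with the Foo1/Foo2 definitions from the question.

// Hypothetical timing harness (not e.James' program): each SomeObject method
// increments a global counter so the calls have an observable effect, and
// both functions are timed over many repetitions.
#include <cstdio>
#include <ctime>

int dummy = 0;

struct SomeObject
{
    void Operation1() { ++dummy; }
    void Operation2() { ++dummy; }
    void Operation3() { ++dummy; }
    void Operation4() { ++dummy; }
    void Operation5() { ++dummy; }
    void Operation6() { ++dummy; }
};

// Assumption: the "array" argument points at a contiguous block of SomeObjects.
typedef SomeObject SomeObjectArray;

// Definitions as shown in the question above.
void Foo1(SomeObjectArray** array, unsigned int size);
void Foo2(SomeObjectArray** array, unsigned int size);

int main()
{
    const unsigned int size = 100;        // guessed; the original sizes are unknown
    static SomeObject objects[size];
    SomeObjectArray* base = objects;

    std::clock_t start = std::clock();
    for (int run = 0; run < 100000; ++run)
        Foo1(&base, size);
    std::printf("Foo1: %ld\n", (long)(std::clock() - start));

    start = std::clock();
    for (int run = 0; run < 100000; ++run)
        Foo2(&base, size);
    std::printf("Foo2: %ld\n", (long)(std::clock() - start));

    return dummy == 0;                    // keep dummy observable
}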
Compiling with -O0 (no optimization)
For Foo1, we see that the array offset is calculated before each function call:
movl 8(%ebp), %eax        # eax = array (the SomeObjectArray** argument)
movl (%eax), %edx         # edx = *array
movl -4(%ebp), %eax       # eax = i
leal (%edx,%eax), %eax    # eax = address of element i
movl %eax, (%esp)         # pass the element pointer as the 'this' argument
call __ZN10SomeObject10Operation1Ev    # SomeObject::Operation1()
That's the pattern for all six method calls, just using different method names. Foo2 has a little bit of setup code to get the reference:
movl 8(%ebp), %eax        # eax = array
movl (%eax), %edx         # edx = *array
movl -4(%ebp), %eax       # eax = i
leal (%edx,%eax), %eax    # eax = address of element i
movl %eax, -8(%ebp)       # store it in the local that backs the reference
And then six of these, each of which looks like just a push onto the stack and a function call:
movl -8(%ebp), %eax
movl %eax, (%esp)
call __ZN10SomeObject10Operation1Ev
Pretty much what we would expect with no optimization. The output was
Foo1: 18472
Foo2: 17684
Compiling with -O1 (minimal optimization)
Foo1 is a little more efficient, but it still adds up the array offset each time:
movl %esi, %eax
addl (%ebx), %eax
movl %eax, (%esp)
call __ZN10SomeObject10Operation1Ev
Foo2 saves the computed element address in ebx (addl (%edi), %ebx) and then makes these calls:
movl %ebx, (%esp)
call __ZN10SomeObject10Operation1Ev
The timings here were
Foo1: 4979
Foo2: 4977
Compiling with -O2 (moderate optimization)
When compiling with -O2, GCC just got rid of the whole thing, and each call to Foo1 or Foo2 just resulted in adding 594 to dummy (99 increments * 6 calls = 594 increments):
imull $594, %eax, %eax
addl %eax, _dummy
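In C++ terms, each call has been reduced to roughly the following sketch (and the imull suggests the compiler even multiplies 594 by a call count rather than looping):

extern int dummy;   // the global counter from the test program

// Sketch only: what each call to Foo1 or Foo2 effectively becomes once -O2
// folds the loop and the six inlined method bodies into one constant add.
void Foo_as_compiled()
{
    dummy += 594;   // 99 elements passing the check * 6 increments each
}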
There were no calls to the object's methods, although those methods remained in the code. As we might expect, the times here were
Foo1: 1
Foo2: 0
I think this tells us that Foo2 is a little faster without optimizations, but really it's a moot point because once it starts optimizing, the compiler is just moving around a couple of longs between the stack and the registers.
Strictly speaking, with no optimizations at all, I'd say Foo2 is faster because Foo1 has to calculate the indirection pointer each time, but that's not going to happen anywhere in practice. I'd say the compiler will optimize it and leave them the same.
Looks like there is plenty of room for the compiler here: i and array don't change within the block on each iteration, so it could optimize it by putting the pointer into a register, exactly as it would do for the reference.
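A sketch of the hand-hoisted form these answers describe, reusing the types from the question (the element type SomeObject is an assumption):

// Manual version of the hoisting the compiler is expected to do: compute the
// element address once per iteration and reuse it for all six calls, which is
// effectively what both the pointer and the reference version become.
void Foo1_hoisted(SomeObjectArray** array, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
    {
        SomeObject* obj = (*array) + i;   // computed once, can stay in a register
        if (obj != NULL)
        {
            obj->Operation1();
            obj->Operation2();
            obj->Operation3();
            obj->Operation4();
            obj->Operation5();
            obj->Operation6();
        }
    }
}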
Likely the compiler will optimize them to be the same, given the common sub-expressions on each line. No guarantees though.
There aren't any rational conclusions you can reach by armchair reasoning like this with today's compilers and processors. The only way to know is to try it and time it. If someone didn't make that clear in their interview answer, it would be an automatic fail from me.