
Pass by value faster than pass by reference

Tags:

c++

pass-by-reference

pass-by-value

I wrote a simple program in C++ to compare the performance of two approaches: pass by value and pass by reference. In my test, pass by value performed better than pass by reference.

My conclusion is that passing by value requires fewer clock cycles (instructions).

I would be really glad if someone could explain in detail why pass by value requires fewer clock cycles.

    #include <iostream>
    #include <stdlib.h>
    #include <time.h>

    using namespace std;

    void function(int *ptr);
    void function2(int val);

    int main() {
        int nmbr = 5;
        clock_t start, stop;

        start = clock();
        for (long i = 0; i < 1000000000; i++) {
            function(&nmbr);
            //function2(nmbr);
        }
        stop = clock();

        cout << "time: " << stop - start;
        return 0;
    }

    /**
     * pass by reference
     */
    void function(int *ptr) {
        *ptr *= 5;
    }

    /**
     * pass by value
     */
    void function2(int val) {
        val *= 5;
    }


asked Apr 03 '14 14:04

Björn Hallström

2 Answers

A good way to find out why there are any differences is to check the disassembly. Here are the results I got on my machine with Visual Studio 2012.

With optimization flags, both functions generate the same code:

    009D1270 57                   push        edi
    009D1271 FF 15 D4 30 9D 00    call        dword ptr ds:[9D30D4h]
    009D1277 8B F8                mov         edi,eax
    009D1279 FF 15 D4 30 9D 00    call        dword ptr ds:[9D30D4h]
    009D127F 8B 0D 48 30 9D 00    mov         ecx,dword ptr ds:[9D3048h]
    009D1285 2B C7                sub         eax,edi
    009D1287 50                   push        eax
    009D1288 E8 A3 04 00 00       call        std::operator<<<std::char_traits<char> > (09D1730h)
    009D128D 8B C8                mov         ecx,eax
    009D128F FF 15 2C 30 9D 00    call        dword ptr ds:[9D302Ch]
    009D1295 33 C0                xor         eax,eax
    009D1297 5F                   pop         edi
    009D1298 C3                   ret

This is basically equivalent to:

    int main () {
        clock_t start, stop ;
        start = clock () ;
        stop = clock () ;
        cout << "time: " << stop - start ;
        return 0 ;
    }
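The optimizer can delete the loop because nothing ever reads nmbr afterwards. Below is a minimal sketch of a benchmark whose work stays observable, assuming it is acceptable to change the functions slightly. The names by_pointer, by_value, sink, time_by_pointer and time_by_value are hypothetical. The counter is unsigned to avoid the signed overflow that repeated multiplication by 5 would cause, and by_value returns its result (the original function2 discards it). Storing the final value in a volatile variable forces the compiler to produce it, although it may still inline both calls and emit the same code for both loops.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-ins for the question's two functions.
void by_pointer(std::uint32_t* ptr) { *ptr *= 5; }
std::uint32_t by_value(std::uint32_t val) { return val * 5; }

// Volatile sink: a store to it counts as observable behaviour,
// so the final result of each loop has to be computed.
volatile std::uint32_t sink = 0;

// Runs the pointer version `iterations` times; returns elapsed microseconds.
long long time_by_pointer(long iterations) {
    assert(iterations >= 0);
    std::uint32_t n = 5;
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        by_pointer(&n);
    }
    sink = n;  // publish the result before stopping the clock
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Runs the value version `iterations` times; returns elapsed microseconds.
long long time_by_value(long iterations) {
    assert(iterations >= 0);
    std::uint32_t n = 5;
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        n = by_value(n);
    }
    sink = n;  // publish the result before stopping the clock
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling time_by_pointer(1000000000L) and time_by_value(1000000000L) from main and printing both results gives the comparison. With optimizations on, the two numbers will likely be close.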

Without optimization flags, you will probably get different results.

function (no optimizations):

    00114890 55                   push        ebp
    00114891 8B EC                mov         ebp,esp
    00114893 81 EC C0 00 00 00    sub         esp,0C0h
    00114899 53                   push        ebx
    0011489A 56                   push        esi
    0011489B 57                   push        edi
    0011489C 8D BD 40 FF FF FF    lea         edi,[ebp-0C0h]
    001148A2 B9 30 00 00 00       mov         ecx,30h
    001148A7 B8 CC CC CC CC       mov         eax,0CCCCCCCCh
    001148AC F3 AB                rep stos    dword ptr es:[edi]
    001148AE 8B 45 08             mov         eax,dword ptr [ptr]
    001148B1 8B 08                mov         ecx,dword ptr [eax]
    001148B3 6B C9 05             imul        ecx,ecx,5
    001148B6 8B 55 08             mov         edx,dword ptr [ptr]
    001148B9 89 0A                mov         dword ptr [edx],ecx
    001148BB 5F                   pop         edi
    001148BC 5E                   pop         esi
    001148BD 5B                   pop         ebx
    001148BE 8B E5                mov         esp,ebp
    001148C0 5D                   pop         ebp
    001148C1 C3                   ret

function2 (no optimizations):

    00FF4850 55                   push        ebp
    00FF4851 8B EC                mov         ebp,esp
    00FF4853 81 EC C0 00 00 00    sub         esp,0C0h
    00FF4859 53                   push        ebx
    00FF485A 56                   push        esi
    00FF485B 57                   push        edi
    00FF485C 8D BD 40 FF FF FF    lea         edi,[ebp-0C0h]
    00FF4862 B9 30 00 00 00       mov         ecx,30h
    00FF4867 B8 CC CC CC CC       mov         eax,0CCCCCCCCh
    00FF486C F3 AB                rep stos    dword ptr es:[edi]
    00FF486E 8B 45 08             mov         eax,dword ptr [val]
    00FF4871 6B C0 05             imul        eax,eax,5
    00FF4874 89 45 08             mov         dword ptr [val],eax
    00FF4877 5F                   pop         edi
    00FF4878 5E                   pop         esi
    00FF4879 5B                   pop         ebx
    00FF487A 8B E5                mov         esp,ebp
    00FF487C 5D                   pop         ebp
    00FF487D C3                   ret

Why is pass by value faster (in the no optimization case)?

Well, function() has two extra mov operations. Let's take a look at the first extra mov operation:

    001148AE 8B 45 08             mov         eax,dword ptr [ptr]
    001148B1 8B 08                mov         ecx,dword ptr [eax]
    001148B3 6B C9 05             imul        ecx,ecx,5

Here we are dereferencing the pointer. We first load the pointer itself (the address it holds) from the stack into register eax. Then we load the int stored at that address into register ecx. Finally, we multiply the value by five. In function2(), the argument on the stack already is the value, so a single load is enough and this extra step is avoided.

Let's look at the second extra mov operation:

    001148B3 6B C9 05             imul        ecx,ecx,5
    001148B6 8B 55 08             mov         edx,dword ptr [ptr]
    001148B9 89 0A                mov         dword ptr [edx],ecx

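Now we go in the other direction. We have just finished multiplying the value by 5, and we need to store the result back at the address the pointer holds. The unoptimized code does not keep that address in a register, so it reloads the pointer into edx before the store.

Because function2() does not have to load a pointer and go through it for the read and the write, it gets to skip these two extra mov operations.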
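It is worth noting that the two functions also do different things, so the extra mov operations are the cost of writing through to the caller's variable. A small sketch, with the functions renamed for clarity (times_five_by_pointer, times_five_by_value and check_semantics are hypothetical names; the function bodies match the question's code):

```cpp
#include <cassert>

// Same body as the question's function(): load ptr, load *ptr,
// multiply, load ptr again, store to *ptr (in the unoptimized build).
void times_five_by_pointer(int* ptr) { *ptr *= 5; }

// Same body as the question's function2(): load val, multiply,
// store back into the local copy, which is then thrown away.
void times_five_by_value(int val) { val *= 5; }

// Shows what each call leaves behind in the caller.
void check_semantics() {
    int a = 5;
    times_five_by_pointer(&a);
    assert(a == 25);  // the caller's variable was modified

    int b = 5;
    times_five_by_value(b);
    assert(b == 5);   // only the callee's copy was modified
}
```

To compare like with like, the by-value version would need to return its result and the caller would need to assign it.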

answered Sep 23 '22 23:09

jliv902

Overhead with passing by reference:

each access needs a dereference, i.e., there is one more memory read

Overhead with passing by value:

the value needs to be copied on stack or into registers

For small objects, such as an integer, passing by value is usually faster, because the copy costs no more than passing an address and the callee avoids the extra memory read. For bigger objects (for example a large structure), copying creates too much overhead, so passing by reference is usually faster.
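A minimal sketch of that trade-off, assuming a hypothetical 1024-int structure named Big. The helper names sum_by_value, sum_by_ref, twice and check_same_result are also made up for illustration:

```cpp
#include <array>
#include <cassert>
#include <numeric>

// Hypothetical large object: 1024 ints, far bigger than a pointer.
struct Big {
    std::array<int, 1024> data{};
};
static_assert(sizeof(Big) > sizeof(void*), "Big should be larger than a pointer");

// By value: the callee works on its own copy of all 1024 ints
// (unless the compiler inlines the call and removes the copy).
long long sum_by_value(Big b) {
    return std::accumulate(b.data.begin(), b.data.end(), 0LL);
}

// By const reference: only an address is passed, and every element
// access goes through that address.
long long sum_by_ref(const Big& b) {
    return std::accumulate(b.data.begin(), b.data.end(), 0LL);
}

// For an int, the copy is typically no bigger than an address,
// so passing by value avoids the indirection at no extra cost.
int twice(int x) { return 2 * x; }

// Sanity check: both ways of passing Big compute the same sum.
void check_same_result() {
    Big b;
    b.data.fill(1);
    assert(sum_by_value(b) == 1024);
    assert(sum_by_ref(b) == 1024);
    assert(twice(21) == 42);
}
```

Where the break-even point lies depends on the platform, the calling convention and the optimizer, so measuring is the only reliable way to decide for a borderline type.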

answered Sep 21 '22 23:09

green lantern
