 

Performance of pIter != cont.end() in for loop


I have been going through "Exceptional C++" by Herb Sutter lately, and I have serious doubts about a particular recommendation he gives in Item 6 - Temporary Objects.

He asks the reader to find the unnecessary temporary objects in the following code:

string FindAddr(list<Employee> emps, string name)
{
  for (list<Employee>::iterator i = emps.begin(); i != emps.end(); i++)
  {
    if( *i == name )
    {
      return i->addr;
    }
  }
  return "";
}

As one of the examples, he recommends precomputing the value of emps.end() before the loop, since a temporary object is created on every iteration:

For most containers (including list), calling end() returns a temporary object that must be constructed and destroyed. Because the value will not change, recomputing (and reconstructing and redestroying) it on every loop iteration is both needlessly inefficient and unaesthetic. The value should be computed only once, stored in a local object, and reused.

And he suggests replacing the loop with the following:

list<Employee>::const_iterator end(emps.end());
for (list<Employee>::const_iterator i = emps.begin(); i != end; ++i)

To me, this is an unnecessary complication. Even if one replaces the ugly type declarations with compact auto, one still ends up with two lines of code instead of one. What's more, the end variable now lives in the outer scope.
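For reference, here is roughly what that two-line variant looks like with auto (my own sketch, assuming C++11; the Employee type below is invented only to make the snippet self-contained):

#include <list>
#include <string>

using namespace std;

// Invented stand-in for the book's Employee type, so the sketch compiles on its own.
struct Employee
{
    string name;
    string addr;
    bool operator==(const string& n) const { return name == n; }
};

string FindAddr(list<Employee> emps, string name)
{
    auto end = emps.end();                        // the extra line, with end in the outer scope
    for (auto i = emps.begin(); i != end; ++i)
    {
        if (*i == name)
        {
            return i->addr;
        }
    }
    return "";
}

int main()
{
    list<Employee> emps;
    Employee e;
    e.name = "Alice";
    e.addr = "42 Elm St";
    emps.push_back(e);
    return FindAddr(emps, "Alice") == "42 Elm St" ? 0 : 1;
}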

I was sure modern compilers would optimize this piece of code anyway, because I'm actually using const_iterator here and it is easy to check whether the loop body accesses the container at all. Compilers have gotten smarter over the last 13 years, right?

Anyway, I will prefer the first version with i != emps.end() in most cases, where I'm not particularly worried about performance. But I want to know for sure: is this the kind of construct I can rely on a compiler to optimize?
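To be explicit about what temporary we are talking about, here is a simplified model (my own sketch, not actual standard-library source) of a node-based list whose end() returns a small iterator object by value on every call:

// Simplified model of a node-based list and its end() iterator (illustration only).
struct NodeBase { NodeBase* next; NodeBase* prev; };

struct Iter
{
    NodeBase* node;
    explicit Iter(NodeBase* n) : node(n) {}
    bool operator!=(const Iter& other) const { return node != other.node; }
};

struct ToyList
{
    NodeBase sentinel;                       // "one past the end" node
    Iter end() { return Iter(&sentinel); }   // a temporary Iter is built on every call
};

int main()
{
    ToyList l;
    l.sentinel.next = l.sentinel.prev = &l.sentinel;   // empty list: sentinel links to itself
    Iter e = l.end();                                   // the small iterator object in question
    return e != l.end() ? 1 : 0;                        // always 0: both wrap the same sentinel node
}

After inlining, such an end() boils down to taking the address of the sentinel node, which is exactly the kind of value a compiler could hoist out of the loop on its own.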

Update

Thanks for your suggestions on how to make this useless code better. Please note that my question is about the compiler, not about programming techniques. The only relevant answers so far are from NPE and Ellioh.

Asked Mar 15 '13 by Mikhail


1 Answer

UPD: The book you are speaking about was published in 1999, unless I'm mistaken. That's 14 years ago, and in programming 14 years is a lot of time. Many recommendations that were good and reliable in 1999 may be completely obsolete by now. Though my answer is about a single compiler and a single platform, there is also a more general idea behind it.

Worrying about extra variables, caching the return values of trivial methods, and similar tricks of old C++ is a step back towards the C++ of the 1990s. Trivial methods like end() should be inlined quite well, and the result of inlining should be optimized as part of the code it is called from. 99% of situations do not require manual actions such as creating an end variable at all. Such things should be done only if:

  1. You KNOW that on some of the compilers/platforms you have to run on, the code is not optimized well.
  2. It has become a bottleneck in your program ("avoid premature optimization"); a minimal timing sketch of how one might check this follows this list.
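For point 2, here is a minimal timing sketch (my own code, not part of the original answer; it assumes a C++11 <chrono>) that compares the two loop forms on a large list:

#include <chrono>
#include <cstdio>
#include <list>

int main()
{
    std::list<int> l(1000000, 1);   // one million elements

    // Version 1: end() evaluated in the loop condition on every iteration.
    long long sum1 = 0;
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (std::list<int>::const_iterator i = l.begin(); i != l.end(); ++i)
        sum1 += *i;
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    // Version 2: end() hoisted into a local variable before the loop.
    long long sum2 = 0;
    std::list<int>::const_iterator e = l.end();
    std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();
    for (std::list<int>::const_iterator i = l.begin(); i != e; ++i)
        sum2 += *i;
    std::chrono::steady_clock::time_point t3 = std::chrono::steady_clock::now();

    std::printf("end() in condition: %lld us (sum %lld)\n",
                static_cast<long long>(std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()),
                sum1);
    std::printf("hoisted end():      %lld us (sum %lld)\n",
                static_cast<long long>(std::chrono::duration_cast<std::chrono::microseconds>(t3 - t2).count()),
                sum2);
    return 0;
}

With optimizations enabled one would expect the two timings to be practically indistinguishable, since, as shown below, both loops compile down to the same three instructions.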

I've looked at what is generated by 64-bit g++:

gcc version 4.6.3 20120918 (prerelease) (Ubuntu/Linaro 4.6.3-10ubuntu1) 

Initially I thought that with optimizations enabled there should be no difference between the two versions. But it turns out things are strange: the version you considered non-optimal actually compiles to better code. I think the moral is: there is no reason to try to be smarter than the compiler. Let's look at both versions.

#include <list>

using namespace std;

int main() {
  list<char> l;
  l.push_back('a');

  for(list<char>::iterator i=l.begin(); i != l.end(); i++)
      ;

  return 0;
}

int main1() {
  list<char> l;
  l.push_back('a');
  list<char>::iterator e=l.end();

  for(list<char>::iterator i=l.begin(); i != e; i++)
      ;

  return 0;
}

Then we should compile this with optimizations on (I use 64-bit g++, you may try your compiler) and disassemble main and main1:

For main:

(gdb) disas main
Dump of assembler code for function main():
   0x0000000000400650 <+0>:     push   %rbx
   0x0000000000400651 <+1>:     mov    $0x18,%edi
   0x0000000000400656 <+6>:     sub    $0x20,%rsp
   0x000000000040065a <+10>:    lea    0x10(%rsp),%rbx
   0x000000000040065f <+15>:    mov    %rbx,0x10(%rsp)
   0x0000000000400664 <+20>:    mov    %rbx,0x18(%rsp)
   0x0000000000400669 <+25>:    callq  0x400630 <_Znwm@plt>
   0x000000000040066e <+30>:    cmp    $0xfffffffffffffff0,%rax
   0x0000000000400672 <+34>:    je     0x400678 <main()+40>
   0x0000000000400674 <+36>:    movb   $0x61,0x10(%rax)
   0x0000000000400678 <+40>:    mov    %rax,%rdi
   0x000000000040067b <+43>:    mov    %rbx,%rsi
   0x000000000040067e <+46>:    callq  0x400610 <_ZNSt8__detail15_List_node_base7_M_hookEPS0_@plt>
   0x0000000000400683 <+51>:    mov    0x10(%rsp),%rax
   0x0000000000400688 <+56>:    cmp    %rbx,%rax
   0x000000000040068b <+59>:    je     0x400698 <main()+72>
   0x000000000040068d <+61>:    nopl   (%rax)
   0x0000000000400690 <+64>:    mov    (%rax),%rax
   0x0000000000400693 <+67>:    cmp    %rbx,%rax
   0x0000000000400696 <+70>:    jne    0x400690 <main()+64>
   0x0000000000400698 <+72>:    mov    %rbx,%rdi
   0x000000000040069b <+75>:    callq  0x400840 <std::list<char, std::allocator<char> >::~list()>
   0x00000000004006a0 <+80>:    add    $0x20,%rsp
   0x00000000004006a4 <+84>:    xor    %eax,%eax
   0x00000000004006a6 <+86>:    pop    %rbx
   0x00000000004006a7 <+87>:    retq

Look at the instructions located at 0x0000000000400690-0x0000000000400696. That's the loop body, and it seems to be perfectly optimized:

   0x0000000000400690 <+64>:    mov    (%rax),%rax
   0x0000000000400693 <+67>:    cmp    %rbx,%rax
   0x0000000000400696 <+70>:    jne    0x400690 <main()+64>

For main1:

(gdb) disas main1
Dump of assembler code for function main1():
   0x00000000004007b0 <+0>:     push   %rbp
   0x00000000004007b1 <+1>:     mov    $0x18,%edi
   0x00000000004007b6 <+6>:     push   %rbx
   0x00000000004007b7 <+7>:     sub    $0x18,%rsp
   0x00000000004007bb <+11>:    mov    %rsp,%rbx
   0x00000000004007be <+14>:    mov    %rsp,(%rsp)
   0x00000000004007c2 <+18>:    mov    %rsp,0x8(%rsp)
   0x00000000004007c7 <+23>:    callq  0x400630 <_Znwm@plt>
   0x00000000004007cc <+28>:    cmp    $0xfffffffffffffff0,%rax
   0x00000000004007d0 <+32>:    je     0x4007d6 <main1()+38>
   0x00000000004007d2 <+34>:    movb   $0x61,0x10(%rax)
   0x00000000004007d6 <+38>:    mov    %rax,%rdi
   0x00000000004007d9 <+41>:    mov    %rsp,%rsi
   0x00000000004007dc <+44>:    callq  0x400610 <_ZNSt8__detail15_List_node_base7_M_hookEPS0_@plt>
   0x00000000004007e1 <+49>:    mov    (%rsp),%rdi
   0x00000000004007e5 <+53>:    cmp    %rbx,%rdi
   0x00000000004007e8 <+56>:    je     0x400818 <main1()+104>
   0x00000000004007ea <+58>:    mov    %rdi,%rax
   0x00000000004007ed <+61>:    nopl   (%rax)
   0x00000000004007f0 <+64>:    mov    (%rax),%rax
   0x00000000004007f3 <+67>:    cmp    %rbx,%rax
   0x00000000004007f6 <+70>:    jne    0x4007f0 <main1()+64>
   0x00000000004007f8 <+72>:    mov    (%rdi),%rbp
   0x00000000004007fb <+75>:    callq  0x4005f0 <_ZdlPv@plt>
   0x0000000000400800 <+80>:    cmp    %rbx,%rbp
   0x0000000000400803 <+83>:    je     0x400818 <main1()+104>
   0x0000000000400805 <+85>:    nopl   (%rax)
   0x0000000000400808 <+88>:    mov    %rbp,%rdi
   0x000000000040080b <+91>:    mov    (%rdi),%rbp
   0x000000000040080e <+94>:    callq  0x4005f0 <_ZdlPv@plt>
   0x0000000000400813 <+99>:    cmp    %rbx,%rbp
   0x0000000000400816 <+102>:   jne    0x400808 <main1()+88>
   0x0000000000400818 <+104>:   add    $0x18,%rsp
   0x000000000040081c <+108>:   xor    %eax,%eax
   0x000000000040081e <+110>:   pop    %rbx
   0x000000000040081f <+111>:   pop    %rbp
   0x0000000000400820 <+112>:   retq

The code for the loop itself is similar:

   0x00000000004007f0 <+64>:    mov    (%rax),%rax
   0x00000000004007f3 <+67>:    cmp    %rbx,%rax
   0x00000000004007f6 <+70>:    jne    0x4007f0 <main1()+64>

But there is a lot of extra code around the loop. Apparently, the extra variable has made things WORSE.

Answered by Ellioh