<p>Consider the following code (<code>p</code> is of type <code>unsigned char*</code> and <code>bitmap->width</code> is of some integer type, exactly which is unknown and depends on which version of some external library we're using):</p> <pre class="prettyprint"><code>for (unsigned x = 0; x < static_cast<unsigned>(bitmap->width); ++x) { *p++ = 0xAA; *p++ = 0xBB; *p++ = 0xCC; } </code></pre> <p><sub><s>Is it worth optimizing it [..]</s></sub></p> <p>Could there be a case where this could yield more efficient results by writing:</p> <pre class="prettyprint"><code>unsigned width(static_cast<unsigned>(bitmap->width)); for (unsigned x = 0; x < width; ++x) { *p++ = 0xAA; *p++ = 0xBB; *p++ = 0xCC; } </code></pre> <p>... or is this trivial for the compiler to optimize?</p> <p><sub><s>What would you consider to be "better" code?</s></sub></p> <p><sub><em>Note from editor (Ike): for those wondering about the strikeout text, the original question, as phrased, was dangerously close to off-topic territory and was very close to being closed in spite of positive feedback. These have been stricken out. Yet please do not punish the answerers who addressed these stricken sections of the question.</em></sub></p>

<p>At first glance, I thought the compiler could generate equivalent assembly for both versions with optimization flags activated. When I checked it, I was surprised to see the result:</p> <h3>Source <code>unoptimized.cpp</code> </h3> <p>note: this code is not meant to be executed.</p> <pre class="prettyprint"><code>struct bitmap_t { long long width; } bitmap; int main(int argc, char** argv) { for (unsigned x = 0 ; x < static_cast<unsigned>(bitmap.width) ; ++x) { argv[x][0] = '\0'; } return 0; } </code></pre> <h3>Source <code>optimized.cpp</code> </h3> <p>note: this code is not meant to be executed.</p> <pre class="prettyprint"><code>struct bitmap_t { long long width; } bitmap; int main(int argc, char** argv) { const unsigned width = static_cast<unsigned>(bitmap.width); for (unsigned x = 0 ; x < width ; ++x) { argv[x][0] = '\0'; } return 0; } </code></pre> <hr> <h3>Compilation</h3> <ul> <li><code>$ g++ -s -O3 unoptimized.cpp</code></li> <li><code>$ g++ -s -O3 optimized.cpp</code></li> </ul> <hr> <h3>Assembly (unoptimized.s)</h3> <pre class="prettyprint"><code> .file "unoptimized.cpp" .text .p2align 4,,15 .globl main .type main, @function main: .LFB0: .cfi_startproc .cfi_personality 0x3,__gxx_personality_v0 movl bitmap(%rip), %eax testl %eax, %eax je .L2 xorl %eax, %eax .p2align 4,,10 .p2align 3 .L3: mov %eax, %edx addl $1, %eax movq (%rsi,%rdx,8), %rdx movb $0, (%rdx) cmpl bitmap(%rip), %eax jb .L3 .L2: xorl %eax, %eax ret .cfi_endproc .LFE0: .size main, .-main .globl bitmap .bss .align 8 .type bitmap, @object .size bitmap, 8 bitmap: .zero 8 .ident "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-16)" .section .note.GNU-stack,"",@progbits </code></pre> <h3>Assembly (optimized.s)</h3> <pre class="prettyprint"><code> .file "optimized.cpp" .text .p2align 4,,15 .globl main .type main, @function main: .LFB0: .cfi_startproc .cfi_personality 0x3,__gxx_personality_v0 movl bitmap(%rip), %eax testl %eax, %eax je .L2 subl $1, %eax leaq 8(,%rax,8), %rcx xorl %eax, %eax .p2align 4,,10 .p2align 3 .L3: movq (%rsi,%rax), %rdx addq $8, %rax cmpq %rcx, %rax movb $0, (%rdx) jne .L3 .L2: xorl %eax, %eax ret .cfi_endproc .LFE0: .size main, .-main .globl bitmap .bss .align 8 .type bitmap, @object .size bitmap, 8 bitmap: .zero 8 .ident "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-16)" .section .note.GNU-stack,"",@progbits </code></pre> <h3>diff</h3> <pre class="prettyprint"><code>$ diff -uN unoptimized.s optimized.s --- unoptimized.s 2015-11-24 16:11:55.837922223 +0000 +++ optimized.s 2015-11-24 16:12:02.628922941 +0000 @@ -1,4 +1,4 @@ - .file "unoptimized.cpp" + .file "optimized.cpp" .text .p2align 4,,15 .globl main @@ -10,16 +10,17 @@ movl bitmap(%rip), %eax testl %eax, %eax je .L2 + subl $1, %eax + leaq 8(,%rax,8), %rcx xorl %eax, %eax .p2align 4,,10 .p2align 3 .L3: - mov %eax, %edx - addl $1, %eax - movq (%rsi,%rdx,8), %rdx + movq (%rsi,%rax), %rdx + addq $8, %rax + cmpq %rcx, %rax movb $0, (%rdx) - cmpl bitmap(%rip), %eax - jb .L3 + jne .L3 .L2: xorl %eax, %eax ret </code></pre> <hr> <p>The generated assembly for the optimized version does actually load (<code>lea</code>) the <code>width</code> constant unlike the unoptimized version which computes the <code>width</code> offset at each iteration (<code>movq</code>).</p> <p>When I'll get time, I eventually post some benchmark on that. Good question.</p>

In C++, should I bother to cache variables, or let the compiler do the optimization? (Aliasing)

Tags:

c++

performance

optimization

caching

strict-aliasing

Consider the following code (p is of type unsigned char* and bitmap->width is of some integer type, exactly which is unknown and depends on which version of some external library we're using):

for (unsigned x = 0;  x < static_cast<unsigned>(bitmap->width);  ++x) {     *p++ = 0xAA;     *p++ = 0xBB;     *p++ = 0xCC; }

_{Is it worth optimizing it [..]}

Could there be a case where this could yield more efficient results by writing:

unsigned width(static_cast<unsigned>(bitmap->width)); for (unsigned x = 0;  x < width;  ++x) {     *p++ = 0xAA;     *p++ = 0xBB;     *p++ = 0xCC; }

... or is this trivial for the compiler to optimize?

_{What would you consider to be "better" code?}

_{Note from editor (Ike): for those wondering about the strikeout text, the original question, as phrased, was dangerously close to off-topic territory and was very close to being closed in spite of positive feedback. These have been stricken out. Yet please do not punish the answerers who addressed these stricken sections of the question.}

553

asked Nov 24 '15 16:11

Yaron Cohen-Tal

2 Answers

At first glance, I thought the compiler could generate equivalent assembly for both versions with optimization flags activated. When I checked it, I was surprised to see the result:

Source `unoptimized.cpp`

note: this code is not meant to be executed.

struct bitmap_t {     long long width; } bitmap;  int main(int argc, char** argv) {     for (unsigned x = 0 ; x < static_cast<unsigned>(bitmap.width) ; ++x)     {         argv[x][0] = '\0';     }     return 0; }

Source `optimized.cpp`

note: this code is not meant to be executed.

struct bitmap_t {     long long width; } bitmap;  int main(int argc, char** argv) {     const unsigned width = static_cast<unsigned>(bitmap.width);     for (unsigned x = 0 ; x < width ; ++x)     {         argv[x][0] = '\0';     }     return 0; }

Compilation

$ g++ -s -O3 unoptimized.cpp
$ g++ -s -O3 optimized.cpp

Assembly (unoptimized.s)

    .file   "unoptimized.cpp"     .text     .p2align 4,,15 .globl main     .type   main, @function main: .LFB0:     .cfi_startproc     .cfi_personality 0x3,__gxx_personality_v0     movl    bitmap(%rip), %eax     testl   %eax, %eax     je  .L2     xorl    %eax, %eax     .p2align 4,,10     .p2align 3 .L3:     mov %eax, %edx     addl    $1, %eax     movq    (%rsi,%rdx,8), %rdx     movb    $0, (%rdx)     cmpl    bitmap(%rip), %eax     jb  .L3 .L2:     xorl    %eax, %eax     ret     .cfi_endproc .LFE0:     .size   main, .-main .globl bitmap     .bss     .align 8     .type   bitmap, @object     .size   bitmap, 8 bitmap:     .zero   8     .ident  "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-16)"     .section    .note.GNU-stack,"",@progbits

Assembly (optimized.s)

    .file   "optimized.cpp"     .text     .p2align 4,,15 .globl main     .type   main, @function main: .LFB0:     .cfi_startproc     .cfi_personality 0x3,__gxx_personality_v0     movl    bitmap(%rip), %eax     testl   %eax, %eax     je  .L2     subl    $1, %eax     leaq    8(,%rax,8), %rcx     xorl    %eax, %eax     .p2align 4,,10     .p2align 3 .L3:     movq    (%rsi,%rax), %rdx     addq    $8, %rax     cmpq    %rcx, %rax     movb    $0, (%rdx)     jne .L3 .L2:     xorl    %eax, %eax     ret     .cfi_endproc .LFE0:     .size   main, .-main .globl bitmap     .bss     .align 8     .type   bitmap, @object     .size   bitmap, 8 bitmap:     .zero   8     .ident  "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-16)"     .section    .note.GNU-stack,"",@progbits

diff

$ diff -uN unoptimized.s optimized.s --- unoptimized.s   2015-11-24 16:11:55.837922223 +0000 +++ optimized.s 2015-11-24 16:12:02.628922941 +0000 @@ -1,4 +1,4 @@ -   .file   "unoptimized.cpp" +   .file   "optimized.cpp"     .text     .p2align 4,,15  .globl main @@ -10,16 +10,17 @@     movl    bitmap(%rip), %eax     testl   %eax, %eax     je  .L2 +   subl    $1, %eax +   leaq    8(,%rax,8), %rcx     xorl    %eax, %eax     .p2align 4,,10     .p2align 3  .L3: -   mov %eax, %edx -   addl    $1, %eax -   movq    (%rsi,%rdx,8), %rdx +   movq    (%rsi,%rax), %rdx +   addq    $8, %rax +   cmpq    %rcx, %rax     movb    $0, (%rdx) -   cmpl    bitmap(%rip), %eax -   jb  .L3 +   jne .L3  .L2:     xorl    %eax, %eax     ret

The generated assembly for the optimized version does actually load (lea) the width constant unlike the unoptimized version which computes the width offset at each iteration (movq).

When I'll get time, I eventually post some benchmark on that. Good question.

answered Sep 21 '22 20:09

YSC

There is actually insufficient information from your code snippet to be able to tell, and the one thing that I can think of is aliasing. From our point of view, it's pretty clear that you don't want p and bitmap to point to the same location in memory, but the compiler doesn't know that and (because p is of type char*) the compiler has to make this code work even if p and bitmap overlap.

This means in this case that if the loop changes bitmap->width through the pointer p then that has to be seen when re-reading bitmap->width later on, which in turn means that storing it in a local variable would be illegal.

That being said, I believe some compilers will actually sometimes generate two versions of the same code (I have seen circumstantial evidence of this, but never directly sought out information on what the compiler is doing in this case), and quickly check if the pointers alias and run the faster code if it determines it's okay to.

That being said, I stand by my comment about simply measuring the performance of the two versions, my money is on not seeing any consistent performance difference between the two versions of the code.

In my opinion, questions like these are okay if your purpose is to learn about compiler optimization theories and techniques, but is a waste of time (a useless micro-optimization) if your end goal here is to make the program run faster.

answered Sep 21 '22 20:09

SirGuy

Related questions
                            
                                Will strlen be calculated multiple times if used in a loop condition?
                            
                                Removing item from vector, while in C++11 range 'for' loop?
                            
                                Is ‘int main;’ a valid C/C++ program?
                            
                                What exception classes are in the standard C++ library
                            
                                Are vectors passed to functions by value or by reference in C++
                            
                                What does ## (double hash) do in a preprocessor directive?
                            
                                How do I assign an alias to a function name in C++?
                            
                                Fixing Segmentation faults in C++
                            
                                Cannot find Dumpbin.exe
                            
                                Understanding the meaning of the term and the concept - RAII (Resource Acquisition is Initialization)
                            
                                Reset C int array to zero : the fastest way?
                            
                                How to shuffle a std::vector?
                            
                                Reason to Pass a Pointer by Reference in C++?
                            
                                Erasing elements from a vector
                            
                                Weighted random numbers
                            
                                undefined reference to boost::system::system_category() when compiling
                            
                                Different floating point result with optimization enabled - compiler bug?
                            
                                What are transparent comparators?
                            
                                Downcasting shared_ptr<Base> to shared_ptr<Derived>?
                            
                                How can I solve the error LNK2019: unresolved external symbol - function?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

In C++, should I bother to cache variables, or let the compiler do the optimization? (Aliasing)

Tags:

c++

performance

optimization

caching

strict-aliasing

Yaron Cohen-Tal

People also ask

2 Answers

Source `unoptimized.cpp`

Source `optimized.cpp`

Compilation

Assembly (unoptimized.s)

Assembly (optimized.s)

diff

YSC

SirGuy

Recent Activity

Donate For Us

In C++, should I bother to cache variables, or let the compiler do the optimization? (Aliasing)

Tags:

c++

performance

optimization

caching

strict-aliasing

Yaron Cohen-Tal

People also ask

2 Answers

Source unoptimized.cpp

Source optimized.cpp

Compilation

Assembly (unoptimized.s)

Assembly (optimized.s)

diff

YSC

SirGuy

Related questions

Recent Activity

Donate For Us

Source `unoptimized.cpp`

Source `optimized.cpp`