At the GoingNative event, during the Interactive Panel on Day2 at the 9 minute mark, Chandler Carruth says: <blockquote> Pointers create aliasing problems. They slow down your binaries not speed them up. </blockquote> What does this mean? Can this be illustrated with a (simple) example?

Aliasing affects performance by preventing the compiler from doing certain optimizations. For example: <pre class="prettyprint"><code>void foo(int *array,int *size,int *value) { for(int i=0;i<*size;++i) { array[i] = 2 * *value; } } </code></pre> Looking at this code you might expect that the compiler could load <code>*value</code> once outside the loop and then set every element in the array to that value very quickly. But this isn't the case due to aliasing. Because <code>*value</code> could be an alias for an element of the array it could change on any given iteration. Therefore the code has to load the value every single iteration, resulting in a potentially large slowdown. If the variables could not alias then the above code would be equivalent to the following: <pre class="prettyprint"><code>void foo(int *array,int size,int value) { for(int i=0;i<size;++i) { array[i] = 2 * value; } } </code></pre> Using LLVM's online demo to get the generated code, here are the different results: 1) With aliasing <pre class="prettyprint"><code>foo: # @foo .cfi_startproc # BB#0: cmpl $0, (%rsi) jle .LBB0_3 # BB#1: xorl %eax, %eax .align 16, 0x90 .LBB0_2: # %.lr.ph # =>This Inner Loop Header: Depth=1 movl (%rdx), %ecx addl %ecx, %ecx movl %ecx, (%rdi,%rax,4) incq %rax cmpl (%rsi), %eax jl .LBB0_2 .LBB0_3: # %._crit_edge ret .size foo, .Ltmp1-foo .cfi_endproc .Leh_func_end0: </code></pre> <hr> 2) Without aliasing <pre class="prettyprint"><code>foo: # @foo .cfi_startproc # BB#0: testl %esi, %esi jle .LBB0_3 # BB#1: # %.lr.ph addl %edx, %edx .align 16, 0x90 .LBB0_2: # =>This Inner Loop Header: Depth=1 movl %edx, (%rdi) addq $4, %rdi decl %esi jne .LBB0_2 .LBB0_3: # %._crit_edge ret .size foo, .Ltmp1-foo .cfi_endproc .Leh_func_end0: </code></pre> You can see that the version with aliasing has to do more work in the loop body (between the labels <code>LBB0_2</code> and <code>LBB0_3</code>).

What is aliasing and how does it affect performance?

2 Answers

Aliasing affects performance by preventing the compiler from doing certain optimizations. For example:

void foo(int *array,int *size,int *value) {     for(int i=0;i<*size;++i) {         array[i] = 2 * *value;     } }

Looking at this code you might expect that the compiler could load *value once outside the loop and then set every element in the array to that value very quickly. But this isn't the case due to aliasing. Because *value could be an alias for an element of the array it could change on any given iteration. Therefore the code has to load the value every single iteration, resulting in a potentially large slowdown.

If the variables could not alias then the above code would be equivalent to the following:

void foo(int *array,int size,int value) {     for(int i=0;i<size;++i) {         array[i] = 2 * value;     } }

Using LLVM's online demo to get the generated code, here are the different results:

1) With aliasing

foo:                                    # @foo     .cfi_startproc # BB#0:     cmpl    $0, (%rsi)     jle .LBB0_3 # BB#1:     xorl    %eax, %eax     .align  16, 0x90 .LBB0_2:                                # %.lr.ph                                         # =>This Inner Loop Header: Depth=1     movl    (%rdx), %ecx     addl    %ecx, %ecx     movl    %ecx, (%rdi,%rax,4)     incq    %rax     cmpl    (%rsi), %eax     jl  .LBB0_2 .LBB0_3:                                # %._crit_edge     ret     .size   foo, .Ltmp1-foo     .cfi_endproc .Leh_func_end0:

2) Without aliasing

foo:                                    # @foo     .cfi_startproc # BB#0:     testl   %esi, %esi     jle .LBB0_3 # BB#1:                                 # %.lr.ph     addl    %edx, %edx     .align  16, 0x90 .LBB0_2:                                # =>This Inner Loop Header: Depth=1     movl    %edx, (%rdi)     addq    $4, %rdi     decl    %esi     jne .LBB0_2 .LBB0_3:                                # %._crit_edge     ret     .size   foo, .Ltmp1-foo     .cfi_endproc .Leh_func_end0:

You can see that the version with aliasing has to do more work in the loop body (between the labels LBB0_2 and LBB0_3).

155

answered Sep 28 '22 22:09

bames53

The type of problem Chandler was talking about can be easily illustrated with a simplified strcpy:

char *stpcpy (char * dest, const char * src);

When writing an implementation of this, you might assume that the memory pointed to by dest is completely separate from the memory pointed to by src. The compiler) might want to optimize it by reading a block of characters from the string pointed to by src, and writing all of them at once into dest. But if dest pointed to one byte ahead of src, the behaviour of this would differ from a simple character-by-character copy.

Here the aliasing problem is that src can alias dest, and the generated code must be made less efficient than it could be if src wasn't allowed to alias dest.

The real strcpy uses an extra keyword, Restrict (which is technically only part of C, not C++, that tells the compiler to assume that src and dest do not overlap, and this allows the compiler to generate much more efficient code.

Here's an even simpler example where we can see a big difference in the assembly:

void my_function_1(int* a, int* b, int* c) {     if (*a) *b = *a;     if (*a) *c = *a; }  void my_function_2(int* __restrict a, int* __restrict b, int* __restrict c) {     if (*a) *b = *a;     if (*a) *c = *a; }

Assume that this is a simplification of a function where it actually made sense to use two if-statements rather than just if (*a) { *b=*a; *c=*a; }, but the intent is the same.

We may assume when writing this that a != b because there's some reason why it would make no sense for my_function be used like that. But the compiler can't assume that, and does a store of b and a re-load of a from memory before executing the second line, to cover the case where b == a:

0000000000400550 <my_function_1>:   400550:       8b 07                   mov    (%rdi),%eax   400552:       85 c0                   test   %eax,%eax                 <= if (*a)   400554:       74 0a                   je     400560 <my_function_1+0x10>   400556:       89 06                   mov    %eax,(%rsi)   400558:       8b 07                   mov    (%rdi),%eax   40055a:       85 c0                   test   %eax,%eax                 <= if (*a)   40055c:       74 02                   je     400560 <my_function_1+0x10>   40055e:       89 02                   mov    %eax,(%rdx)   400560:       f3 c3                   repz retq

If we remove potential for aliasing by adding __restrict, the compiler generates shorter and faster code:

0000000000400570 <my_function_2>:   400570:       8b 07                   mov    (%rdi),%eax   400572:       85 c0                   test   %eax,%eax   400574:       74 04                   je     40057a <_Z9my_function_2PiS_S_+0xa>   400576:       89 06                   mov    %eax,(%rsi)   400578:       89 02                   mov    %eax,(%rdx)   40057a:       f3 c3                   repz retq

answered Sep 28 '22 21:09

je4d

Related questions
                            
                                Explicitly print CSRF token field instead of form_rest(form)?
                            
                                Add an OnClick handler with jQuery only once - even if called again
                            
                                Load page on selection from dropdown form
                            
                                C string concatenation of constants
                            
                                sRGB textures. Is this correct?
                            
                                HTML5 canvas alternatives for d3.js, graph visualization library
                            
                                how to clone an object in android?
                            
                                How to parse a date in format "YYYYmmdd" in JavaScript?
                            
                                grep in all directories
                            
                                Why does Haskell lack a literal Data.Map constructor syntax?
                            
                                C++ 2011 : std::thread : simple example to parallelize a loop?
                            
                                Combine two statements with LIMITS using UNION

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is aliasing and how does it affect performance?

Tags:

StackedCrooked

People also ask

2 Answers

bames53

je4d

Recent Activity

Donate For Us