I just tested a small example to check whether __restrict__ works in C++ on the latest compilers:
    void foo(int x, int* __restrict__ ptr1, int& v2) {
        for (int i = 0; i < x; i++) {
            if (*ptr1 == v2) {
                ++ptr1;
            } else {
                *ptr1 = *ptr1 + 1;
            }
        }
    }
When I try it on godbolt.org with the latest GCC (gcc 8.1 -O3 -std=c++14), __restrict__ works as expected: v2 is loaded only once, since it cannot alias with ptr1.
Here are the relevant assembly parts:
    .L5:
        mov eax, DWORD PTR [rsi]
        cmp eax, ecx # <-- ecx contains v2, no load from memory
        jne .L3
        add edx, 1
        add rsi, 4
        cmp edi, edx
        jne .L5
Now the same with the latest clang (clang 6.0.0 -O3 -std=c++14). It unrolls the loop once, so the generated code is much bigger, but here is the gist:
    .LBB0_3: # =>This Inner Loop Header: Depth=1
        mov edi, dword ptr [rsi]
        cmp edi, dword ptr [rdx] # <-- restrict didn't work, v2 loaded from memory in hot loop
        jne .LBB0_9
        add rsi, 4
        mov edi, dword ptr [rsi]
        cmp edi, dword ptr [rdx] # <-- restrict didn't work, v2 loaded from memory in hot loop
        je .LBB0_12
Why is this the case? I know that __restrict__ is non-standard and the compiler is free to ignore it, but it seems to be a very fundamental technique for getting the last bit of performance out of one's code, so I doubt that clang accepts the keyword only to ignore it. So, what is the issue here? Am I doing anything wrong?
This seems to be a bug in Clang's alias analysis. If you change the type of v2 to short, the compiler happily hoists its load out of the loop based on type-based aliasing rules:
    for.body: ; preds = %for.inc, %for.body.lr.ph
      %i.09 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.inc ]
      %ptr1.addr.08 = phi i32* [ %ptr1, %for.body.lr.ph ], [ %ptr1.addr.1, %for.inc ]
      %1 = load i32, i32* %ptr1.addr.08, align 4, !tbaa !5
      %cmp1 = icmp eq i32 %1, %conv
      br i1 %cmp1, label %if.then, label %if.else
But with the original loop you get the same alias set for both memory references, which is why the middle-end can't optimize it:
      %i.08 = phi i32 [ %inc, %for.inc ], [ 0, %for.body.preheader ]
      %ptr1.addr.07 = phi i32* [ %ptr1.addr.1, %for.inc ], [ %ptr1, %for.body.preheader ]
      %0 = load i32, i32* %ptr1.addr.07, align 4, !tbaa !1
      %1 = load i32, i32* %v2, align 4, !tbaa !1
      %cmp1 = icmp eq i32 %0, %1
      br i1 %cmp1, label %if.then, label %if.else
Note the !tbaa !1 attached to both memory references, which means that the compiler couldn't distinguish the memory accessed by either of them. It seems that the restrict annotation has been lost along the way...
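For reference, the experiment described above could be reproduced with something like the following. This is only a sketch of the variant I mean: the name foo_short is made up for illustration, and the only change from the question's foo is the type of v2.

```cpp
// Hypothetical variant of foo from the question; foo_short is an
// illustrative name. Only the type of v2 changes (int& -> short&), so an
// int access through ptr1 and a short access through v2 fall into
// different type-based alias sets.
void foo_short(int x, int* __restrict__ ptr1, short& v2) {
    for (int i = 0; i < x; i++) {
        if (*ptr1 == v2) {   // v2 is converted to int for the comparison
            ++ptr1;
        } else {
            *ptr1 = *ptr1 + 1;
        }
    }
}
```

Compiling this next to the original foo with the same flags should make the difference in the loop body easy to compare.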
I encourage you to reproduce this with the latest Clang and file a bug in the LLVM Bugzilla (be sure to cc Hal Finkel).
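In the meantime, one possible workaround (my own suggestion, not something I have verified on every Clang version) is to copy v2 into a local before the loop, so the single load no longer depends on alias analysis. The name foo_hoisted is hypothetical, and the sketch assumes that ptr1 and v2 really never alias, which is what __restrict__ already promises.

```cpp
// Hypothetical workaround; foo_hoisted is an illustrative name.
// Assumption: *ptr1 and v2 never alias, as __restrict__ already promises.
// Under that assumption the result matches the question's foo.
void foo_hoisted(int x, int* __restrict__ ptr1, int& v2) {
    const int v = v2;  // read v2 once, before entering the loop
    for (int i = 0; i < x; i++) {
        if (*ptr1 == v) {
            ++ptr1;
        } else {
            *ptr1 = *ptr1 + 1;
        }
    }
}
```

Since v is a local whose address is never taken, the compiler can keep it in a register regardless of what it concludes about ptr1.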