Has anyone ever seen any numbers/analysis on whether or not use of the C/C++ restrict
keyword in gcc/g++ actual provides any significant performance boost in reality (and not just in theory)?
I've read various articles recommending / disparaging its use, but I haven't ran across any real numbers practically demonstrating either sides arguments.
EDIT
I know that restrict
is not officially part of C++, but it is supported by some compilers and I've read a paper by Christer Ericson which strongly recommends its usage.
restrict keyword is mainly used in pointer declarations as a type qualifier for pointers. It doesn't add any new functionality. It is only a way for programmer to inform about an optimization that compiler can make.
gcc. g++ is used to compile C++ program. gcc is used to compile C program.
For c++ you should use g++. It's the same compiler (e.g. the GNU compiler collection). GCC or G++ just choose a different front-end with different default options.
The restrict keyword does a difference.
I've seen improvements of factor 2 and more in some situations (image processing). Most of the time the difference is not that large though. About 10%.
Here is a little example that illustrate the difference. I've written a very basic 4x4 vector * matrix transform as a test. Note that I have to force the function not to be inlined. Otherwise GCC detects that there aren't any aliasing pointers in my benchmark code and restrict wouldn't make a difference due to inlining.
I could have moved the transform function to a different file as well.
#include <math.h> #ifdef USE_RESTRICT #else #define __restrict #endif void transform (float * __restrict dest, float * __restrict src, float * __restrict matrix, int n) __attribute__ ((noinline)); void transform (float * __restrict dest, float * __restrict src, float * __restrict matrix, int n) { int i; // simple transform loop. // written with aliasing in mind. dest, src and matrix // are potentially aliasing, so the compiler is forced to reload // the values of matrix and src for each iteration. for (i=0; i<n; i++) { dest[0] = src[0] * matrix[0] + src[1] * matrix[1] + src[2] * matrix[2] + src[3] * matrix[3]; dest[1] = src[0] * matrix[4] + src[1] * matrix[5] + src[2] * matrix[6] + src[3] * matrix[7]; dest[2] = src[0] * matrix[8] + src[1] * matrix[9] + src[2] * matrix[10] + src[3] * matrix[11]; dest[3] = src[0] * matrix[12] + src[1] * matrix[13] + src[2] * matrix[14] + src[3] * matrix[15]; src += 4; dest += 4; } } float srcdata[4*10000]; float dstdata[4*10000]; int main (int argc, char**args) { int i,j; float matrix[16]; // init all source-data, so we don't get NANs for (i=0; i<16; i++) matrix[i] = 1; for (i=0; i<4*10000; i++) srcdata[i] = i; // do a bunch of tests for benchmarking. for (j=0; j<10000; j++) transform (dstdata, srcdata, matrix, 10000); }
Results: (on my 2 Ghz Core Duo)
nils@doofnase:~$ gcc -O3 test.c nils@doofnase:~$ time ./a.out real 0m2.517s user 0m2.516s sys 0m0.004s nils@doofnase:~$ gcc -O3 -DUSE_RESTRICT test.c nils@doofnase:~$ time ./a.out real 0m2.034s user 0m2.028s sys 0m0.000s
Over the thumb 20% faster execution, on that system.
To show how much it depends on the architecture I've let the same code run on a Cortex-A8 embedded CPU (adjusted the loop count a bit cause I don't want to wait that long):
root@beagleboard:~# gcc -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp test.c root@beagleboard:~# time ./a.out real 0m 7.64s user 0m 7.62s sys 0m 0.00s root@beagleboard:~# gcc -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -DUSE_RESTRICT test.c root@beagleboard:~# time ./a.out real 0m 7.00s user 0m 6.98s sys 0m 0.00s
Here the difference is just 9% (same compiler btw.)
Does the restrict keyword provide significant benefits in gcc / g++ ?
It can reduce the number of instructions as shown on the example below, so use it whenever possible.
GCC 4.8 Linux x86-64 exmample
Input:
void f(int *a, int *b, int *x) { *a += *x; *b += *x; } void fr(int *restrict a, int *restrict b, int *restrict x) { *a += *x; *b += *x; }
Compile and decompile:
gcc -g -std=c99 -O0 -c main.c objdump -S main.o
With -O0
, they are the same.
With -O3
:
void f(int *a, int *b, int *x) { *a += *x; 0: 8b 02 mov (%rdx),%eax 2: 01 07 add %eax,(%rdi) *b += *x; 4: 8b 02 mov (%rdx),%eax 6: 01 06 add %eax,(%rsi) void fr(int *restrict a, int *restrict b, int *restrict x) { *a += *x; 10: 8b 02 mov (%rdx),%eax 12: 01 07 add %eax,(%rdi) *b += *x; 14: 01 06 add %eax,(%rsi)
For the uninitiated, the calling convention is:
rdi
= first parameterrsi
= second parameterrdx
= third parameterConclusion: 3 instructions instead of 4.
Of course, instructions can have different latencies, but this gives a good idea.
Why GCC was able to optimize that?
The code above was taken from the Wikipedia example which is very illuminating.
Pseudo assembly for f
:
load R1 ← *x ; Load the value of x pointer load R2 ← *a ; Load the value of a pointer add R2 += R1 ; Perform Addition set R2 → *a ; Update the value of a pointer ; Similarly for b, note that x is loaded twice, ; because x may point to a (a aliased by x) thus ; the value of x will change when the value of a ; changes. load R1 ← *x load R2 ← *b add R2 += R1 set R2 → *b
For fr
:
load R1 ← *x load R2 ← *a add R2 += R1 set R2 → *a ; Note that x is not reloaded, ; because the compiler knows it is unchanged ; "load R1 ← *x" is no longer needed. load R2 ← *b add R2 += R1 set R2 → *b
Is it really any faster?
Ermmm... not for this simple test:
.text .global _start _start: mov $0x10000000, %rbx mov $x, %rdx mov $x, %rdi mov $x, %rsi loop: # START of interesting block mov (%rdx),%eax add %eax,(%rdi) mov (%rdx),%eax # Comment out this line. add %eax,(%rsi) # END ------------------------ dec %rbx cmp $0, %rbx jnz loop mov $60, %rax mov $0, %rdi syscall .data x: .int 0
And then:
as -o a.o a.S && ld a.o && time ./a.out
on Ubuntu 14.04 AMD64 CPU Intel i5-3210M.
I confess that I still don't understand modern CPUs. Let me know if you:
The article Demystifying The Restrict Keyword refers to the paper Why Programmer-specified Aliasing is a Bad Idea (pdf) which says it generally doesn't help and provides measurements to back this up.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With