Here is some fairly straightforward code, compiled with -O2 (gcc 4.8.5):
unsigned char * linebuf;

int yuyv_tojpegycbcr(unsigned char * buf, int w)
{
    int col;
    unsigned char * restrict pix = buf;
    unsigned char * restrict line = linebuf;

    for(col = 0; col < w - 1; col +=2)
    {
        line[col*3] = pix[0];
        line[col*3 + 1] = pix[1];
        line[col*3 + 2] = pix[3];
        line[col*3 + 3] = pix[2];
        line[col*3 + 4] = pix[1];
        line[col*3 + 5] = pix[3];
        pix += 4;
    }
    return 0;
}
And here is the corresponding assembly:
0000000000000000 <yuyv_tojpegycbcr>:
0: 83 fe 01 cmp $0x1,%esi
3: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # a <yuyv_tojpegycbcr+0xa>
a: 7e 4e jle 5a <yuyv_tojpegycbcr+0x5a>
c: 83 ee 02 sub $0x2,%esi
f: 31 d2 xor %edx,%edx
11: d1 ee shr %esi
13: 48 8d 74 76 03 lea 0x3(%rsi,%rsi,2),%rsi
18: 48 01 f6 add %rsi,%rsi
1b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
20: 0f b6 0f movzbl (%rdi),%ecx
23: 48 83 c2 06 add $0x6,%rdx
27: 48 83 c7 04 add $0x4,%rdi
2b: 48 83 c0 06 add $0x6,%rax
2f: 88 48 fa mov %cl,-0x6(%rax)
32: 0f b6 4f fd movzbl -0x3(%rdi),%ecx
36: 88 48 fb mov %cl,-0x5(%rax)
39: 0f b6 4f ff movzbl -0x1(%rdi),%ecx
3d: 88 48 fc mov %cl,-0x4(%rax)
40: 0f b6 4f fe movzbl -0x2(%rdi),%ecx
44: 88 48 fd mov %cl,-0x3(%rax)
47: 0f b6 4f fd movzbl -0x3(%rdi),%ecx
4b: 88 48 fe mov %cl,-0x2(%rax)
4e: 0f b6 4f ff movzbl -0x1(%rdi),%ecx
52: 88 48 ff mov %cl,-0x1(%rax)
55: 48 39 f2 cmp %rsi,%rdx
58: 75 c6 jne 20 <yuyv_tojpegycbcr+0x20>
5a: 31 c0 xor %eax,%eax
5c: c3 retq
When compiled without the restrict qualifier, the output is identical: a lot of intermixed loads and stores. Some values are loaded twice, and it looks like no optimisation happened. If pix and line are unaliased, I expect the compiler to be smart enough to, among other things, load pix[1] and pix[3] only once.
Do you know of anything that could disqualify the restrict qualifier?
PS: With a newer gcc (4.9.2) on another architecture (ARMv7), the result is similar. Here is a test script to compare the generated code with and without restrict.
#!/bin/sh
gcc -c -o test.o -std=c99 -O2 yuyv_to_jpegycbcr.c
objdump -d test.o > test.S
gcc -c -o test2.o -O2 -D restrict='' yuyv_to_jpegycbcr.c
objdump -d test2.o > test2.S
Put the restrict on the function parameters rather than the local variables.
In my experience, most compilers (including GCC) make use of restrict only when it is specified on function parameters. Uses on local variables inside a function are ignored.
I suspect this has to do with aliasing analysis being done at the function-level rather than the basic-block level. But I have no evidence to back this up. Furthermore, it probably varies by compiler and compiler version.
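As a rough sketch of what "restrict on the parameters" could look like for the function in the question: the destination buffer is passed in as an argument instead of being read from the global linebuf, so both pointers can carry the qualifier in the parameter list. The name yuyv_tojpegycbcr_params and the extra line parameter are made up for illustration, and restrict needs -std=c99 or later. Whether a given compiler then merges the duplicate loads still has to be checked in the disassembly.

```c
/* Hypothetical variant of the question's function: the destination is a
 * parameter so that restrict can sit on the function parameters.
 * The caller promises that line and pix do not overlap. */
int yuyv_tojpegycbcr_params(unsigned char * restrict line,
                            const unsigned char * restrict pix,
                            int w)
{
    int col;

    for (col = 0; col < w - 1; col += 2)
    {
        line[col*3]     = pix[0];
        line[col*3 + 1] = pix[1];
        line[col*3 + 2] = pix[3];
        line[col*3 + 3] = pix[2];
        line[col*3 + 4] = pix[1];
        line[col*3 + 5] = pix[3];
        pix += 4;
    }
    return 0;
}
```

A caller would invoke it as yuyv_tojpegycbcr_params(linebuf, buf, w).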
Either way, these sorts of things are pretty finicky to rely on. So if the performance matters, either you optimize it manually, or you remember to revisit it every time you upgrade or change compilers.
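One way to do the manual optimization, as a minimal sketch: read the four input bytes into locals first, then store them. Each input byte is then loaded once in the source itself, regardless of what the compiler can prove about aliasing. The function name yuyv_tojpegycbcr_manual is made up for illustration. Note that this only matches the original when the buffers really do not overlap, which is what restrict was asserting anyway.

```c
unsigned char * linebuf;  /* same global destination as in the question */

/* Hypothetical hand-optimized variant: every input byte is read exactly
 * once per iteration, before any store to the output buffer. */
int yuyv_tojpegycbcr_manual(const unsigned char * buf, int w)
{
    const unsigned char * pix = buf;
    unsigned char * line = linebuf;
    int col;

    for (col = 0; col < w - 1; col += 2)
    {
        /* YUYV packs two pixels as Y0 Cb Y1 Cr. */
        unsigned char y0 = pix[0];
        unsigned char cb = pix[1];
        unsigned char y1 = pix[2];
        unsigned char cr = pix[3];

        line[0] = y0;
        line[1] = cb;
        line[2] = cr;
        line[3] = y1;
        line[4] = cb;
        line[5] = cr;

        pix += 4;
        line += 6;
    }
    return 0;
}
```

As with the restrict version, it is worth comparing the objdump output before and after, since the result depends on the compiler and its version.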