
dummy movups generated by gcc

A small curiosity: with a lot of optimization flags on, GCC seems to generate the following code:

00000000004019ae:   test %si,%si
00000000004019b1:   movups %xmm0,%xmm0
00000000004019b4:   je 0x401f40 <main(int, char**)+1904>

Question: what purpose does the second instruction serve? It doesn't look like it /does/ anything; so, is it some optimization to align the program in the instruction cache? Or is it something with out-of-order execution? (I'm compiling with -mtune=native on a Nehalem if that helps :D).

Nothing urgent, of course, just curious.

vpostman asked Feb 21 '23 18:02



2 Answers

Possibly xmm0 contains the result of a calculation done in the integer domain (with an integer SSE instruction), while the next instruction that uses xmm0 is expected to be in the floating-point domain (a floating-point SSE instruction).

Nehalem may execute that next instruction faster if xmm0 is first migrated to the floating-point domain with an instruction such as movaps or movups. It may also be beneficial to perform this migration before the conditional jump, because then it is done only once. Without the movups, the migration may be done twice (automatically, by the first FP instruction that uses this register): first speculatively, on a mispredicted branch, and a second time on the correct branch.

It seems the compiler decided that optimizing the calculation dependency chains is worth more than optimizing for code size and execution resources.
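To make the "integer domain, then floating-point domain" pattern concrete, here is a minimal sketch of source code that produces a value with an integer SSE instruction and then consumes the same 128 bits with floating-point SSE instructions. The function name `masked_double` and its parameters are hypothetical, invented for illustration; nothing here is taken from the asker's program, and there is no guarantee that GCC emits a `movups %xmm0,%xmm0` for it. Register allocation and instruction selection are up to the compiler.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE ones)

// Hypothetical helper: keeps v[i] where idx[i] > threshold, zeroes the
// other lanes, then doubles every lane.
void masked_double(const int idx[4], const float v[4], int threshold, float out[4]) {
    assert(idx && v && out);

    // Integer domain: the comparison typically compiles to pcmpgtd and
    // yields all-ones or all-zeros in each 32-bit lane.
    __m128i vi   = _mm_loadu_si128(reinterpret_cast<const __m128i*>(idx));
    __m128i mask = _mm_cmpgt_epi32(vi, _mm_set1_epi32(threshold));

    // The cast only reinterprets the bits; it generates no instruction.
    __m128 fmask = _mm_castsi128_ps(mask);

    // Floating-point domain: andps and addps now read the value that the
    // integer instruction produced.
    __m128 kept = _mm_and_ps(fmask, _mm_loadu_ps(v));
    _mm_storeu_ps(out, _mm_add_ps(kept, kept));
}
```

For example, with `idx = {1, 5, 2, 9}`, `v = {1, 2, 3, 4}` and `threshold = 3`, the output lanes are `{0, 4, 0, 8}`. Compiling something like this with `-O2 -S` and comparing the assembly with and without `-mtune=native` would be one way to look for the extra move.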

Evgeny Kluev answered Feb 26 '23 18:02



Adding to the hypothesis proposed by Evgeny Kluev, other possibilities (in no particular order) are that (a) it's a compiler optimiser bug, (b) movups is inserted to break a dependency or (c) it is inserted for the purpose of code alignment.
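For possibility (c), the addresses in the question's listing allow a quick sanity check. The sketch below only does arithmetic on those three addresses; the helper names are made up for illustration, and the 16-byte block size is an assumption about what an alignment target might be, not something stated in the question.

```cpp
#include <cassert>
#include <cstdint>

// Addresses copied from the disassembly in the question.
const std::uint64_t kTestAddr   = 0x4019ae;  // test %si,%si
const std::uint64_t kMovupsAddr = 0x4019b1;  // movups %xmm0,%xmm0
const std::uint64_t kJeAddr     = 0x4019b4;  // je 0x401f40

// Length of an instruction, given its address and the next one's address.
inline std::uint64_t instruction_length(std::uint64_t addr, std::uint64_t next_addr) {
    assert(next_addr > addr);
    return next_addr - addr;
}

// Byte offset of an address inside its aligned block (for example 16 bytes).
inline std::uint64_t offset_in_block(std::uint64_t addr, std::uint64_t block_size) {
    assert(block_size != 0);
    return addr % block_size;
}
```

Here `instruction_length(kMovupsAddr, kJeAddr)` is 3, so the movups adds three bytes, and `offset_in_block(kJeAddr, 16)` is 4, so the `je` does not start on a 16-byte boundary with the movups in place (without it, the `je` would start at offset 1). That does not rule out alignment of later code, but it suggests the movups is not there to align the jump itself.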

PhiS answered Feb 26 '23 19:02
