Our embedded architecture has a 64-bit IAB (Instruction Alignment Buffer). To optimize the fetch sequence, the body of a loop must start on an 8-byte boundary.
It is easy to achieve this in assembly using the .balign directive, but I cannot find a syntax that will hint the C compiler to align the code.
Preceding the for loop with inline assembly containing the .balign directive doesn't work, because it aligns the for loop prolog (setup) and not the loop body itself.
Doing the same with the asm() line inside the loop adds nops to the loop body, which cost precious cycles.
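For illustration, the two attempts described above could look roughly like the sketch below. The function names, the int element type and the simplified one-element loop body are assumptions made for this example; it relies on GCC-style inline assembly and a GNU-compatible assembler.

```c
#include <assert.h>
#include <stddef.h>

/* Attempt 1 (hypothetical name): directive placed before the loop.
   The padding is emitted ahead of the loop setup code, so the first
   instruction of the loop body is not guaranteed to be aligned. */
void add_align_before(int *c, const int *a, const int *b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    __asm__ volatile(".balign 8");
    for (size_t j = 0; j < n; j++)
        c[j] = a[j] + b[j];
}

/* Attempt 2 (hypothetical name): directive placed inside the loop.
   Any padding the assembler inserts here sits inside the loop body,
   so it is executed on every iteration. */
void add_align_inside(int *c, const int *a, const int *b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t j = 0; j < n; j++) {
        __asm__ volatile(".balign 8");
        c[j] = a[j] + b[j];
    }
}
```

Both functions compute the same result; they differ only in where the padding ends up in the generated code.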
EDIT 1: assume the code:
__asm__ volatile("nop");
__asm__ volatile("nop");
for (j0=0; j0<N; j0+=4)
{
    c[j0+ 0] = a[j0+ 0] + b[j0+ 0];
    c[j0+ 1] = a[j0+ 1] + b[j0+ 1];
    c[j0+ 2] = a[j0+ 2] + b[j0+ 2];
    c[j0+ 3] = a[j0+ 3] + b[j0+ 3];
}
I want the first c=a+b to be aligned to an 8-byte address. I can add nops like the ones above after a preliminary compilation, but this is an ad-hoc solution that will break with the first code change.
EDIT 2: Thanks to @R.., the solution is to use the -falign-loops=8 compiler option.
Umm, isn't this what GCC's -falign-loops option is for?
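As a minimal sketch of how the option might be applied: the loop from the question is wrapped in a function, and the alignment is requested on the command line instead of in the source. The function name vec_add4, the int element type and the file name vec_add.c are assumptions made for this example.

```c
/* Build with loop alignment requested from the compiler, e.g.:
 *     gcc -O2 -falign-loops=8 -c vec_add.c
 * (vec_add.c is a hypothetical file name.) */
#include <assert.h>
#include <stddef.h>

/* Adds a and b element-wise into c, four elements per iteration,
   as in the question. n must be a multiple of 4. */
void vec_add4(int *c, const int *a, const int *b, size_t n)
{
    assert(n % 4 == 0);
    for (size_t j0 = 0; j0 < n; j0 += 4)
    {
        c[j0 + 0] = a[j0 + 0] + b[j0 + 0];
        c[j0 + 1] = a[j0 + 1] + b[j0 + 1];
        c[j0 + 2] = a[j0 + 2] + b[j0 + 2];
        c[j0 + 3] = a[j0 + 3] + b[j0 + 3];
    }
}
```

Compiling with -S instead of -c and looking for an alignment directive ahead of the loop label is one way to check the effect. Whether a given target backend honors the request is worth verifying on the actual toolchain.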