I'm spending some time on assembly programming (GAS, in particular) and recently learned about the .align directive. I think I've understood the very basics, but I would like to gain a deeper understanding of its nature and of when to use alignment.
For instance, I wondered about the assembly code of a simple C++ switch statement. I know that under certain circumstances switch statements are based on jump tables, as in the following few lines of code:
.section .rodata
.align 4
.align 4
.L8:
.long .L2
.long .L3
.long .L4
.long .L5
...
.align 4 aligns the following data on the next 4-byte boundary, which ensures that fetching these memory locations is efficient, right? I think this is done because something before the switch statement might have caused misalignment. But why are there actually two .align directives? Are there any rules of thumb for when to use .align, or should it simply be done whenever a new block of data is stored in memory and something prior to it could have caused misalignment?
In the case of arrays, it seems that alignment is done on 32-byte boundaries as soon as the array occupies at least 32 bytes. Is it more efficient to do it this way, or is there another reason for the 32-byte boundary?
I'd appreciate any explanation or pointers to literature.
The ALIGN directive allows you to specify the beginning offset of a data element or an instruction. Aligned data can improve performance, at the expense of wasted space between data elements.
The ALIGN directive aligns the current location to a specified boundary by padding with zeros or NOP instructions.
".align 8" will ensure that the next address is a multiple of 8. To meet the alignment requirement, the assembler emits "pad" bytes. If a second argument is supplied, the assembler uses that value for the pad bytes; otherwise, it emits zeros.
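The padding rule described above can be sketched as a small calculation. This is only an illustration in Python, not how any assembler is implemented. It assumes the byte-count reading of the argument (as with .align 8 on x86 ELF targets; on some other GAS targets the argument is a power of two instead), and the names pad_bytes, align_up and emit_padding are made up for the example.

```python
def pad_bytes(location: int, boundary: int) -> int:
    """Number of pad bytes needed to move `location` up to a multiple of `boundary`."""
    if boundary <= 0:
        raise ValueError("boundary must be positive")
    # Python's % always returns a non-negative result for a positive modulus.
    return (-location) % boundary


def align_up(location: int, boundary: int) -> int:
    """First address >= `location` that is a multiple of `boundary`."""
    return location + pad_bytes(location, boundary)


def emit_padding(location: int, boundary: int, fill: int = 0) -> bytes:
    """Pad bytes a hypothetical assembler-like tool would emit.

    `fill` plays the role of the optional second argument; it defaults to zero.
    """
    return bytes([fill]) * pad_bytes(location, boundary)
```

For instance, with the location counter at 13, pad_bytes(13, 8) is 3 and the next item lands at address 16; a counter that is already at 16 needs no padding at all.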
The .sect directive defines an initialized named section and associates subsequent code or data with that section.
There is more than one .align directive simply because of the way the compiler works internally; one would have been sufficient, but emitting only one would take extra work in the compiler.
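One way to see why the duplicate is harmless: once the location counter sits on a boundary, aligning again to the same boundary adds no padding. A minimal sketch, assuming the byte-count meaning of .align 4; the helper align is hypothetical (not an assembler API) and the starting offset 0x1A is arbitrary.

```python
def align(location: int, boundary: int) -> tuple:
    """Return (new_location, pad_count) after aligning to `boundary` bytes."""
    pad = (-location) % boundary
    return location + pad, pad


loc = 0x1A  # arbitrary misaligned starting offset (assumption for the example)
loc, first_pad = align(loc, 4)   # first .align 4 inserts the padding
loc, second_pad = align(loc, 4)  # second .align 4 finds the counter already aligned
print(first_pad, second_pad, hex(loc))
```

The second directive costs nothing in the output, which is why a compiler can emit it redundantly without changing the generated data.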
As for alignment in general, it's a complex topic, but here's an article for Intel x64 that discusses some of the issues you are interested in:
Other architectures can be vastly different.