Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory alignment today and 20 years ago

Tags:

c

x86

gcc

assembly

In the famous paper "Smashing the Stack for Fun and Profit", its author takes a C function

void function(int a, int b, int c) {
  char buffer1[5];
  char buffer2[10];
}

and generates the corresponding assembly code output

pushl %ebp
movl %esp,%ebp
subl $20,%esp

The author explains that since computers address memory in multiples of word size, the compiler reserved 20 bytes on the stack (8 bytes for buffer1, 12 bytes for buffer2).

I tried to recreate this example and got the following

pushl   %ebp
movl    %esp, %ebp
subl    $16, %esp

A different result! I tried various combinations of sizes for buffer1 and buffer2, and it seems that modern gcc does not pad buffer sizes to multiples of word size anymore. Instead it abides the -mpreferred-stack-boundary option.

As an illustration -- using the paper's arithmetic rules, for buffer1[5] and buffer2[13] I'd get 8+16 = 24 bytes reserved on the stack. But in reality I got 32 bytes.

The paper is quite old and a lot of stuff happened since. I'd like to know, what exactly motivated this change of behavior? Is it the move towards 64bit machines? Or something else?

Edit

The code is compiled on a x86_64 machine using gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) like that:

$ gcc -S -o example1.s example1.c -fno-stack-protector -m32

like image 651
0x00 Avatar asked May 13 '15 18:05

0x00


People also ask

What is meant by memory alignment?

Alignment refers to the arrangement of data in memory, and specifically deals with the issue of accessing data as proper units of information from main memory. First we must conceptualize main memory as a contiguous block of consecutive memory locations. Each location contains a fixed number of bits.

Why is aligned memory faster?

So, as everyone thinks memory is cheap, they just made the compiler align the data on the processor's chunk sizes so your code runs faster and more efficiently at the cost of wasted memory.

What does it mean to be 32-bit aligned?

Alignment refers to a piece of data's location in memory. A variable is naturally aligned if it exists at a memory address that is a multiple of its size. For example, a 32-bit type is naturally aligned if it is located in memory at an address that is a multiple of four (that is, its lowest two bits are zero).


2 Answers

What has changed is SSE, which requires 16 byte alignment, this is covered in this older gcc document for -mpreferred-stack-boundary=num which says (emphasis mine):

On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 suffers similar penalties if it is not 16 byte aligned.

This is also backed up by the paper Smashing The Modern Stack For Fun And Profit which covers this an other modern changes that break Smashing the Stack for Fun and Profit.

like image 78
Shafik Yaghmour Avatar answered Sep 19 '22 18:09

Shafik Yaghmour


Memory alignment of which stack alignment is just one aspect depends on the architecture. It is partly defined in the Applicaion Binary Interface of the language and a Procedure Call Standard (sometimes it is both in a single spec) for the architecture (CPU, it might even vary depending on platform) and also depends on the compiler/toolchain where the former documents leave room for variations.

The former two documents (names may vary) are mostly for the external interface between functions; they might leave internal structure to the toolchain. Howwever, that has to match the architecture. Normally the hardware requires a minimal alignment, but allows for a larger alignment for performance reasons (e.g.: byte-alignment minimum, but this would require multiple bus-cycles to read a 32 bit word, so the compiler uses a 32 bit alignment).

Normally, the compiler (following the PCS) uses an alignment optimal for the architecture and under control of optimization settings (optimize for speed or size). It takes into account not only the size of the object (aligned to its natural boundary), but also sizes of internal busses (e.g. a 32 bit x86 has internal 64 or 128 bit busses, ARM CPUs have internal 32 to 128 (possibly even wider) bit busses), caches, etc. For local variables, it may also take into account access-patterns, so two adjacent variables may be loaded in parallel into a register pair instead of two separate loads or even reorder such variables.

The stackpointer might require a higher alignment for instance, so the CPU can push in an interrupt frame two registers at once, push vector registers which require higher alignment, etc. You can write quite a thick book about this subject (and I bet, someone already has).

So, in general, there is no single one-alignment-fits all rule. However, for struct and array packing, the C standard does define some rules for packing/alignment, mostly to guarantee consistence of e.g. sizeof(type) and the address in an array (required for correct malloc()).

Even char arrays might be aligned for optimal cache layout. Note it is not only the CPU which might have caches, but also PCIe bridges, not to mention PCIe transfers themselves down to DRAM pages.

like image 24
too honest for this site Avatar answered Sep 17 '22 18:09

too honest for this site