I was reading an article about data type alignment in memory (here) and I am unable to understand one point:
Note that a double variable will be allocated on an 8-byte boundary on a 32-bit machine and requires two memory read cycles. On a 64-bit machine, based on the number of banks, a double variable will be allocated on an 8-byte boundary and requires only one memory read cycle.
My question is: why do double variables need to be allocated on an 8-byte boundary and not on a 4-byte one? If a double is allocated on a 4-byte boundary, we still need only two memory read cycles (on a 32-bit machine). Correct me if I am wrong.
Also, if someone has a good tutorial on member/memory alignment, kindly share it.
"Natural" alignment (8-byte) is preferred for performance. Atomic load and store operations when 8-byte aligned; does not apply to aggregate copy operations.
Always 16-byte. Atomic load and store operations; atomic copy when part of an aggregate.
A 64-bit value which represents an offset into teraspace. It does not contain an effective address.
Two-byte numbers should be aligned to a two-byte boundary. Four-byte numbers should be aligned to a four-byte boundary. Structures with between 1 and 4 bytes of data should be padded so that the total structure is 4 bytes. Structures with between 5 and 8 bytes of data should be padded so that the total structure is 8 bytes.
No, a C/C++ pointer is not always four bytes. In the normal case, the size of a pointer is determined by the architecture of the platform your compiler is targeting. For example, a pointer on a 64-bit system will be 64 bits, which is 8 bytes.
@OliverCharlesworth: SSE has no loads/stores that require 8-byte alignment. Either 16-byte alignment is required (for aligned 16-byte loads/stores such as movapd), or no alignment is required (for any narrower operand). But yes, it's good for performance to make doubles 8-byte aligned so they can't split across cache lines.
The reason to align a data value of size 2^N on a boundary of 2^N is to avoid the possibility that the value will be split across a cache line boundary.
The x86-32 processor can fetch a double from any word boundary (8-byte aligned or not) in at most two 32-bit memory reads. But if the value is split across a cache line boundary, then the time to fetch the second word may be quite long because a second cache line has to be fetched from memory. This hurts processor performance unnecessarily. (As a practical matter, current processors don't fetch 32 bits from memory at a time; they fetch much bigger chunks over much wider buses to enable really high data bandwidths. If both words are in the same cache line and already cached, the actual time to fetch them may be just 1 clock.)
A free consequence of this alignment scheme is that such values also never cross page boundaries. This avoids the possibility of a page fault in the middle of a data fetch.
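The reason both guarantees come for free is that cache lines and pages are themselves power-of-two sized blocks at least 8 bytes long (for example, 64-byte lines and 4096-byte pages). A short derivation, with the block size written as $B$:

```latex
\begin{aligned}
a &= 8k,\quad B = 2^m,\ m \ge 3 \;\Rightarrow\; 8 \mid B \\
r &= a \bmod B = a - qB \;\Rightarrow\; 8 \mid r \;\Rightarrow\; r \le B - 8 \\
r + 7 &\le B - 1 \;\Rightarrow\; \text{bytes } a, \dots, a + 7 \text{ all lie in the same } B\text{-byte block}
\end{aligned}
```

With only 4-byte alignment, $r$ can be $B - 4$, so the last byte lands at offset $B + 3$, in the next line or page.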
So you should align doubles on 8-byte boundaries for performance reasons, and compilers know this and generally do it for you.