Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get memory granularity of a processor

How to get the memory granularity of a CPU in C?

Suppose I want to allocate an array where all the elements are properly memory aligned. I can pad each element to a certain size N to achieve this. How do I know the value of N?

Note: I am trying to create a memory pool where each slot is memory aligned. Any suggestion will be appreciated.

like image 207
Prabhakar Tayenjam Avatar asked Oct 16 '22 03:10

Prabhakar Tayenjam


2 Answers

In Theory

How to get the memory granularity of a CPU in C?

First, you read the instruction set architecture manual. It may specify that certain instructions require certain alignments, or even that the addressing forms in certain instructions cannot represent non-aligned addresses. It may specify other properties regarding alignment.

Second, you read the processor manual. It may specify performance characteristics (such as that unaligned loads or stores are supported but may be slower or use more resources than aligned loads or stores) and may specify various options allowed by the instructions set architecture.

Third, you read the operating system documentation. Some architectures allow the operating system to select features related to alignment, such as whether unaligned loads and stores are made to fail or are supported albeit with slower performance than aligned loads or stores. The operating system documentation should have this information.

In Practice

For many programming situations, what you need to know is not the “memory granularity” of a CPU but the alignment requirements of the C implementation you are using (or of whatever language you are using). And, for the most part, you do not need to know the alignment requirements directly but just need to follow the language rules about managing objects—use objects with declared types, do not use casts to convert pointers between incompatible types exceed where specific rules allow it, use the suitably aligned memory as provided by malloc rather than adjusting your own pointers to bytes, and so on. Following these rules will give good alignment for the objects in your program.

In C, when you define an array, the element size will automatically be the size that C implementation needs for its alignment. For example, long double x[100]; may use 16 bytes for each array element even though the hardware uses only ten bytes for a long double. Or, for any struct foo that you define, the compiler will automatically include padding as needed in the structure to give the desired alignment, and any array struct foo x[100]; will already include that padding. sizeof(struct foo) will be the same as sizeof x[0], because each structure object has that padding built in, even just for a single structure object, not just for elements in arrays.

When you do need to know the alignment that a C implementation requires for a type, you can use C’s _Alignof operator. The expression _Alignof(type) provides the alignment required for type.

Other

… properly memory aligned.

Proper alignment is a matter of degrees:

  • What the processor supports may determine whether your program works or does not work. An improper alignment is one that causes your program to trap.
  • What is efficient with respect to individual loads and stores may affect how fast your program runs. An improper alignment is one that causes your program to execute more slowly.
  • In certain performance-critical situations, alignment with respect to cache and memory mapping features can also affect performance.
like image 72
Eric Postpischil Avatar answered Nov 26 '22 10:11

Eric Postpischil


Short answer

Use 64 bytes.

Long answer

Data are loaded from and stored to memory in units called cache lines. If your program loads only part of the data in a cache line, then the whole line will be loaded into the CPU caches. Perhaps more importantly, the algorithm used for moving data between cores in a multi-core CPU operates on full cache lines; aligning your data to cache lines avoids false sharing, the situation where a cache line bounces between cores because it contains data manipulated by different threads.

It used to be the case that cache lines depended on the architecture, ranging from 16 up to 512 bytes. However, all current processors (Intel, AMD, ARM, MIPS) use a cache line of 64 bytes.

like image 32
jch Avatar answered Nov 26 '22 10:11

jch