What value of alignment should I use with mkl_malloc?

The function mkl_malloc is similar to malloc but has an extra alignment argument. Here's the prototype:

void* mkl_malloc (size_t alloc_size, int alignment);

I've noticed different performance with different values of alignment. Apart from trial and error, is there a canonical or documented, methodical way to decide on the best value of alignment, e.g. based on the processor being used, the function being called, or the operation being performed?

This question is widely applicable to anyone who uses MKL, so I'm very surprised it is not covered in the reference manual.

Update: I have tried mkl_sparse_spmm and have not noticed a significant difference in performance when setting the alignment to powers of 2 up to 1024 bytes; beyond that, performance tends to drop. I'm using an Intel Xeon E5-2683.

asked Jan 28 '23 16:01 by avgn

2 Answers

Alignment only affects performance when SSE/AVX instructions can be used; this is commonly the case when operating on arrays, since you apply the same operation to a range of elements.

In general, you want to choose the alignment based on the CPU: if it supports AVX or AVX2, which have 256-bit registers, you want 32-byte alignment; if it supports AVX-512, 64 bytes would be optimal.
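As a minimal sketch of this rule of thumb, the alignment in bytes is simply the vector register width in bits divided by 8. The helper name `alignment_for_vector_bits` is hypothetical, and the widths in the comment (128-bit SSE, 256-bit AVX/AVX2, 512-bit AVX-512) are the usual x86 values, not something queried from MKL or from the CPU at run time.

```c
#include <stddef.h>

/* Hypothetical helper: map a SIMD register width in bits to the byte
 * alignment that lets full-width aligned loads/stores be used.
 * Returns 0 for widths that are not a positive multiple of 8 bits. */
size_t alignment_for_vector_bits(unsigned bits)
{
    if (bits == 0 || bits % 8 != 0)
        return 0;
    /* 128 -> 16 (SSE), 256 -> 32 (AVX/AVX2), 512 -> 64 (AVX-512) */
    return (size_t)bits / 8;
}
```

In practice you would pick the widest extension your target CPU supports and pass the result as the alignment argument.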

To that end, mkl_malloc guarantees alignment to the value you specify. Of course, if the data are 32-byte aligned, they are also aligned to a 16-, 8-, 4-, ...-byte boundary. The purpose of the call is to make sure this is always the case and thus avoid any potential complications.
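The "32-byte aligned implies 16-, 8- and 4-byte aligned" point holds because all these values are powers of two, so 32 is a multiple of each. The sketch below checks it on a block from C11 `aligned_alloc`, used here only as a stand-in for `mkl_malloc` so that MKL is not needed; `is_aligned` and `aligned32_implies_smaller` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if p is a multiple of `alignment` bytes.
 * `alignment` must be a nonzero power of two. */
bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Allocate with 32-byte alignment via C11 aligned_alloc (a stand-in for
 * mkl_malloc here) and report whether the block also satisfies every
 * smaller power-of-two alignment: 16, 8, 4, 2 and 1. */
bool aligned32_implies_smaller(void)
{
    /* C11 requires the size to be a multiple of the alignment. */
    double *buf = aligned_alloc(32, 32 * sizeof(double));
    if (buf == NULL)
        return false;

    bool ok = is_aligned(buf, 32);
    for (size_t a = 16; a >= 1; a /= 2)
        ok = ok && is_aligned(buf, a);

    free(buf);
    return ok;
}
```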

On my machine (Linux kernel 4.17.11 on an i7-6700K), the default alignment of mkl_malloc seems to be 128 bytes for large enough arrays (for arrays that are too small, it seems to be 32 KB). In other words, any value smaller than that has no effect on alignment; I can, however, pass 256 and the data will be aligned to a 256-byte boundary.
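One way to reproduce measurements like these is to compute the largest power of two that divides the returned address, i.e. its lowest set bit. The sketch below does that; `observed_alignment` and `probe_aligned_alloc` are hypothetical names, and the probe uses C11 `aligned_alloc` rather than `mkl_malloc`, so the numbers it reports say nothing about MKL's own defaults.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: the largest power of two that divides the address
 * of p, i.e. the strongest alignment p happens to satisfy.
 * Returns 0 for a null pointer. */
uintptr_t observed_alignment(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    return addr & (~addr + 1);  /* isolate the lowest set bit */
}

/* Allocate `size` bytes with C11 aligned_alloc (standing in for
 * mkl_malloc), record the alignment actually obtained, and free the
 * block. `size` must be a multiple of `alignment`. Returns 0 on failure. */
uintptr_t probe_aligned_alloc(size_t alignment, size_t size)
{
    void *p = aligned_alloc(alignment, size);
    uintptr_t a = observed_alignment(p);
    free(p);
    return a;
}
```

A result larger than the requested alignment only means the allocator happened to return a more strongly aligned address, which is the effect described above for small requested values.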

In contrast, malloc gives me 16-byte alignment for 1 GB of data and 32-byte alignment for 1 KB: whatever the OS gives me, with no regard for alignment.
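These malloc figures are consistent with what the C standard actually promises: a block from malloc is only guaranteed to be aligned for any fundamental type, i.e. to `_Alignof(max_align_t)` (16 bytes on typical x86-64 Linux builds), and any stronger alignment is incidental. A minimal check; the function name is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: does a plain malloc block of `size` bytes meet the
 * alignment the C standard guarantees, _Alignof(max_align_t)?
 * Any stronger alignment it may also have is not guaranteed. */
bool malloc_meets_fundamental_alignment(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return false;
    bool ok = ((uintptr_t)p % _Alignof(max_align_t)) == 0;
    free(p);
    return ok;
}
```

Nothing in that guarantee reaches 32 or 64 bytes, which is why an explicit aligned allocator is needed for AVX/AVX-512 data.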

So using mkl_malloc makes sense, as it ensures you get the alignment you want. However, that doesn't mean you should set the value too large; that will simply waste memory and can potentially increase the number of cache misses.
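To see where the wasted memory can come from, here is the classic over-allocate-and-round-up technique for aligned allocation. It is a generic sketch, not MKL's actual implementation; `my_aligned_malloc` and `my_aligned_free` are hypothetical names. Each block reserves up to `alignment - 1` padding bytes plus one pointer, so the overhead grows linearly with the alignment you ask for.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical aligned allocator: over-allocate with malloc, round the
 * address up to the requested power-of-two alignment, and stash the
 * original malloc pointer just before the aligned block so it can be freed.
 * Worst-case overhead per block: alignment - 1 + sizeof(void *) bytes. */
void *my_aligned_malloc(size_t size, size_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;                   /* not a power of two */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);    /* keep the stashed pointer aligned */

    size_t extra = alignment - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;                   /* size + extra would overflow */

    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Leave room for one pointer, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    ((void **)aligned)[-1] = raw;      /* remember what malloc returned */
    return (void *)aligned;
}

/* Release a block obtained from my_aligned_malloc. */
void my_aligned_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

With an alignment of 1 MB, each block could carry close to a megabyte of padding, which is the kind of waste the answer warns about.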

In short, you want your data aligned to the size of your CPU's vector registers so that you can make use of the relevant instruction set extensions. Calling mkl_malloc with an alignment parameter guarantees alignment to at least that value, though the actual alignment can be larger. Use it to make sure the data are aligned the way you want, but there is no good reason to align to 1 MB.

answered Feb 07 '23 17:02 by Qubit

The reason you see no penalty or gain from specifying the alignment, whatever you pass in, is that you get machine-aligned memory no matter what you type in. So on your processor, which supports AVX, you always get 32-byte-aligned memory regardless of your input.

You will also see that, whatever alignment value you choose, the address mkl_malloc returns is divisible by 32. Alternatively, you can check that low-level intrinsics such as _mm256_load_pd, which segfault when given an address that is not 32-byte aligned, never segfault on this memory.

Some minor details: OS X always gives you 32-byte-aligned addresses when you allocate a chunk of memory, independent of heap or stack, while Linux always gives you aligned memory when allocating on the heap. On the stack, alignment on Linux is a matter of luck, but even small matrix sizes already exceed the limit for stack allocations. I have no understanding of memory allocation on Windows.

I noticed the latter when writing tests for my numerics library, where I use std::vector<typename T, alignment A> for memory allocation: smaller matrix tests sometimes segfaulted on Linux.
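Where a buffer does end up on the stack, C11 `_Alignas` requests the alignment explicitly rather than leaving it to luck (C++ offers `alignas` for the same purpose). A minimal sketch; the function name and the 32-byte value, chosen to match 256-bit AVX registers, are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical example: a small stack buffer that is explicitly 32-byte
 * aligned, so AVX aligned loads such as _mm256_load_pd would be safe on it.
 * Returns true if the buffer's address is a multiple of 32. */
bool stack_buffer_is_32_aligned(void)
{
    _Alignas(32) double buf[4] = {1.0, 2.0, 3.0, 4.0};
    return (uintptr_t)buf % 32 == 0;
}
```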

TLDR: your alignment input is effectively discarded and you are getting machine alignment regardless.

answered Feb 07 '23 17:02 by Kaveh Vahedipour