I have a server with 2 CPUs, each with 6 cores. Each CPU is connected to 4 GB of RAM. I have a parallel program that runs the same code (with minor changes) on both CPUs in parallel, using 4 threads on each core.
For efficiency reasons, it would be best if I could ensure that the code running on CPU 1 only allocates memory from its own RAM and not from the RAM attached to CPU 2, and vice versa, because communication between the CPUs creates overhead.
Is there any way to do this?
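Assuming you are using Linux, the default NUMA policy prefers to allocate memory locally, on the node of the CPU that first touches each page, so what you are asking for should work out of the box as long as each thread initializes its own data. This policy can be changed through configuration, though (for example with numactl or set_mempolicy(2)).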
Whatever the current policy, you can use libnuma to allocate memory on the local NUMA node (that's what they call the combination of RAM + socket/cores) or on a specific node, with numa_alloc_local, numa_alloc_onnode, and so on. To free the memory, use numa_free. See the man pages numa(7) and numa(3) for details on these functions and on the NUMA system in general.
You could take a look at the Hoard memory allocator. I believe it tries to solve the same problem you are hitting:
"Hoard is a drop-in replacement for malloc that can dramatically improve application performance, especially for multithreaded programs running on multiprocessors and multicore CPUs."
In particular, the problem of 'false sharing' seems to be what you want to avoid.
False Sharing
The allocator can cause other problems for multithreaded code. It can lead to false sharing in your application: threads on different CPUs can end up with memory in the same cache line, or chunk of memory. Accessing these falsely-shared cache lines is hundreds of times slower than accessing unshared cache lines.
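Independently of the allocator, false sharing can also be avoided by hand by giving each thread's frequently written data its own cache line. This is not how Hoard works internally; it is a minimal sketch of the same problem addressed manually, assuming 64-byte cache lines (typical on x86-64, but check your CPU), C11 and POSIX threads (compile with -pthread). The names CACHE_LINE, padded_counter and run_counters are illustrative.

```c
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */
#define NTHREADS 4
#define ITERS 1000000L

/* Aligning the member to CACHE_LINE also rounds sizeof(struct) up to
   CACHE_LINE, so each counter occupies its own cache line and threads
   on different cores never write to the same line. */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};

static struct padded_counter counters[NTHREADS];

/* Each thread increments only its own counter. */
static void *worker(void *arg) {
    struct padded_counter *c = arg;
    for (long i = 0; i < ITERS; i++)
        c->value++;
    return NULL;
}

/* Starts NTHREADS workers, waits for them, and returns the summed count
   (or -1 if a thread could not be created). */
long run_counters(void) {
    pthread_t tids[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        counters[i].value = 0;
        if (pthread_create(&tids[i], NULL, worker, &counters[i]) != 0) {
            for (int j = 0; j < i; j++)
                pthread_join(tids[j], NULL);
            return -1;
        }
    }
    long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tids[i], NULL);
        total += counters[i].value;
    }
    return total;
}
```

Without the alignas, the four longs would sit in one cache line and every increment would force that line to bounce between cores.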