 

Do the malloc/memcpy functions run independently on NUMA?

While trying to speed up my applications on standard (non-NUMA) PCs, I always found that the bottleneck was the call to malloc(), because even on multi-core machines it is shared and synchronized between all the cores.

I have access to a PC with a NUMA architecture running Linux, I program in C, and I have two questions:

  1. In a NUMA machine, since each core is provided with its own memory, will malloc() execute independently on each core/memory without blocking the other cores?
  2. In these architectures, how are calls to memcpy() made? Can it be called independently on each core, or will calling it on one core block the others? I may be wrong, but I remember that memcpy() had the same problem as malloc(), i.e. when one core is using it the others have to wait.
Abruzzo Forte e Gentile asked Mar 29 '11 10:03



2 Answers

A NUMA machine is a shared-memory system, so any processor can access any memory directly, without blocking the others. If the memory model were message-based instead, accessing remote memory would require the executing processor to ask the processor local to that memory to perform the operation on its behalf. Even so, in a NUMA system a remote processor can still affect the performance of the local processor by using bandwidth on the memory links, though this depends on the specific architectural configuration.

As for 1, this depends entirely on the OS and the malloc library. The OS is responsible for presenting the per-core / per-processor memory as either a unified space or as NUMA. Malloc may or may not be NUMA-aware. More fundamentally, the malloc implementation may or may not be able to serve requests concurrently. The answer from Al (and the associated discussion) addresses this point in greater detail.
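
Whether concurrent malloc calls get in each other's way is something you can measure on your own machine rather than guess. Here is a minimal sketch, assuming POSIX threads are available; `struct job`, `worker`, `run_malloc_workers`, `MAX_THREADS` and the 64-byte block size are names and values made up for illustration. On its own it only confirms that every thread's allocations succeed.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical per-thread job description, not part of any library. */
struct job {
    int  iters;      /* malloc/free pairs to perform */
    long succeeded;  /* allocations that returned non-NULL */
};

/* Each thread allocates, touches and frees small blocks in a loop. */
static void *worker(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < j->iters; i++) {
        char *p = malloc(64);
        if (p != NULL) {
            /* Volatile store so the compiler cannot drop the allocation. */
            *(volatile char *)p = (char)i;
            j->succeeded++;
            free(p);
        }
    }
    return NULL;
}

/* Run nthreads workers concurrently. Returns the total number of
   successful allocations, or -1 if the arguments are out of range
   or a thread could not be started. */
long run_malloc_workers(int nthreads, int iters)
{
    pthread_t tid[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    long total = 0;
    int started = 0;
    int failed = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || iters < 0)
        return -1;

    for (int t = 0; t < nthreads; t++) {
        jobs[t].iters = iters;
        jobs[t].succeeded = 0;
        if (pthread_create(&tid[t], NULL, worker, &jobs[t]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total += jobs[t].succeeded;
    }
    return failed ? -1 : total;
}
```

To turn this into a scaling test, time `run_malloc_workers(n, iters)` with `clock_gettime` for n = 1, 2, 4, ... and compare; if the elapsed time stays roughly flat as n grows, the allocator you linked against is serving the threads concurrently.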

As for 2, since memcpy consists of a series of loads and stores, the only impact would again be the potential architectural effects of using the other processors' memory controllers, etc.
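
To make that concrete: a copy is nothing more than a loop of loads and stores with no lock anywhere, so several threads can copy their own regions at the same time. The sketch below is a naive illustration, not how any real memcpy is written (real ones are heavily optimized); `copy_bytes`, `struct chunk`, `copy_chunk` and `copy_in_two_threads` are hypothetical names, and POSIX threads are assumed.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Naive stand-in for memcpy: one load and one store per byte, no locks. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Hypothetical description of one thread's share of a larger copy. */
struct chunk {
    void *dst;
    const void *src;
    size_t len;
};

static void *copy_chunk(void *arg)
{
    struct chunk *c = arg;
    memcpy(c->dst, c->src, c->len);   /* each thread copies its own region */
    return NULL;
}

/* Copy n bytes using two threads working on disjoint halves.
   Returns 0 on success, -1 if a thread could not be created. */
int copy_in_two_threads(void *dst, const void *src, size_t n)
{
    size_t half = n / 2;
    struct chunk c[2] = {
        { dst, src, half },
        { (unsigned char *)dst + half,
          (const unsigned char *)src + half, n - half }
    };
    pthread_t tid[2];
    int started = 0;

    while (started < 2 &&
           pthread_create(&tid[started], NULL, copy_chunk, &c[started]) == 0)
        started++;
    /* Join only the threads that were actually created. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return started == 2 ? 0 : -1;
}
```

Nothing here makes one thread wait for the other; any slowdown you observe when both halves run together comes from the hardware (shared memory controllers and links), not from memcpy itself.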

Brian answered Sep 17 '22 14:09



  1. Calls to malloc in separate processes will execute independently, regardless of whether you are on a NUMA architecture. Calls to malloc in different threads of the same process cannot be fully independent, because they share one heap and the memory returned is equally accessible to all threads within the process; how much they actually block one another depends on the malloc implementation. If you want memory that is local to a particular thread, read up on Thread Local Storage. I have not been able to find any clear documentation on whether the Linux VM and scheduler are able to optimize the affinity between cores, threads, local memory and thread-local storage.
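
As an illustration of thread-local storage: the C11 keyword `_Thread_local` (GCC also accepts `__thread`) gives every thread its own copy of a variable, so a per-thread scratch buffer needs neither malloc nor a lock. This is a minimal sketch assuming a C11 compiler and POSIX threads; `scratch`, `worker` and `worker_has_private_scratch` are invented names.

```c
#include <pthread.h>
#include <string.h>

/* Every thread gets its own private copy of this buffer. */
static _Thread_local char scratch[256];

/* Runs in the new thread and overwrites that thread's copy only. */
static void *worker(void *arg)
{
    (void)arg;
    memset(scratch, 'w', sizeof scratch);
    return NULL;
}

/* Returns 1 if the worker's writes left the caller's copy untouched,
   0 if they did not, -1 if the thread could not be created. */
int worker_has_private_scratch(void)
{
    pthread_t tid;

    memset(scratch, 'm', sizeof scratch);   /* the calling thread's copy */
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);

    /* The worker wrote 'w' into its own copy; ours must still hold 'm'. */
    return scratch[0] == 'm' && scratch[sizeof scratch - 1] == 'm';
}
```

Note that this only makes the storage private to the thread. It says nothing by itself about which NUMA node the underlying pages end up on, which is the part I could not find documented.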
Al Riddoch answered Sep 18 '22 14:09
