
How to count memory accesses to remote NUMA memory nodes?

In a multi-threaded application running on a recent Linux distributed shared memory system, is there a straightforward way to count the number of requests per thread to remote (non-local) NUMA memory nodes?

I am thinking of using PAPI to count interconnect traffic. Is this the way to go?

In my application, each thread is bound to a particular core or processor for its entire lifetime. When the application starts, memory is allocated page by page and spread round-robin across all available NUMA memory nodes.
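To make the setup concrete, here is a minimal sketch of the round-robin placement described above, modeled in plain Python rather than measured on real hardware. The function names, the page indices, and the four-node layout are assumptions for illustration only. It shows why a pinned thread that touches pages uniformly would be expected to see mostly remote accesses under this policy.

```python
from typing import Iterable, Tuple


def page_to_node(page_index: int, num_nodes: int) -> int:
    """Model of round-robin placement: page i lives on node i mod N."""
    return page_index % num_nodes


def count_local_remote(
    pages_touched: Iterable[int], thread_node: int, num_nodes: int
) -> Tuple[int, int]:
    """Count how many touched pages are local vs. remote for a pinned thread."""
    pages = list(pages_touched)
    local = sum(1 for p in pages if page_to_node(p, num_nodes) == thread_node)
    return local, len(pages) - local


# Hypothetical example: a thread pinned to node 0 touches pages 0..999
# on a machine assumed to have 4 NUMA nodes.
local, remote = count_local_remote(range(1000), thread_node=0, num_nodes=4)
print(local, remote)  # 250 750
```

This is only a model of page placement; it does not account for caching, so real counter values depend on how many accesses actually reach DRAM.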

Thank you for your answers.

asked Nov 18 '25 09:11 by nandu


2 Answers

If you have access to VTune, local and remote NUMA node accesses can be counted with two hardware counters: OFFCORE_RESPONSE.ANY_DATA.OTHER_LOCAL_DRAM_0 for fast local NUMA node accesses and OFFCORE_RESPONSE.ANY_DATA.REMOTE_DRAM_0 for slower remote NUMA node accesses.

How the counters appear in VTune:

[Screenshot: configuring NUMA hardware counters in VTune]

How the counters look in two scenarios:

NUMA-unhappy code: core 0 (NUMA node 0) increments 50 MB residing on NUMA node 1.
[Screenshot: NUMA-unhappy code with many remote NUMA node accesses]

NUMA-happy code: core 0 (NUMA node 0) increments 50 MB residing on NUMA node 0.
[Screenshot: NUMA-happy code with many local NUMA node accesses]
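Once the two counter totals have been read off, a single ratio makes the two scenarios easy to compare. The following is a small sketch, not part of VTune itself; the function name and the sample counts are hypothetical.

```python
def remote_access_fraction(local_dram: int, remote_dram: int) -> float:
    """Fraction of DRAM accesses that went to a remote NUMA node.

    local_dram:  total taken from OFFCORE_RESPONSE.ANY_DATA.OTHER_LOCAL_DRAM_0
    remote_dram: total taken from OFFCORE_RESPONSE.ANY_DATA.REMOTE_DRAM_0
    """
    total = local_dram + remote_dram
    if total == 0:
        return 0.0  # no DRAM accesses recorded at all
    return remote_dram / total


# Hypothetical counter totals, not measured values.
print(remote_access_fraction(local_dram=900_000, remote_dram=100_000))  # 0.1
```

A value near 0 corresponds to the NUMA-happy case and a value near 1 to the NUMA-unhappy case.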

answered Nov 20 '25 06:11 by Neil Justice


I found the pcm-numa.x tool that comes with Intel PCM to be quite useful. It reports how many times each core has accessed local and remote NUMA nodes.
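Because the question states that each thread stays bound to one core, per-core counts can be relabeled as per-thread counts. Here is a rough sketch of that mapping. It does not parse real pcm-numa.x output; the dictionaries, thread names, and numbers are hypothetical, and it assumes no other thread shares a core during the measurement.

```python
from typing import Dict, Tuple

# (local accesses, remote accesses)
Counts = Tuple[int, int]


def per_thread_counts(
    thread_to_core: Dict[str, int], per_core: Dict[int, Counts]
) -> Dict[str, Counts]:
    """Attribute each core's counts to the single thread pinned to it."""
    return {
        thread: per_core.get(core, (0, 0))
        for thread, core in thread_to_core.items()
    }


# Hypothetical pinning and hypothetical per-core counts copied by hand
# from a tool such as pcm-numa.x.
pinning = {"worker-0": 0, "worker-1": 4}
core_counts = {0: (1200, 300), 4: (100, 900)}

for thread, (local, remote) in per_thread_counts(pinning, core_counts).items():
    print(thread, "local:", local, "remote:", remote)
```

If several threads share a core, or unrelated processes run on it, the per-core numbers can no longer be attributed to a single thread this way.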

answered Nov 20 '25 07:11 by Fidel