
How to find the L3 cache index and NUMA node index for the current hardware thread

Tags: c++, c, x86, numa, cpuid

I'm building a topological tree of sockets, NUMA nodes, caches, cores, and threads for any Intel or AMD system in C.

Building this hierarchy, I want to ensure hardware threads are grouped together appropriately so it's clear who precisely shares what. I've found that I can set a thread's affinity and then use the cpuid instruction to get a lot of the info I want, but not all.

If a package/socket has multiple NUMA nodes, how do I get an index of the NUMA node for the current hardware thread? If the NUMA node has multiple L3 caches, how do I get the index?

AMD has something for NUMA node ID in Fn8000_001E_ECX, but I can't find anything comparable for Intel. And nothing re: L3 index for either.

asked Sep 01 '21 21:09 by Will Leiserson



1 Answer

If a package/socket has multiple NUMA nodes, how do I get an index of the NUMA node for the current hardware thread?

You get this information from ACPI.

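Specifically, there's a "System Resource Affinity Table" (SRAT) that contains a list of structures describing which NUMA domain different things (CPUs, memory areas, ...) are in at boot time. On 80x86, you'd parse this list looking for both "Processor Local APIC/SAPIC Affinity Structures" and "Processor Local x2APIC Affinity Structures". Each of these maps a local APIC ID (or x2APIC ID) to a proximity domain, so you can match an entry against the APIC ID that cpuid reports for the current hardware thread.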
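As a rough sketch of that table walk, the C function below scans a raw SRAT image for the entry matching a given APIC ID. It assumes you already have the table bytes in memory (on Linux they can typically be read as root from /sys/firmware/acpi/tables/SRAT) and that you got the APIC ID from cpuid. The names `srat_find_domain` and `read_le32` are made up for this example; the offsets follow the entry layouts in the ACPI specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 36-byte standard ACPI table header plus 12 reserved bytes. */
#define SRAT_HEADER_LEN 48u

/* Read a little-endian 32-bit field without alignment assumptions. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Hypothetical helper: find the proximity domain for apic_id.
 * Returns false if no enabled entry matches or the table is malformed.
 */
static bool srat_find_domain(const uint8_t *srat, size_t len,
                             uint32_t apic_id, uint32_t *domain)
{
    size_t off = SRAT_HEADER_LEN;

    while (off + 2 <= len) {
        const uint8_t *e = srat + off;
        uint8_t type = e[0];
        uint8_t elen = e[1];

        if (elen < 2 || off + elen > len)
            break;                          /* malformed entry, stop */

        if (type == 0 && elen >= 16) {
            /* Processor Local APIC/SAPIC Affinity Structure */
            bool enabled = (read_le32(e + 4) & 1u) != 0;
            if (enabled && e[3] == apic_id) {
                /* Domain is split: bits 7:0 at byte 2, bits 31:8 at bytes 9..11 */
                *domain = (uint32_t)e[2] | (uint32_t)e[9] << 8 |
                          (uint32_t)e[10] << 16 | (uint32_t)e[11] << 24;
                return true;
            }
        } else if (type == 2 && elen >= 24) {
            /* Processor Local x2APIC Affinity Structure */
            bool enabled = (read_le32(e + 12) & 1u) != 0;
            if (enabled && read_le32(e + 8) == apic_id) {
                *domain = read_le32(e + 4);
                return true;
            }
        }
        off += elen;                        /* skip memory and other entry types */
    }
    return false;
}
```

The boolean return value doubles as the "found a valid domain" flag, since no domain number is reserved to mean "unknown".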

For hot-plug CPUs the table isn't enough (the SRAT won't change when a CPU is inserted or removed after boot), so you might also need an ACPI Machine Language (AML) interpreter to execute _PXM objects and obtain current NUMA information. Computers that support hot-plug CPUs are very rare though.

Note that in ACPI, NUMA domain numbers are excessively large (32 bits) and not guaranteed to be contiguous (e.g. in theory you could have 2 NUMA nodes with the NUMA domain numbers 0x12345678 and 0x9ABCDEF0), which means you can't use them directly as array indexes (e.g. if you want to do something like "NUMA_stats[domain].CPU_count++;" it won't be fun). There is also no standard value reserved for "unknown NUMA domain", which is inconvenient for code that determines topology (e.g. you'd need a separate "did/didn't find a valid NUMA domain" flag to keep track).
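One way to get usable array indexes is to remap each domain number to a small dense index the first time you see it. This is only a minimal sketch: `struct numa_map`, `numa_map_index` and the `MAX_NUMA_NODES` limit are names and values assumed for the example, not anything ACPI defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NUMA_NODES 64   /* assumed upper bound for this sketch */

/* Hypothetical table mapping sparse ACPI domain numbers to dense indexes. */
struct numa_map {
    uint32_t domain[MAX_NUMA_NODES];
    size_t count;
};

/*
 * Store the dense index for `domain` in *index, adding the domain to the
 * map if it has not been seen before. Returns false if the map is full.
 */
static bool numa_map_index(struct numa_map *map, uint32_t domain, size_t *index)
{
    for (size_t i = 0; i < map->count; i++) {
        if (map->domain[i] == domain) {
            *index = i;                 /* already known */
            return true;
        }
    }
    if (map->count == MAX_NUMA_NODES)
        return false;                   /* no room for another node */

    map->domain[map->count] = domain;   /* first sighting: assign next index */
    *index = map->count++;
    return true;
}
```

With a zero-initialised `struct numa_map`, the domains 0x12345678 and 0x9ABCDEF0 would get indexes 0 and 1 in the order they are first seen, and something like `NUMA_stats[index].CPU_count++` becomes safe. A linear search is fine here because the number of NUMA nodes is small.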

answered Oct 17 '22 07:10 by Brendan