I'm building a topological tree of sockets, NUMA nodes, caches, cores, and threads for any Intel or AMD system in C.
Building this hierarchy, I want to ensure hardware threads are grouped together appropriately, so it's clear exactly which threads share what. I've found that I can set a thread's affinity and then use the cpuid instruction to get a lot of the info I want, but not all of it.
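Roughly what I mean by that, as a minimal sketch (Linux and GCC/Clang assumed; pin_and_get_x2apic_id is just a placeholder name):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <cpuid.h>   /* GCC/Clang __get_cpuid_count() */

/* Pin the calling thread to one logical CPU, then execute CPUID there.
 * Leaf 0xB (extended topology enumeration), sub-leaf 0, returns the
 * x2APIC ID of the logical processor that ran the instruction in EDX. */
static int pin_and_get_x2apic_id(int cpu, uint32_t *x2apic_id)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;

    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(0x0B, 0, &eax, &ebx, &ecx, &edx))
        return -1;

    *x2apic_id = edx;
    return 0;
}
```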
If a package/socket has multiple NUMA nodes, how do I get an index of the NUMA node for the current hardware thread? If the NUMA node has multiple L3 caches, how do I get the index?
AMD has something for the NUMA node ID in Fn8000_001E_ECX, but I can't find anything comparable for Intel, and nothing regarding an L3 index for either.
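For reference, this is what I have in mind for the AMD side (a sketch only; I'm assuming ECX[7:0] is the NodeId and ECX[10:8] + 1 is the node count per package, and amd_node_id is just an illustrative name):

```c
#include <stdint.h>
#include <cpuid.h>

/* Read AMD's node ID for the CPU this thread is currently pinned to.
 * CPUID Fn8000_001E: ECX[7:0] = NodeId, ECX[10:8] = NodesPerProcessor - 1. */
static int amd_node_id(uint32_t *node_id, uint32_t *nodes_per_package)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf isn't supported. */
    if (!__get_cpuid(0x8000001E, &eax, &ebx, &ecx, &edx))
        return -1;

    *node_id           = ecx & 0xFF;
    *nodes_per_package = ((ecx >> 8) & 0x7) + 1;
    return 0;
}
```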
If a package/socket has multiple NUMA nodes, how do I get an index of the NUMA node for the current hardware thread?
You get this information from ACPI.
Specifically, there's a "System Resource Affinity Table" (SRAT) that contains a list of structures describing which NUMA domain different things (CPUs, memory areas, ...) are in at boot time. For 80x86, you'd parse this list looking for both "Processor Local APIC/SAPIC Affinity Structures" and "Processor Local x2APIC Affinity Structures".
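A rough sketch of what walking those structures can look like, assuming you've already found and mapped the SRAT (via the RSDT/XSDT in a kernel, or e.g. /sys/firmware/acpi/tables/SRAT on Linux); the type and function names are just illustrative, with layouts taken from the ACPI specification:

```c
#include <stdint.h>

/* SRAT layout per the ACPI specification: a 36-byte table header,
 * 12 reserved bytes, then a list of variable-length structures. */
#pragma pack(push, 1)
typedef struct {
    char     signature[4];      /* "SRAT" */
    uint32_t length;            /* length of the entire table */
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    uint32_t creator_id;
    uint32_t creator_revision;
    uint32_t reserved1;         /* must be 1 */
    uint8_t  reserved2[8];
} srat_t;

typedef struct {                /* Type 0: Processor Local APIC/SAPIC Affinity */
    uint8_t  type, length;
    uint8_t  proximity_domain_lo;     /* proximity domain bits [7:0]  */
    uint8_t  apic_id;
    uint32_t flags;                   /* bit 0: entry is enabled      */
    uint8_t  local_sapic_eid;
    uint8_t  proximity_domain_hi[3];  /* proximity domain bits [31:8] */
    uint32_t clock_domain;
} srat_lapic_affinity_t;

typedef struct {                /* Type 2: Processor Local x2APIC Affinity */
    uint8_t  type, length;
    uint16_t reserved1;
    uint32_t proximity_domain;
    uint32_t x2apic_id;
    uint32_t flags;                   /* bit 0: entry is enabled      */
    uint32_t clock_domain;
    uint32_t reserved2;
} srat_x2apic_affinity_t;
#pragma pack(pop)

/* Report every enabled (APIC ID, proximity domain) pair in the table. */
static void srat_walk(const srat_t *srat,
                      void (*report)(uint32_t apic_id, uint32_t domain))
{
    const uint8_t *p   = (const uint8_t *)srat + sizeof(*srat);
    const uint8_t *end = (const uint8_t *)srat + srat->length;

    while (p + 2 <= end && p[1] >= 2 && p + p[1] <= end) {
        if (p[0] == 0) {                           /* local APIC/SAPIC */
            const srat_lapic_affinity_t *e = (const void *)p;
            if (e->flags & 1) {
                uint32_t dom = e->proximity_domain_lo
                             | (uint32_t)e->proximity_domain_hi[0] << 8
                             | (uint32_t)e->proximity_domain_hi[1] << 16
                             | (uint32_t)e->proximity_domain_hi[2] << 24;
                report(e->apic_id, dom);
            }
        } else if (p[0] == 2) {                    /* local x2APIC */
            const srat_x2apic_affinity_t *e = (const void *)p;
            if (e->flags & 1)
                report(e->x2apic_id, e->proximity_domain);
        }
        p += p[1];                                 /* every entry carries its length */
    }
}
```

Matching each reported APIC ID against the x2APIC ID you read with cpuid on the pinned thread tells you which proximity domain the current hardware thread belongs to.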
For hot-plug CPUs the table isn't enough (the SRAT won't change when a CPU is inserted or removed after boot), so you might also need to use an ACPI machine language interpreter to execute _PXM objects to obtain current NUMA information. Computers that support hot-plug CPUs are very rare, though.
Note that in ACPI "NUMA domain numbers" are excessively large (32 bits) and not guaranteed to be contiguous (e.g. in theory you could have 2 NUMA nodes with the NUMA domain numbers 0x12345678 and 0x9ABCDEF0); which means that you can't use them for array indexes (e.g. if you want to do something like "NUMA_stats[domain].CPU_count++;" it won't be fun). There is also no standard value reserved for "unknown NUMA domain", which is inconvenient for code that determines topology (e.g. you'd need a separate "did/didn't find a valid NUMA domain" flag to keep track).