I have a server with 2 NUMA nodes with 16 CPUs each. I can see all 32 CPUs in Task Manager: the first 16 (NUMA node 1) in the first 2 rows and the next 16 (NUMA node 2) in the last 2 rows.
In my app I am starting 64 threads using Thread.Start(). The app is CPU-intensive, but when I run it only the first 16 CPUs are busy; the other 16 CPUs are idle.
Why? I am using Interlocked.Increment() a lot. Could this be the reason?
Is there a way I can start threads on a specific NUMA node?
In addition to gcServer, we should enable GCCpuGroup and Thread_UseAllCpuGroups, so the config should be more like:
<configuration>
<runtime>
<gcServer enabled="true"/>
<GCCpuGroup enabled="true"/>
<Thread_UseAllCpuGroups enabled="true"/>
</runtime>
</configuration>
GCCpuGroup enables garbage collection across multiple CPU groups, and Thread_UseAllCpuGroups lets the runtime distribute managed threads across all CPU groups.
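One way to check whether threads really land in more than one CPU group is to have each thread report the group it is running on. The following is only a minimal sketch, assuming Windows 7 / Server 2008 R2 or later; the class and method names (GroupProbe, ThreadsPerGroup) are made up for illustration, and GetCurrentProcessorNumberEx is the Win32 call used to read the group.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Threading;

public static class GroupProbe
{
    // Mirrors the Win32 PROCESSOR_NUMBER structure (WORD, BYTE, BYTE).
    [StructLayout(LayoutKind.Sequential)]
    struct PROCESSOR_NUMBER
    {
        public ushort Group;
        public byte Number;
        public byte Reserved;
    }

    [DllImport("kernel32.dll")]
    static extern void GetCurrentProcessorNumberEx(out PROCESSOR_NUMBER procNumber);

    // Starts threadCount threads and counts how many ran in each CPU group.
    // Hypothetical helper, Windows only.
    public static IDictionary<ushort, int> ThreadsPerGroup(int threadCount)
    {
        var seen = new ConcurrentDictionary<ushort, int>();
        var threads = new Thread[threadCount];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                PROCESSOR_NUMBER pn;
                GetCurrentProcessorNumberEx(out pn);
                seen.AddOrUpdate(pn.Group, 1, (_, n) => n + 1);
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        return seen;
    }

    public static void Main()
    {
        foreach (var kv in ThreadsPerGroup(64))
            Console.WriteLine("group {0}: {1} threads", kv.Key, kv.Value);
    }
}
```

If every thread reports group 0 on a machine that has more than one group, the settings above are probably not being picked up. On a machine with a single group, seeing only group 0 is expected.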
The first thing to check would indeed be the app.config, making sure the necessary options are set:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<runtime>
<gcServer enabled="true" />
<Thread_UseAllCpuGroups enabled="true" />
<GCCpuGroup enabled="true" />
</runtime>
<startup>
<!-- 4.5 and later should work, use the one targeted -->
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.2"/>
</startup>
</configuration>
If app.config wizardry isn't helping, it is likely that your machine uses multiple kernel groups (Kgroups) when it shouldn't. If you have a Gen9 HP, check your BIOS for NUMA Group Size Optimization. If it is in Clustered mode, the current CLR (2017, .NET 4.6.2) only utilizes the first group. If you have no more than 64 cores in that machine, you should be able to select the Flat layout, which puts all cores in the same group. If you cannot find the option, you may need a BIOS update.
For a lot more detail, see "Unable to use more than one processor group for my threads in a C# app" here on Stack Overflow. It even comes with its own diagnostics tool.
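To see quickly whether Windows has split the machine into more than one processor group, you can ask the OS directly. This is a rough sketch rather than the diagnostics tool mentioned above: the class name ProcessorGroups and the helper CountPerGroup are hypothetical, while GetActiveProcessorGroupCount and GetActiveProcessorCount are Win32 functions exported by kernel32.dll (Windows 7 / Server 2008 R2 and later).

```csharp
using System;
using System.Runtime.InteropServices;

public static class ProcessorGroups
{
    [DllImport("kernel32.dll")]
    static extern ushort GetActiveProcessorGroupCount();

    [DllImport("kernel32.dll")]
    static extern uint GetActiveProcessorCount(ushort groupNumber);

    // Returns the number of active logical processors in each group.
    // Hypothetical helper, Windows only.
    public static uint[] CountPerGroup()
    {
        ushort groups = GetActiveProcessorGroupCount();
        var counts = new uint[groups];
        for (ushort g = 0; g < groups; g++)
            counts[g] = GetActiveProcessorCount(g);
        return counts;
    }

    public static void Main()
    {
        uint[] counts = CountPerGroup();
        Console.WriteLine("processor groups: {0}", counts.Length);
        for (int g = 0; g < counts.Length; g++)
            Console.WriteLine("  group {0}: {1} logical processors", g, counts[g]);
    }
}
```

If this reports two groups of 16 on a 32-CPU machine, the BIOS setting described above is the likely culprit. A single group of 32 would point back to the application or its configuration.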
Have you set the garbage collector to the server version?
In app.config, try:
<configuration>
<runtime>
<gcServer enabled="true"/>
</runtime>
</configuration>
Because of the way the heaps are allocated, the server GC makes a massive difference when churning through a lot of objects/data on many threads on a machine with many cores.
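It is worth confirming at runtime that the setting was actually picked up, since a misnamed or misplaced config file is silently ignored. A small sketch, using the standard GCSettings and Environment APIs; the class name GcCheck and the method Describe are just illustrative:

```csharp
using System;
using System.Runtime;

public static class GcCheck
{
    // Summarizes what the runtime reports about its GC mode and the
    // number of processors it sees. Hypothetical helper.
    public static string Describe()
    {
        return string.Format(
            "Server GC: {0}, latency mode: {1}, processors visible: {2}",
            GCSettings.IsServerGC,
            GCSettings.LatencyMode,
            Environment.ProcessorCount);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

If "Server GC" comes back False, check that the config file is named after the executable (for example MyApp.exe.config, where MyApp is a placeholder) and sits next to it.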