It is possible to use sched_setaffinity
to pin a thread to a cpu, increasing performance (in some situations)
From the linux man page:
Restricting a process to run on a single CPU also avoids the performance cost caused by the cache invalidation that occurs when a process ceases to execute on one CPU and then recommences execution on a different CPU
Further, if I desire a more real-time response, I can change the scheduler policy for that thread to SCHED_FIFO
, and up the priority to some high value (up to sched_get_priority_max
), meaning the thread in question should always pre-empt any other thread running on its cpu when it becomes ready.
However, at this point, the thread running on the cpu which the real-time thread just pre-empted will possibly have evicted much of the real-time thread's level-1 cache entries.
My questions are as follows:
Process Scheduler uses Scheduling Algorithms that helps in deciding the process to be executed. In LINUX, there are two types of processes namely - Real-time processes and Normal processes. O(n) scheduler divides the processor's time into a unit called epochs. Each task is allowed to use at max 1 epoch.
The Linux kernel scheduler is actually scheduling tasks, and these are either threads or (single-threaded) processes. A process is a non-empty finite set (sometimes a singleton) of threads sharing the same virtual address space (and other things like file descriptors, working directory, etc etc...).
A cpuset defines a list of CPUs and memory nodes. The CPUs of a system include all the logical processing units on which a process can execute, including, if present, multiple processor cores within a package and Hyper-Threads within a processor core.
The answer is to use cpusets. The python cpuset utility makes it easy to configure them.
Basic concepts
3 cpusets
root
: present in all configurations and contains all cpus (unshielded)system
: contains cpus used for system tasks - the ones which need to run but aren't "important" (unshielded) user
: contains cpus used for "important" tasks - the ones we want to run in "realtime" mode (shielded) The shield
command manages these 3 cpusets.
During setup it moves all movable tasks into the unshielded cpuset (system
) and during teardown it moves all movable tasks into the root
cpuset. After setup, the subcommand lets you move tasks into the shield (user
) cpuset, and additionally, to move special tasks (kernel threads) from root
to system
(and therefore out of the user
cpuset).
Commands:
First we create a shield. Naturally the layout of the shield will be machine/task dependent. For example, say we have a 4-core non-NUMA machine: we want to dedicate 3 cores to the shield, and leave 1 core for unimportant tasks; since it is non-NUMA we don't need to specify any memory node parameters, and we leave the kernel threads running in the root
cpuset (ie: across all cpus)
$ cset shield --cpu 1-3
Some kernel threads (those which aren't bound to specific cpus) can be moved into the system
cpuset. (In general it is not a good idea to move kernel threads which have been bound to a specific cpu)
$ cset shield --kthread on
Now let's list what's running in the shield (user
) or unshielded (system
) cpusets: (-v
for verbose, which will list the process names) (add a 2nd -v
to display more than 80 characters)
$ cset shield --shield -v $ cset shield --unshield -v -v
If we want to stop the shield (teardown)
$ cset shield --reset
Now let's execute a process in the shield (commands following '--'
are passed to the command to be executed, not to cset
)
$ cset shield --exec mycommand -- -arg1 -arg2
If we already have a running process which we want to move into the shield (note we can move multiple processes by passing a comma separated list, or ranges (any process in the range will be moved, even if there are gaps))
$ cset shield --shield --pid 1234 $ cset shield --shield --pid 1234,1236 $ cset shield --shield --pid 1234,1237,1238-1240
Advanced concepts
cset set/proc
- these give you finer control of cpusets
Set
Create, adjust, rename, move and destroy cpusets
Commands
Create a cpuset, using cpus 1-3, use NUMA node 1 and call it "my_cpuset1"
$ cset set --cpu=1-3 --mem=1 --set=my_cpuset1
Change "my_cpuset1" to only use cpus 1 and 3
$ cset set --cpu=1,3 --mem=1 --set=my_cpuset1
Destroy a cpuset
$ cset set --destroy --set=my_cpuset1
Rename an existing cpuset
$ cset set --set=my_cpuset1 --newname=your_cpuset1
Create a hierarchical cpuset
$ cset set --cpu=3 --mem=1 --set=my_cpuset1/my_subset1
List existing cpusets (depth of level 1)
$ cset set --list
List existing cpuset and its children
$ cset set --list --set=my_cpuset1
List all existing cpusets
$ cset set --list --recurse
Proc
Manage threads and processes
Commands
List tasks running in a cpuset
$ cset proc --list --set=my_cpuset1 --verbose
Execute a task in a cpuset
$ cset proc --set=my_cpuset1 --exec myApp -- --arg1 --arg2
Moving a task
$ cset proc --toset=my_cpuset1 --move --pid 1234 $ cset proc --toset=my_cpuset1 --move --pid 1234,1236 $ cset proc --toset=my_cpuset1 --move --pid 1238-1340
Moving a task and all its siblings
$ cset proc --move --toset=my_cpuset1 --pid 1234 --threads
Move all tasks from one cpuset to another
$ cset proc --move --fromset=my_cpuset1 --toset=system
Move unpinned kernel threads into a cpuset
$ cset proc --kthread --fromset=root --toset=system
Forcibly move kernel threads (including those that are pinned to a specific cpu) into a cpuset (note: this may have dire consequences for the system - make sure you know what you're doing)
$ cset proc --kthread --fromset=root --toset=system --force
Hierarchy example
We can use hierarchical cpusets to create prioritised groupings
system
cpuset with 1 cpu (0)prio_low
cpuset with 1 cpu (1)prio_met
cpuset with 2 cpus (1-2)prio_high
cpuset with 3 cpus (1-3)prio_all
cpuset with all 4 cpus (0-3) (note this the same as root; it is considered good practice to keep a separation from root)To achieve the above you create prio_all, and then create subset prio_high under prio_all, etc
$ cset set --cpu=0 --set=system $ cset set --cpu=0-3 --set=prio_all $ cset set --cpu=1-3 --set=/prio_all/prio_high $ cset set --cpu=1-2 --set=/prio_all/prio_high/prio_med $ cset set --cpu=1 --set=/prio_all/prio_high/prio_med/prio_low
There are two other ways I can think of doing this (though not as elegant as cset, which doesn't seem to have a fantastic level of support from Redhat):
1) Taskset everything including PID 1 - nice and easy (but, alledgly -- I've never seen any issues myself -- may cause inefficiencies in the scheduler). The script below (which must be run as root) runs taskset on all the running processes, including init (pid 1); this will pin all the running processes to one or more 'junk cores', and by also pinning init, it will ensure that any future processes are also started in the list of 'junk cores':
#!/bin/bash if [[ -z $1 ]]; then printf "Usage: %s '<csv list of cores to set as junk in double quotes>'", $0 exit -1; fi for i in `ps -eLfad |awk '{ print $4 } '|grep -v PID | xargs echo `; do taskset -pc $1 $i; done
2) use the isolcpus kernel parameter (here's the documentation from https://www.kernel.org/doc/Documentation/kernel-parameters.txt):
isolcpus= [KNL,SMP] Isolate CPUs from the general scheduler. Format: <cpu number>,...,<cpu number> or <cpu number>-<cpu number> (must be a positive range in ascending order) or a mixture <cpu number>,...,<cpu number>-<cpu number> This option can be used to specify one or more CPUs to isolate from the general SMP balancing and scheduling algorithms. You can move a process onto or off an "isolated" CPU via the CPU affinity syscalls or cpuset. <cpu number> begins at 0 and the maximum value is "number of CPUs in system - 1". This option is the preferred way to isolate CPUs. The alternative -- manually setting the CPU mask of all tasks in the system -- can cause problems and suboptimal load balancer performance.
I've used these two plus the cset mechanisms for several projects (incidentally, please pardon the blatant self promotion :-)), I've just filed a patent for a tool called Pontus Vision ThreadManager that comes up with optimal pinning strategies for any given x86 platform to any given software work loads; after testing it in a customer site, I got really good results (270% reduction in peak latencies), so it's well worth doing pinning and CPU isolation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With