I am investigating how to run a process on a dedicated CPU in order to avoid context switches. On my Ubuntu machine, I isolated two CPUs using the kernel parameters "isolcpus=3,7" and "irqaffinity=0-2,4-6". I can confirm that they are correctly taken into account:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.8.0-27-generic root=UUID=58c66f12-0588-442b-9bb8-1d2dd833efe2 ro quiet splash isolcpus=3,7 irqaffinity=0-2,4-6 vt.handoff=7
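For reference, on Ubuntu these parameters are typically added to the kernel command line via GRUB; a sketch, assuming the default GRUB setup (the existing options in your GRUB_CMDLINE_LINUX_DEFAULT line may differ):
$ sudoedit /etc/default/grub
### append the parameters to the existing options, e.g.:
### GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=3,7 irqaffinity=0-2,4-6"
$ sudo update-grub
$ sudo reboot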
After a reboot, I can check that everything works as expected. In a first console, I run:
$ stress -c 24
stress: info: [31717] dispatching hogs: 24 cpu, 0 io, 0 vm, 0 hdd
And on a second one, using "top" I can check the usage of my CPUs:
top - 18:39:07 up 2 days, 20:48, 18 users, load average: 23,15, 10,46, 4,53
Tasks: 457 total, 26 running, 431 sleeping, 0 stopped, 0 zombie
%Cpu0 :100,0 us, 0,0 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu1 : 98,7 us, 1,3 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu2 : 99,3 us, 0,7 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu3 : 0,0 us, 0,0 sy, 0,0 ni,100,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu4 : 95,7 us, 4,3 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu5 : 98,0 us, 2,0 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu6 : 98,7 us, 1,3 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu7 : 0,0 us, 0,0 sy, 0,0 ni,100,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem : 7855176 total, 385736 free, 5891280 used, 1578160 buff/cache
KiB Swap: 15624188 total, 10414520 free, 5209668 used. 626872 avail Mem
CPUs 3 and 7 are idle while the six other ones are fully loaded. Fine.
For the rest of my test, I will use a small application that does almost pure processing:
- It uses two int buffers of the same size
- It reads, one by one, all the values of the first buffer
- Each value is a random index into the second buffer
- It reads the value at that index in the second buffer
- It sums all the values taken from the second buffer
- It repeats all the previous steps for bigger and bigger buffer sizes
- At the end, I print the number of voluntary and involuntary CPU context switches
I am now studying my application when I launch it via the following command lines:
$ ./TestCpuset ### launch on any non-isolated CPU
$ taskset -c 7 ./TestCpuset ### launch on isolated CPU 7
When launched on a non-isolated CPU, the number of context switches ranges from around 20 to... thousands.
When launched on an isolated CPU, the number of context switches is almost constant (between 10 and 20), even if I launch a "stress -c 24" in parallel (which looks quite normal).
But my question is: why isn't it 0, absolutely 0? A context switch happens on a process in order to replace it with another process, but in my case there is no other process to switch to!
My hypothesis is that the "isolcpus" option isolates a CPU from any process (unless the process is given a CPU affinity, as is done with "taskset"), but not from kernel tasks. However, I found no documentation about it.
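One quick way to check this hypothesis (a sketch; the exact set of kernel threads depends on the kernel version) is to list which tasks are currently sitting on the isolated CPU:
$ ps -eo pid,psr,comm | awk '$2 == 7'
### per-CPU kernel threads such as migration/7, ksoftirqd/7 and kworker/7:x
### remain bound to CPU 7 despite isolcpus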
I would appreciate any help in order to reach 0 context switches.
FYI, this question is close to another one I previously opened: Cannot allocate exclusively a CPU for my process
Here is the code of the program I am using:
#include <limits.h>
#include <cstdlib>   // rand()
#include <iostream>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>

const unsigned int BUFFER_SIZE = 4096;

using namespace std;

class TimedSumComputer
{
public:
    TimedSumComputer() :
        sum(0),
        bufferSize(0),
        valueBuffer(0),
        indexBuffer(0)
    {}

public:
    virtual ~TimedSumComputer()
    {
        resetBuffers();
    }

public:
    void init(unsigned int bufferSize)
    {
        this->bufferSize = bufferSize;
        resetBuffers();
        initValueBuffer();
        initIndexBuffer();
    }

private:
    void resetBuffers()
    {
        delete [] valueBuffer;
        delete [] indexBuffer;
        valueBuffer = 0;
        indexBuffer = 0;
    }

    void initValueBuffer()
    {
        valueBuffer = new unsigned int[bufferSize];
        for (unsigned int i = 0 ; i < bufferSize ; i++)
        {
            valueBuffer[i] = randomUint();
        }
    }

    static unsigned int randomUint()
    {
        // rand() <= RAND_MAX < UINT_MAX, so the modulo is effectively a no-op
        int value = rand() % UINT_MAX;
        return value;
    }

protected:
    void initIndexBuffer()
    {
        indexBuffer = new unsigned int[bufferSize];
        for (unsigned int i = 0 ; i < bufferSize ; i++)
        {
            indexBuffer[i] = rand() % bufferSize;
        }
    }

public:
    unsigned int getSum() const
    {
        return sum;
    }

    unsigned int computeTimeInMicroSeconds()
    {
        struct timeval startTime, endTime;
        gettimeofday(&startTime, NULL);
        computeSum();   // updates the "sum" member
        gettimeofday(&endTime, NULL);
        return ((endTime.tv_sec - startTime.tv_sec) * 1000 * 1000)
               + (endTime.tv_usec - startTime.tv_usec);
    }

    unsigned int computeSum()
    {
        sum = 0;
        for (unsigned int i = 0 ; i < bufferSize ; i++)
        {
            unsigned int index = indexBuffer[i];
            sum += valueBuffer[index];
        }
        return sum;
    }

protected:
    unsigned int sum;
    unsigned int bufferSize;
    unsigned int * valueBuffer;
    unsigned int * indexBuffer;
};

unsigned int runTestForBufferSize(TimedSumComputer & timedComputer, unsigned int bufferSize)
{
    timedComputer.init(bufferSize);
    unsigned int timeInMicroSec = timedComputer.computeTimeInMicroSeconds();
    cout << "bufferSize = " << bufferSize << " - time (in micro-sec) = " << timeInMicroSec << endl;
    return timedComputer.getSum();
}

void runTest(TimedSumComputer & timedComputer)
{
    unsigned int result = 0;
    for (unsigned int i = 1 ; i < 10 ; i++)
    {
        result += runTestForBufferSize(timedComputer, BUFFER_SIZE * i);
    }
    unsigned int factor = 1;
    for (unsigned int i = 2 ; i <= 6 ; i++)
    {
        factor *= 10;
        result += runTestForBufferSize(timedComputer, BUFFER_SIZE * factor);
    }
    cout << "result = " << result << endl;
}

void printPid()
{
    cout << "###############################" << endl;
    cout << "Pid = " << getpid() << endl;
    cout << "###############################" << endl;
}

void printNbContextSwitch()
{
    struct rusage usage;
    getrusage(RUSAGE_THREAD, &usage);
    cout << "Number of voluntary context switch: " << usage.ru_nvcsw << endl;
    cout << "Number of involuntary context switch: " << usage.ru_nivcsw << endl;
}

int main()
{
    printPid();
    TimedSumComputer timedComputer;
    runTest(timedComputer);
    printNbContextSwitch();
    return 0;
}
Today, I obtained more clues regarding my problem. I realized that I had to investigate more deeply what was happening in the kernel scheduler. I found these two pages:
I enabled scheduler tracing while my application was running like that:
# sudo bash
# cd /sys/kernel/debug/tracing
# echo 1 > options/function-trace ; echo function_graph > current_tracer ; echo 1 > tracing_on ; echo 0 > tracing_max_latency ; taskset -c 7 [path-to-my-program]/TestCpuset ; echo 0 > tracing_on
# cat trace
As my program was launched on CPU 7 (taskset -c 7), I have to filter the "trace" output:
# grep " 7)" trace
I can then search for transitions from one process to another:
# grep " 7)" trace | grep "=>"
...
7) TestCpu-4753 => kworker-5866
7) kworker-5866 => TestCpu-4753
7) TestCpu-4753 => watchdo-26
7) watchdo-26 => TestCpu-4753
7) TestCpu-4753 => kworker-5866
7) kworker-5866 => TestCpu-4753
7) TestCpu-4753 => kworker-5866
7) kworker-5866 => TestCpu-4753
7) TestCpu-4753 => kworker-5866
7) kworker-5866 => TestCpu-4753
...
Bingo! It seems that the context switches I am tracking are transitions to:
- kworker threads (kernel workqueue workers)
- the watchdog thread
I now have to find:
- what these kworkers and the watchdog are doing on CPU 7
- how to prevent them from running on my isolated CPU
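To identify which work items those kworkers are executing, one possible approach (a sketch; it assumes the workqueue tracepoints are available on the running kernel) is to enable them and filter the trace on CPU 7:
# cd /sys/kernel/debug/tracing
# echo 0 > tracing_on ; echo nop > current_tracer
# echo 1 > events/workqueue/workqueue_execute_start/enable
# echo 1 > tracing_on ; taskset -c 7 [path-to-my-program]/TestCpuset ; echo 0 > tracing_on
# grep "\[007\]" trace
### the "function ..." part of each workqueue_execute_start line names the kernel work being run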
Of course, once again, I would appreciate any help :-P
Potentially any syscall could involve a context switch. When you access paged-out memory it may increase the context switch count too. To reach 0 context switches you would need to force the kernel to keep all the memory your program uses mapped into its address space, and you would need to be sure that none of the syscalls you invoke entails a context switch. I believe it may be possible on kernels with the RT patches, but probably hard to achieve on a standard distro kernel.
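For the memory side of that advice, a minimal sketch of what keeping everything mapped could look like (this is not part of the original TestCpuset program; mlockall() needs a sufficient RLIMIT_MEMLOCK or the CAP_IPC_LOCK capability):
#include <sys/mman.h>
#include <cstdio>

int main()
{
    // Lock all current and future pages into RAM so that touching the
    // buffers never blocks on a major page fault (and hence never yields
    // the CPU while the fault is serviced).
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
    {
        perror("mlockall");
        return 1;
    }
    // ... allocate the buffers and run the measurement here ...
    return 0;
}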
For the sake of those finding this via Google (like me): /sys/devices/virtual/workqueue/cpumask controls where the kernel may queue work items submitted with WORK_CPU_UNBOUND ("don't care which CPU"). As of writing this answer, it is not set to the same mask as the one isolcpus manipulates by default.
Once I changed it so that it no longer includes my isolated CPUs, I saw a significantly smaller (but not zero) number of context switches on my critical threads. I assume that the work items that still ran on my isolated CPUs must have requested them specifically, for example by using schedule_on_each_cpu.
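As an illustration (a sketch; the values assume an 8-CPU machine with CPUs 3 and 7 isolated, so the new mask keeps only CPUs 0-2 and 4-6):
# cat /sys/devices/virtual/workqueue/cpumask
ff
# echo 77 > /sys/devices/virtual/workqueue/cpumask   ### 0x77 = 01110111, i.e. bits 3 and 7 cleared
# cat /sys/devices/virtual/workqueue/cpumask
77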