
Cause of involuntary context switches

I'm trying to profile a multithreaded program I've written on a somewhat large machine (32 cores, 256 GB RAM). I've noticed that the program's performance can vary drastically between runs (by 70-80%). I can't find the cause of this variance, but by analyzing the output of the 'time' utility over a large number of runs, I've noticed that the number of involuntary context switches correlates highly with performance (unsurprisingly, fewer context switches go with better performance and vice versa).

Is there any good way to determine what's causing this context switching? If I can discover the culprit, then maybe I can try to fix the problem. I have a few particular restrictions on tools I can use, however. First, I don't have root privileges on the machine, so any tools requiring such privileges are out. Second, it's a fairly old kernel (RHEL5, kernel 2.6.18), so some of the standard perf-event stuff may not be present. Anyway, any suggestions on how to dig deeper into the cause of this context switching would be greatly appreciated.

Update: I decided to test my program on a different (and smaller) machine. The other machine is a 4-core (with hyperthreading) Linux box with 8 GB of RAM and a much newer kernel (3.2.0, versus 2.6.18 on the original machine). On the new machine, I'm unable to reproduce the bimodal performance profile. This leads me to believe that the problem is due either to a hardware issue (as was suggested in the comments) or to a particularly pathological case at the kernel level that has since been fixed. My current best hypothesis is that it may be because the new machine's kernel has the Completely Fair Scheduler (CFS) while the old machine's does not. Is there a way to test this hypothesis (to tell the new machine to use a different / older scheduler) without having to recompile an ancient kernel version for the new machine?

asked Jun 23 '13 23:06 by nomad




1 Answer

You mentioned there are 32 cores, but what is the exact layout of the hardware? E.g. how many packages the machine has, how many cores per package, how the caches are shared, etc. For sharing this kind of information I personally like the output of likwid-topology -g.

Anyway, there is one source of non-determinism in your runs: thread affinity. The operating system decides which HW thread each SW thread runs on, and it does so without knowing how your threads communicate with each other. That can cause all kinds of effects, so for reproducible runs it's a good idea to pin your SW threads to HW threads in some way (there may be an optimal way, too, but so far I am just talking about determinism).

For pinning (a.k.a. setting affinity) you can either use explicit pthread calls (e.g. pthread_setaffinity_np) or try another tool from the Likwid suite called likwid-pin.
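As a rough illustration of the explicit-call route, here is a minimal sketch in Python rather than C. It assumes Linux and Python 3.3+, where os.sched_setaffinity wraps the same sched_setaffinity syscall that the pthread affinity calls rely on. The helper names (pin_current_thread, worker, run_pinned) and the simple round-robin placement are made up for this example and are not necessarily the best layout for your machine.

```python
import os
import threading


def pin_current_thread(cpu):
    """Restrict the calling thread to one logical CPU (Linux only)."""
    # A pid of 0 means "the calling thread" for the underlying syscall.
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def worker(index, cpus, results):
    # Hypothetical placement policy: round-robin over the allowed CPUs.
    cpu = cpus[index % len(cpus)]
    results[index] = pin_current_thread(cpu)
    # ... the thread's real work would go here ...


def run_pinned(num_threads):
    # Only use CPUs this process is already allowed to run on.
    cpus = sorted(os.sched_getaffinity(0))
    results = [None] * num_threads
    threads = [
        threading.Thread(target=worker, args=(i, cpus, results))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cpus, results


if __name__ == "__main__":
    allowed, masks = run_pinned(4)
    print("allowed CPUs:", allowed)
    print("per-thread affinity masks:", masks)
```

None of this needs root: a process can always narrow the affinity of its own threads. In a C or C++ program the same idea would be a pthread_setaffinity_np call at the start of each thread function.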

If that doesn't get you consistent results, run a good profiler (e.g. Intel VTune) on your workload making sure you capture a faster run and a slower run and then compare the results. In VTune you can use the compare feature that shows two profiles side by side.
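If a full profiler is not an option, a cheaper way to compare a fast and a slow run is to read the same counters that 'time' reports from inside the program, one phase at a time, via getrusage. The sketch below assumes Python's standard resource module on Linux; ctx_switch_delta and the log dictionary are hypothetical names for this example. It only tells you where the involuntary switches accumulate, not what caused them.

```python
import resource
from contextlib import contextmanager


@contextmanager
def ctx_switch_delta(label, log):
    """Record context switches incurred by the process inside the block."""
    # RUSAGE_SELF covers all threads of the calling process.
    before = resource.getrusage(resource.RUSAGE_SELF)
    try:
        yield
    finally:
        after = resource.getrusage(resource.RUSAGE_SELF)
        log[label] = {
            "voluntary": after.ru_nvcsw - before.ru_nvcsw,
            "involuntary": after.ru_nivcsw - before.ru_nivcsw,
        }


if __name__ == "__main__":
    log = {}
    with ctx_switch_delta("compute phase", log):
        sum(i * i for i in range(1000000))  # stand-in for real work
    print(log)
```

Logging these deltas for each phase in both a fast and a slow run should at least narrow down which part of the program is being preempted.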

answered Sep 22 '22 23:09 by Alexey Alexandrov