
Why does using taskset to run a multi-threaded Linux program on a set of isolated cores cause all threads to run on one core?

Desired behaviour: run a multi-threaded Linux program on a set of cores which have been isolated using isolcpus.

Here's a small program we can use as an example multi-threaded program:

#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <err.h>
#include <unistd.h>
#include <stdlib.h>

#define NTHR    16
#define TIME    (60 * 5)    /* seconds to keep the process alive */

void *
do_stuff(void *arg)
{
    int i = 0;

    (void) arg;
    while (1) {
        i += i;
        usleep(10000); /* don't dominate CPU */
    }
    return (NULL); /* not reached */
}

int
main(void)
{
    pthread_t   threads[NTHR];
    int     rv, i;

    for (i = 0; i < NTHR; i++) {
        rv = pthread_create(&threads[i], NULL, do_stuff, NULL);
        if (rv) {
            /* pthread_create returns the error code; it does not set errno */
            errx(EXIT_FAILURE, "pthread_create: %s", strerror(rv));
        }
    }
    sleep(TIME);
    exit(EXIT_SUCCESS);
}

If I compile and run this on a kernel with no isolated CPUs, then the threads are spread out over my 4 CPUs. Good!

Now if I add isolcpus=2,3 to the kernel command line and reboot:

  • Running the program without taskset distributes threads over cores 0 and 1. This is expected as the default affinity mask now excludes cores 2 and 3.
  • Running with taskset -c 0,1 has the same effect. Good.
  • Running with taskset -c 2,3 causes all threads to go onto the same core (either core 2 or 3). This is undesired. Threads should distribute over cores 2 and 3. Right?
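One way to check observations like these from inside the program is to have each thread report where it is running. The following is a minimal sketch, assuming Linux with glibc; the helper names `allowed_cpu_count` and `report_placement` are hypothetical and not part of the original program.

```c
#define _GNU_SOURCE     /* for cpu_set_t macros, sched_getaffinity(), sched_getcpu() */
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper: number of CPUs the calling thread is allowed to
 * run on according to its affinity mask, or -1 on error.
 */
int
allowed_cpu_count(void)
{
    cpu_set_t set;

    /* A pid of 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

/* Hypothetical helper: print where the calling thread is and may run. */
void
report_placement(const char *label)
{
    printf("%s: on CPU %d, allowed on %d CPU(s)\n",
        label, sched_getcpu(), allowed_cpu_count());
}
```

Calling `report_placement("worker")` from inside the `while` loop of `do_stuff` would show whether the threads ever leave the first core of the mask.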

This post describes a similar issue (although the example given is further removed from the pthreads API). The OP was happy to work around it by using a different scheduler. I'm not certain this is ideal for my use case, however.

Is there a way to have the threads distributed over the isolated cores using the default scheduler?

Is this a kernel bug which I should report?

EDIT:

The right thing does indeed happen if you use a real-time scheduling policy such as SCHED_FIFO. See man 7 sched and man chrt for details.
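For reference, a real-time policy can also be requested per thread from within the program. This is a minimal sketch using the standard pthread attribute calls; the helper name `init_fifo_attr` is hypothetical, and priority 1 is an arbitrary choice for illustration.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: prepare attributes for a SCHED_FIFO thread.
 * Returns 0 on success or an error code, like the pthread_attr_* calls.
 * On success the caller should pthread_attr_destroy() attr when finished.
 */
int
init_fifo_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int rv;

    if ((rv = pthread_attr_init(attr)) != 0)
        return rv;
    /* Without PTHREAD_EXPLICIT_SCHED the new thread inherits the
     * creator's policy and the settings below are ignored. */
    if ((rv = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) == 0 &&
        (rv = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) == 0 &&
        (rv = pthread_attr_setschedparam(attr, &param)) == 0)
        return 0;
    pthread_attr_destroy(attr);
    return rv;
}
```

In the example program, the initialised `attr` would be passed as the second argument to `pthread_create` in place of `NULL`. Creating a SCHED_FIFO thread normally requires elevated privileges (for example root or CAP_SYS_NICE); without them `pthread_create` is expected to fail with EPERM.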

Edd Barrett asked Apr 13 '16 16:04




1 Answer

From the Linux Kernel Parameter Doc:

This option can be used to specify one or more CPUs to isolate from the general SMP balancing and scheduling algorithms.

So this option effectively prevents the scheduler from migrating threads from one core to another, less contended core (SMP balancing). Typically, isolcpus is used together with pthread affinity control, pinning threads with knowledge of the CPU layout to get predictable performance.

https://www.kernel.org/doc/Documentation/kernel-parameters.txt
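As an illustration of the explicit pinning mentioned above, here is a minimal sketch that spreads threads round-robin over the isolated cores with `pthread_setaffinity_np`. It assumes Linux with glibc and the `isolcpus=2,3` setup from the question; the helper names `cpu_for_thread` and `pin_thread` are hypothetical.

```c
#define _GNU_SOURCE     /* for cpu_set_t macros and pthread_setaffinity_np() */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Assumption: the isolated CPUs from the question (isolcpus=2,3). */
static const int isolated_cpus[] = { 2, 3 };
#define N_ISOLATED (sizeof(isolated_cpus) / sizeof(isolated_cpus[0]))

/* Hypothetical helper: choose a CPU for thread number idx, round-robin. */
int
cpu_for_thread(size_t idx)
{
    return isolated_cpus[idx % N_ISOLATED];
}

/*
 * Hypothetical helper: restrict a thread to a single CPU.
 * Returns 0 on success or an error code, like other pthread calls.
 */
int
pin_thread(pthread_t thread, int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}
```

In the question's program, this would mean calling `pin_thread(threads[i], cpu_for_thread(i))` after each successful `pthread_create`, so that the program itself does the placement that the load balancer no longer performs on isolated cores.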

--Edit--

OK, I see why you are confused. Personally, I would also expect consistent behaviour from this option. The problem lies in two functions, select_task_rq_fair and select_task_rq_rt, which are responsible for selecting the new run queue (essentially, choosing the next CPU to run on). I did a quick trace (SystemTap) of both functions: for CFS it always returned the same first core in the mask; for RT it returned other cores as well. I haven't had a chance to look into the logic of each selection algorithm, but you could email the maintainers on the Linux kernel development mailing list about a fix.

Wei Shen answered Sep 19 '22 06:09