
Is cluster.fork() guaranteed to use a different CPU core?

I've seen a bunch of tutorials about using clustering in node.js to utilize all of your CPU cores, but they all seem to imply that running cluster.fork() will use a new CPU core; for instance:

var cluster = require('cluster');
var numCPUs = require('os').cpus().length; // one worker per reported core

if (cluster.isMaster) {
    cluster.setupMaster({
        exec: 'bin/www'
    });

    // Fork workers
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    Object.keys(cluster.workers).forEach(function (id) {
        console.log('Worker id: ' + id + ' with pid: ' + cluster.workers[id].process.pid);
    });
}

The node.js documentation is surprisingly bad, and gives very little information on cluster.fork([env]):

Spawn a new worker process.

This can only be called from the master process.

So all it says is that it spawns a new worker process. How do we know that this will be spawned on a new CPU core, and what happens if we spawn an additional process when we've already got at least one process on each of our CPU cores?

asked Sep 08 '16 by Jez


1 Answer

The following is a general point about processes and threads on modern OSes such as Linux, Windows, and FreeBSD. The operating system manages which core a thread or process runs on dynamically, depending on many things (e.g. which core is 'closest' to the memory holding the process's data [this matters a little on dual-CPU machines], which core is free, and so on). Dynamically, as in it may well switch cores from timeslice to timeslice (e.g. 60 times a second).

Don't forget that on an SMP hardware platform there's no real performance penalty for moving a thread from one core to another - it's mostly just a matter of where the core's program counter register is pointing. There's the rest of the process's context too, but that's just a bunch of CPU register values, mostly. Actually, modern Intel platforms are not pure SMP if you have 2 or more CPUs. The QuickPath interconnect between them is used to synthesize an SMP hardware architecture from something that is actually NUMA. QuickPath is like a high-speed network connection between two separate computers, fast enough to carry data, cache syncs, etc., such that data held in the other CPU's RAM is accessible almost as quickly as if it were in this CPU's RAM.

For an application (e.g. a Node.js runtime) to have any control over which core it ends up running on is actually quite hard, and generally pointless; the OS can be trusted to do a near-optimal job of managing that for best performance. I tried taking manual control once out of interest (with some real-time signal processing code), and couldn't get more performance out of the system than by simply letting the OS manage core affinity on my behalf.
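
For completeness, here is a rough sketch of what taking manual control would even look like on Linux. Node itself has no core-affinity API, so you would have to shell out to an external tool such as taskset; the snippet below pins the current process to core 0 purely as an illustration (my addition, not something the answer recommends - as argued above, it rarely buys you anything):

var execSync = require('child_process').execSync;

// Linux-only illustration: pin this process to core 0 using taskset.
// Requires the taskset utility to be installed; throws otherwise.
try {
    execSync('taskset -cp 0 ' + process.pid);
    console.log('Pinned pid ' + process.pid + ' to core 0');
} catch (err) {
    console.log('Could not set affinity: ' + err.message);
}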

So when you call cluster.fork() (and "spawn a new worker process" does indeed mean starting up another OS process), the new process will run wherever the OS thinks is best for the circumstances prevailing at the time, and the OS will keep bouncing it around cores as the process blocks, becomes ready to run, gets preempted by another process, and so on.
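
You can actually watch this happening on Linux: field 39 of /proc/<pid>/stat records the CPU a task last executed on. The sketch below is my illustration rather than part of the original answer (the lastCore helper is hypothetical); it forks one worker per core and periodically prints where each one last ran - over time you'll see the core numbers move around:

var cluster = require('cluster');
var fs = require('fs');
var numCPUs = require('os').cpus().length;

// Read which core a pid last executed on (field 39 of /proc/<pid>/stat).
function lastCore(pid) {
    var stat = fs.readFileSync('/proc/' + pid + '/stat', 'utf8');
    // comm (field 2) sits in parentheses and may contain spaces,
    // so only split the part after the closing ')'.
    var fields = stat.slice(stat.lastIndexOf(')') + 2).split(' ');
    return fields[36]; // 37th field after the ')', i.e. field 39 overall
}

if (cluster.isMaster) {
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    setInterval(function () {
        Object.keys(cluster.workers).forEach(function (id) {
            var pid = cluster.workers[id].process.pid;
            console.log('worker ' + id + ' (pid ' + pid + ') last ran on core ' + lastCore(pid));
        });
    }, 1000);
} else {
    // Keep the workers busy so the scheduler has something to place.
    setInterval(function () { Math.sqrt(Math.random()); }, 1);
}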

Spawning more processes than CPU cores can be a mixed thing. It's fine if they block a lot (e.g. waiting for input): they block, the OS deschedules them, and if and when some data turns up on a socket (or whatever) the requisite one is rescheduled to process it. If they're continually busy doing work, it's best to match the number of processes/threads to the number of CPU cores to minimise time lost to context switches.
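
In practice, then, the usual pattern (a sketch of the common idiom, not something prescribed by the answer itself) is simply to fork one worker per core reported by os.cpus(), let the OS place them, and respawn any worker that dies:

var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    // One worker per reported core; the OS decides where each one actually runs.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    // Keep the pool matched to the core count by replacing dead workers.
    cluster.on('exit', function (worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' exited (' + (signal || code) + '), forking a replacement');
        cluster.fork();
    });
}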

Edit

So you're simply taking a chance that nothing else is running on the computer at the same time that would prevent the OS from spreading your 4 processes across 4 cores.

If you're running a word processor at the same time, that's mostly quiescent (i.e. blocked waiting for keyboard input) when you're not typing, so no problem. The OS won't (indeed, can't) schedule a process that's waiting for input until input data actually arrives. The same goes for semaphores, mutexes, etc.

In contrast, starting up a web browser and pointing it at a page with lots of video ads will consume a lot of CPU time in the browser. That's because there's always data to process (the video stream), and decoding it takes a lot of maths.

There's also process priority as another dimension to play with. If you've marked your processes as high priority, the OS will schedule them ahead of other processes.

OSes schedule threads/processes to achieve maximum throughput for all, while trying to be fair to every process in the system. Most OSes will decide, on each core, what to run next about 60 times per second. The end result is that everything looks like it's running smoothly in parallel, but in fact everything is taking turns. Process priority is a way of biasing the OS towards a higher-priority process when it takes that decision 60 times per second.
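
In Node you can nudge that from the master with os.setPriority() (a sketch assuming Node 10.10 or later, which added that API; on Linux, raising a process above normal priority generally needs root or CAP_SYS_NICE):

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
    var worker = cluster.fork();

    try {
        // Ask the OS to favour this worker when it decides what to run next.
        os.setPriority(worker.process.pid, os.constants.priority.PRIORITY_ABOVE_NORMAL);
    } catch (err) {
        // Boosting priority usually requires elevated privileges on Linux.
        console.log('could not raise priority: ' + err.message);
    }
}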

Windows is unique in giving the process which currently has mouse focus an artificial priority boost, which makes it look and feel snappier. That dates all the way back to Windows 3.0, I think!

answered Oct 06 '22 by bazza