Using cluster in a Node module

Tags: node.js, multithreading, cluster-computing

UPDATE: Even if this particular scenario is not realistic, as the comments point out, I'm still interested in how one could write a module that uses clustering without rerunning the parent process each time.

I'm trying to write a Node.js module called mass-request that speeds up large numbers of HTTP requests by distributing them to child processes.

My hope is that, from the outside, it would work like this:


var mr = require("mass-request"),
    scraper = mr();

for (var i = 0; i < my_urls_to_visit.length; i += 1) {
    scraper.add(my_urls_to_visit[i], function(resp) {
        // do something with response
    });
}

To get started, I put together a skeleton for the mass-request module.


var cluster = require("cluster"),
    numCPUs = require("os").cpus().length;

module.exports = function() {
    console.log("hello from mass-request!");
    if (cluster.isMaster) {
        for (var i = 0; i < numCPUs; i += 1) {
            var worker = cluster.fork();
        }

        return {
            add: function(url, cb) {}
        };
    } else {
        console.log("worker " + process.pid + " is born!");
    }
};

Then I test it like so in a test script:


var mr = require("mass-request");
var m = mr();
console.log("hello from test.js!", m);

I expected to see "hello from mass-request!" logged four times (as indeed it is). To my amazement, I also see "hello from test.js!" four times. Clearly I do not understand how cluster.fork() works. Is it rerunning the whole process, not just the function that called it the first time?

If so, how does one make use of clustering in a module without troubling the person who uses that module with messy multi-process logic?
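To see why test.js runs more than once: cluster.fork() does not clone the calling function. It starts a new Node process that runs the main script from the top (by default process.argv[1], the file that was passed to node). This means any top-level code, including the console.log in test.js, runs again in every worker. The following is a minimal sketch that assumes a single hypothetical file named demo.js. It uses cluster.isPrimary, which was added in Node 16, and falls back to cluster.isMaster on older versions.

```javascript
// demo.js (hypothetical file name): run with `node demo.js`
const cluster = require("cluster");

// Node 16+ names this cluster.isPrimary; older versions only have cluster.isMaster.
const isPrimary =
  cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

// Top-level code: runs in the primary AND again in every forked worker,
// because each worker is a new Node process started on this same file.
console.log((isPrimary ? "primary " : "worker ") + process.pid + ": top-level code ran");

if (isPrimary) {
  // Only the original process reaches this branch.
  const worker = cluster.fork();
  worker.on("exit", function (code) {
    console.log("worker exited with code " + code);
  });
} else {
  // Only forked workers reach this branch; exit so the demo terminates.
  process.exit(0);
}
```

Running it prints the "top-level code ran" line twice: once with the primary's pid and once with the worker's pid. This is the same effect the question observed with test.js.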


asked May 20 '14 23:05

Chris Wilson

1 Answer

I believe what you are looking for is cluster.setupMaster.

From the docs:

cluster.setupMaster([settings])

settings: Object
  exec: String. File path to worker file. (Default: process.argv[1])
  args: Array. String arguments passed to worker. (Default: process.argv.slice(2))
  silent: Boolean. Whether or not to send output to parent's stdio. (Default: false)

setupMaster is used to change the default fork() behavior. Once called, the settings will be present in cluster.settings.

By setting the exec property, you can have your workers launched from a different file.

Important: as the docs state, this can only be called once. If you are depending on this behavior for your module, then the caller can't be using cluster or the whole thing falls apart.

For example:

index.js


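var cluster = require("cluster"),
  path = require("path"),
  numCPUs = require("os").cpus().length;

console.log("hello from mass-request!");
if (cluster.isMaster) {
  // Workers run worker.js instead of re-running this file.
  cluster.setupMaster({
    exec: path.join(__dirname, "worker.js")
  });

  for (var i = 0; i < numCPUs; i += 1) {
    cluster.fork();
  }

  // A bare top-level `return` would discard this object, so export it instead.
  module.exports = {
    add: function (url, cb) {
      // Not implemented here: hand url to a worker and call cb with the result.
    }
  };
} else {
  // Not reached when workers are launched with exec: worker.js.
  console.log("worker " + process.pid + " is born!");
}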

worker.js


console.log("worker " + process.pid + " is born!");

output


node index.js 
hello from mass-request!
worker 38821 is born!
worker 38820 is born!
worker 38822 is born!
worker 38819 is born!
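The add function in index.js is left empty. One hypothetical way to fill it in is to have the primary send each URL to a worker over the cluster IPC channel (worker.send() in the primary, process.send() in the worker) and invoke the callback when that worker replies. This is a sketch, not part of the answer. The name createScraper, the temp worker file, and round-robin scheduling are all illustrative assumptions. To keep the example self-contained, the stand-in worker replies with the URL's length instead of making an HTTP request.

```javascript
const cluster = require("cluster");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Stand-in worker source (hypothetical): instead of fetching the URL over
// HTTP, it replies with the URL's length so the sketch runs offline.
const WORKER_SOURCE = [
  'process.on("message", function (msg) {',
  "  process.send({ id: msg.id, result: msg.url.length });",
  "});",
].join("\n");

// Hypothetical factory standing in for mass-request's exported function.
function createScraper(numWorkers = os.cpus().length) {
  // Write the worker to a temp file so the example needs no second file.
  const workerFile = path.join(os.tmpdir(), "mass-request-echo-worker.js");
  fs.writeFileSync(workerFile, WORKER_SOURCE);

  // Point fork() at the worker file so the caller's script is not re-run.
  // setupPrimary is the Node 16+ name; setupMaster is the older alias.
  const setup = cluster.setupPrimary || cluster.setupMaster;
  setup.call(cluster, { exec: workerFile });

  const pending = new Map(); // request id -> callback
  const workers = [];
  for (let i = 0; i < numWorkers; i += 1) {
    const worker = cluster.fork();
    // Each reply carries the id of the request it answers.
    worker.on("message", function (msg) {
      const cb = pending.get(msg.id);
      pending.delete(msg.id);
      if (cb) cb(msg.result);
    });
    workers.push(worker);
  }

  let nextId = 0;
  return {
    // Hand each URL to the next worker in round-robin order.
    add: function (url, cb) {
      const id = nextId;
      nextId += 1;
      pending.set(id, cb);
      workers[id % workers.length].send({ id: id, url: url });
    },
    // Disconnect every worker so the primary process can exit.
    close: function () {
      workers.forEach(function (w) {
        w.disconnect();
      });
    },
  };
}

module.exports = createScraper;
```

Usage mirrors the question: `var scraper = createScraper(); scraper.add(url, function (resp) { ... });`. Call `scraper.close()` once every callback has fired so the workers disconnect and the process can exit. Because the workers run the temp file rather than process.argv[1], the caller's script is not re-run in each worker, which is what the question's UPDATE asks for.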


answered Sep 28 '22 05:09

dc5
