UPDATE: Even if this particular scenario is not realistic, as per comments, I'm still interested in how one could write a module that makes use of clustering without rerunning the parent process each time.
I'm trying to write a Node.js module called mass-request that speeds up large numbers of HTTP requests by distributing them to child processes.
My hope is that, from the outside, it would work like this:
var mr = require("mass-request"),
    scraper = mr();
for (var i = 0; i < my_urls_to_visit.length; i += 1) {
    scraper.add(my_urls_to_visit[i], function(resp) {
        // do something with response
    });
}
To get started, I put together a skeleton for the mass-request module.
var cluster = require("cluster"),
    numCPUs = require("os").cpus().length;
module.exports = function() {
    console.log("hello from mass-request!");
    if (cluster.isMaster) {
        for (var i = 0; i < numCPUs; i += 1) {
            var worker = cluster.fork();
        }
        return {
            add: function(url, cb) {}
        }
    } else {
        console.log("worker " + process.pid + " is born!");
    }
}
Then I test it like so in a test script:
var mr = require("mass-request");
var m = mr();
console.log("hello from test.js!", m);
I expected to see "hello from mass-request!" logged four times (as indeed it is). To my amazement, I also see "hello from test.js" four times. Clearly I do not understand how cluster.fork() works. Is it rerunning the whole process, not just the function that calls it the first time?
If so, how does one make use of clustering in a module without troubling the person who uses that module with messy multi-process logic?
Node.js runs JavaScript on a single thread, which is memory efficient. To take advantage of multi-core systems, the cluster module lets you create child processes, each running on its own single thread, to share the load.
The cluster module executes the same Node.js script multiple times. Therefore, the first thing you need to do is identify which portion of the code is for the master process and which portion is for the workers.
The cluster module enables creating child processes (workers) that run simultaneously while sharing the same server port. Every child process has its own event loop, memory, and V8 instance. The child processes use interprocess communication (IPC) to communicate with the parent Node.js process.
I believe what you are looking for is setupMaster.
From the docs:
cluster.setupMaster([settings])
- settings Object
- exec String file path to worker file. (Default=process.argv[1])
- args Array string arguments passed to worker. (Default=process.argv.slice(2))
- silent Boolean whether or not to send output to parent's stdio. (Default=false)
setupMaster is used to change the default 'fork' behavior. Once called, the settings will be present in cluster.settings.
By making use of the exec property you can have your workers launched from a different module.
Important: as the docs state, this can only be called once. If your module depends on this behavior, then the caller can't be using cluster as well, or the whole thing falls apart.
For example:
index.js
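var cluster = require("cluster"),
    path = require("path"),
    numCPUs = require("os").cpus().length;
console.log("hello from mass-request!");
if (cluster.isMaster) {
    // Launch workers from worker.js instead of re-running the caller's script.
    cluster.setupMaster({
        exec: path.join(__dirname, 'worker.js')
    });
    for (var i = 0; i < numCPUs; i += 1) {
        var worker = cluster.fork();
    }
    // In the real module, this object would be returned from the
    // module.exports function shown in the question.
    return {
        add: function (url, cb) {
        }
    }
} else {
    // Workers forked above execute worker.js, so they never reach this branch.
    console.log("worker " + process.pid + " is born!");
}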
worker.js
console.log("worker " + process.pid + " is born!");
output
node index.js
hello from mass-request!
worker 38821 is born!
worker 38820 is born!
worker 38822 is born!
worker 38819 is born!
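The `add` stub in index.js is left empty. One way it might be filled in is sketched below, under some assumptions: `createDispatcher`, the `{ id, url }` job message and the `{ id, resp }` reply message are all hypothetical names and shapes, not part of the cluster API. The only thing the sketch relies on is that each worker has `send(msg)` and emits `"message"` events, which is true of the objects returned by `cluster.fork()`. To keep the example runnable without spawning processes, it uses a fake in-process worker that echoes a canned response.

```javascript
var EventEmitter = require("events").EventEmitter;

// Hypothetical helper: routes add(url, cb) calls to a pool of workers
// and matches each reply to its callback by job id.
function createDispatcher(workers) {
  var pending = {};    // job id -> callback waiting for a reply
  var nextId = 0;
  var nextWorker = 0;

  workers.forEach(function (worker) {
    worker.on("message", function (msg) {
      var cb = pending[msg.id];
      if (!cb) { return; }          // unknown or already-answered job
      delete pending[msg.id];
      cb(msg.resp);
    });
  });

  return {
    add: function (url, cb) {
      var id = nextId++;
      pending[id] = cb;             // register before sending
      workers[nextWorker].send({ id: id, url: url });
      nextWorker = (nextWorker + 1) % workers.length; // simple round-robin
    }
  };
}

// Stand-in for a real worker: replies immediately with a canned string.
function makeFakeWorker(name) {
  var w = new EventEmitter();
  w.send = function (job) {
    w.emit("message", { id: job.id, resp: name + " fetched " + job.url });
  };
  return w;
}

var scraper = createDispatcher([makeFakeWorker("w0"), makeFakeWorker("w1")]);
var results = [];
["http://a.example/", "http://b.example/", "http://c.example/"].forEach(function (url) {
  scraper.add(url, function (resp) { results.push(resp); });
});
console.log(results);
```

In the module itself, the array passed to `createDispatcher` would presumably be the workers collected from the `cluster.fork()` loop, and worker.js would listen with `process.on("message", ...)`, perform the request, and answer with `process.send({ id: ..., resp: ... })`. Only plain serializable data crosses the IPC channel, so `resp` would have to be something like a status code and body rather than a live response object.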