 

Best way to execute parallel processing in Node.js


I'm trying to write a small Node application that searches through and parses a large number of files on the file system. To speed up the search, we want to use some form of map-reduce. The plan is the following simplified scenario:

  • Web request comes in with a search query
  • 3 processes are started that each get assigned 1000 (different) files
  • once a process completes, it would 'return' its results back to the main thread
  • once all processes complete, the main thread would continue by returning the combined result as a JSON result

The questions I have with this are: Is this doable in Node? What is the recommended way of doing it?

I've been fiddling, but have come no further than the following example using child_process.fork:

initiator:

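    var child_process = require('child_process');

    function Worker() {
        return child_process.fork("myProcess.js");
    }

    for (var i = 0; i < require('os').cpus().length; i++) {
        // Named `worker` so the global `process` object is not shadowed
        var worker = new Worker();
        worker.send(workItems.slice(i * itemsPerProcess, (i + 1) * itemsPerProcess));
    }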

myProcess.js

    process.on('message', function(msg) {
        var valuesToReturn = [];
        // Do file reading here
        // How would I return valuesToReturn?
        process.exit(0);
    });

A few side notes:

  • I'm aware the number of processes should depend on the number of CPUs on the server
  • I'm also aware of speed restrictions in a file system. Consider it a proof of concept before we move this to a database or Lucene instance :-)
Bart Vangeneugden asked Nov 15 '13 15:11





1 Answer

Should be doable. As a simple example:

    // parent.js
    var child_process = require('child_process');

    var numchild = require('os').cpus().length;
    var done     = 0;

    for (var i = 0; i < numchild; i++) {
      var child = child_process.fork('./child');
      child.send((i + 1) * 1000);
      child.on('message', function(message) {
        console.log('[parent] received message from child:', message);
        done++;
        if (done === numchild) {
          console.log('[parent] received all results');
          // ...
        }
      });
    }

    // child.js
    process.on('message', function(message) {
      console.log('[child] received message from server:', message);
      // Simulate work with a random delay between 2.5 and 7.5 seconds
      setTimeout(function() {
        process.send({
          child  : process.pid,
          result : message + 1
        });
        process.disconnect();
      }, (0.5 + Math.random()) * 5000);
    });

So the parent process spawns one child process per CPU core and sends each of them a message. It also installs an event handler on each child to listen for any messages sent back (with the result, for instance).
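In the file-search scenario from the question, the message sent to each child would be its share of the file list rather than a number. A minimal sketch of that split, assuming the work items are a plain array of file paths; `splitIntoChunks` is a hypothetical helper, not part of Node or of the example above:

```javascript
// Hypothetical helper: split `items` into `chunkCount` contiguous slices
// whose lengths differ by at most one.
function splitIntoChunks(items, chunkCount) {
  var chunks = [];
  var base = Math.floor(items.length / chunkCount);
  var extra = items.length % chunkCount; // the first `extra` chunks get one more item
  var start = 0;
  for (var i = 0; i < chunkCount; i++) {
    var size = base + (i < extra ? 1 : 0);
    chunks.push(items.slice(start, start + size));
    start += size;
  }
  return chunks;
}
```

With `var chunks = splitIntoChunks(workItems, numchild)`, the parent could call `child.send(chunks[i])` in place of `child.send((i + 1) * 1000)`. Messages sent this way are serialized before they reach the child, so an array of path strings passes through unchanged.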

The child process waits for a message from the parent and then starts processing (in this case, it just starts a timer with a random timeout to simulate some work being done). Once it's done, it sends the result back to the parent process and calls process.disconnect() to close the IPC channel to the parent, which lets the child exit once it has nothing else left to do.
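For the file search, the timer would be replaced by the actual work. The following is only a sketch under stated assumptions: the message shape `{ files, query }`, the plain substring match, and the helper name `searchFiles` are all hypothetical and stand in for whatever parsing the real application needs.

```javascript
// child.js (sketch)
var fs = require('fs');

// Hypothetical helper: return the paths in `files` whose contents contain `query`.
// `readFile` can be injected for testing; by default files are read from disk.
function searchFiles(files, query, readFile) {
  readFile = readFile || function (path) {
    return fs.readFileSync(path, 'utf8');
  };
  return files.filter(function (path) {
    return readFile(path).indexOf(query) !== -1;
  });
}

// process.send only exists when this script was started with an IPC channel,
// e.g. through child_process.fork, so the handler is installed only in that case.
if (typeof process.send === 'function') {
  process.on('message', function (message) {
    // Assumed message shape: { files: ['a.txt', ...], query: 'needle' }
    process.send({
      child  : process.pid,
      result : searchFiles(message.files, message.query)
    });
    process.disconnect();
  });
}
```

Synchronous reads keep the sketch short and block only this child, not the parent that serves the web request.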

The parent process keeps track of the number of child processes started, and the number of them that have sent back a result. When those numbers are equal, the parent has received all results from the child processes, so it can combine them and return the JSON result.
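The counting and combining step can be looked at in isolation. This is a rough sketch rather than a definitive implementation: `createCollector` is a hypothetical helper, and the stand-in messages use the `{ child, result }` shape from the example, with `result` assumed to be an array of matches.

```javascript
// Hypothetical helper: gather one message per child and call `onDone`
// with all of them once `expected` messages have arrived.
function createCollector(expected, onDone) {
  var results = [];
  return function handleMessage(message) {
    results.push(message);
    if (results.length === expected) {
      onDone(results);
    }
  };
}

// Usage with stand-in messages instead of real child processes.
var combined = null;
var collect = createCollector(2, function (all) {
  // Flatten every child's `result` array into one list for the JSON response.
  combined = all.reduce(function (acc, message) {
    return acc.concat(message.result);
  }, []);
});

collect({ child: 101, result: ['a.txt'] });
collect({ child: 102, result: ['c.txt', 'd.txt'] });
console.log(JSON.stringify(combined)); // prints ["a.txt","c.txt","d.txt"]
```

In the parent, the same function could be attached with `child.on('message', collect)`. A child that crashes never sends its message, so the count would never be reached; listening for each child's 'exit' event is one way to notice that.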

robertklep answered Sep 20 '22 12:09
