
Web Workers vs child_process for CPU-intensive functions in Node.js [closed]

I'm trying to use node-unfluff, which extracts content from HTML strings. However, it usually takes ~200 ms to run, and because it runs synchronously it blocks the event loop for that whole time, which is far too slow. I want to make it run asynchronously.
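To illustrate why a ~200 ms synchronous call hurts, here is a minimal sketch in which a hypothetical `busyWork()` spin loop stands in for the unfluff call. Node runs JavaScript callbacks on a single thread, so a timer scheduled for 0 ms cannot fire until the synchronous work has returned:

```javascript
// Hypothetical stand-in for a ~200 ms synchronous call such as unfluff(html):
// it spins the CPU without ever yielding to the event loop.
function busyWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin
  }
}

// Schedules a 0 ms timer, then blocks for `blockMs` milliseconds.
// Resolves with how long the timer actually had to wait.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    busyWork(blockMs); // the timer callback cannot run until this returns
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`);
});
```

Any other request or callback queued during those 200 ms is delayed in the same way, which is why the work needs to move off the main thread.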

As far as I know, my options are Web Workers (https://github.com/audreyt/node-webworker-threads) or child_process (https://nodejs.org/api/child_process.html). Are there better options?

If not, which of these is better in terms of speed or other factors?

Edit:

There's also Threads à gogo (https://github.com/xk/node-threads-a-gogo) and tiny-worker (https://github.com/avoidwork/tiny-worker).

WebWorker Threads doesn't support require, so that's no longer an option.

It's possible to require files using Threads à gogo by using its load function, but it seems like a hacky workaround.

tiny-worker has only 26 stars on GitHub at the moment, so I'm hesitant to use it in production code. It does support require.

I'm considering writing my own WebWorker implementation using child_process if there are no better options.
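A minimal sketch of the child_process idea, assuming a hypothetical `runInChild()` helper and a placeholder word count in place of node-unfluff (which is not loaded here). The parent stays responsive because the slow, synchronous work runs in a separate Node.js process, and the result comes back over the IPC channel that the `'ipc'` stdio entry sets up:

```javascript
const { spawn } = require('child_process');

// Code for the child process. In a real project this would be a separate file
// (started with child_process.fork) that require()s node-unfluff; it is inlined
// and run with `node -e` here only so the sketch fits in one file.
const CHILD_SOURCE = `
process.on('message', (html) => {
  // Placeholder for the slow synchronous call (e.g. unfluff(html)).
  // It blocks only this child process, not the parent's event loop.
  const words = html.replace(/<[^>]*>/g, ' ').split(/\\s+/).filter(Boolean);
  process.send({ wordCount: words.length });
});
`;

// Runs one job in a fresh Node.js child process and resolves with its reply.
function runInChild(html) {
  return new Promise((resolve, reject) => {
    // The 'ipc' entry opens a message channel, which gives the child
    // process.send() and a 'message' event.
    const child = spawn(process.execPath, ['-e', CHILD_SOURCE], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    child.once('message', (result) => {
      child.disconnect(); // closing the channel lets the child exit
      resolve(result);
    });
    child.once('error', reject);
    child.once('exit', (code) => {
      // Only has an effect if the child exited before replying.
      reject(new Error(`child exited with code ${code}`));
    });
    child.send(html);
  });
}

runInChild('<p>Hello <b>world</b></p>').then((result) => {
  console.log(result); // { wordCount: 2 }
});
```

Spawning a process per call adds Node.js startup time to every job, so a fuller WebWorker-style wrapper would more likely keep one child (or a small pool) alive and match replies to requests, for example with a hypothetical `id` field.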

asked Jan 25 '17 at 04:01 by Leo Jiang


1 Answer

You can use require with Workers by loading RequireJS (an AMD module loader, not Node's built-in CommonJS require). In your Worker script you'll need to call

self.importScripts('../path/require.js');

As per the RequireJS docs, you can then pass a config object to requirejs.config() and load modules with requirejs():

requirejs.config({
    //By default load any module IDs from js/lib
    baseUrl: 'js/lib',
    //except, if the module ID starts with "app",
    //load it from the js/app directory. paths
    //config is relative to the baseUrl, and
    //never includes a ".js" extension since
    //the paths config could be for a directory.
    paths: {
        app: '../app'
    }
});

// Start the main app logic.
requirejs(['jquery', 'canvas', 'app/sub'],
function   ($,        canvas,   sub) {
    //jQuery, canvas and the app/sub module are all
    //loaded and can be used here now.
});

Putting it together

Worker.js

self.importScripts('../path/require.js');
requirejs.config({
    //By default load any module IDs from path/lib
    baseUrl: 'path/lib',
    //except, if the module ID starts with "app",
    //load it from the path/app directory. paths
    //config is relative to the baseUrl, and
    //never includes a ".js" extension since
    //the paths config could be for a directory.
    paths: {
        app: '../app'
    }
});

// Start the main app logic.
requirejs(['jquery', 'canvas', 'app/sub'],
function   ($,        canvas,   sub) {
    //jQuery, canvas and the app/sub module are all
    //loaded and can be used here now.
    // now you can post a message back to the calling script to let it know RequireJS has loaded
    self.postMessage("initialized");
});

self.onmessage = function(message) {
    // do CPU-intensive work here; this example is not CPU-intensive...
    if(message.data === 'to process') {
        self.postMessage("completed!");
    }
};

Node Worker Call

var worker = new Worker('Worker.js');
worker.onmessage = function(event) {
    var msg = event.data;
    if(msg === 'initialized') {
        // send a plain string, since Worker.js compares message.data to 'to process'
        worker.postMessage('to process');
    }
};
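For comparison, current Node.js releases include a built-in `worker_threads` module whose workers can call `require()` directly, without RequireJS or `importScripts`. The sketch below reproduces the same initialized / to process / completed! handshake with it; `runHandshake()` is a hypothetical helper name, and the worker code is inlined with the `eval: true` option only so the example fits in one file (normally you would pass a file path instead):

```javascript
const { Worker } = require('worker_threads');

// Worker body. Inside a worker_threads worker, require() works as in any
// other Node.js module, so a library like node-unfluff could be loaded here.
const WORKER_SOURCE = `
const { parentPort } = require('worker_threads');
parentPort.postMessage('initialized');
parentPort.on('message', (msg) => {
  if (msg === 'to process') {
    // CPU-intensive work would go here.
    parentPort.postMessage('completed!');
  }
});
`;

// Starts the worker, runs the handshake, and resolves once work is done.
function runHandshake() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, { eval: true });
    worker.on('message', (msg) => {
      if (msg === 'initialized') {
        worker.postMessage('to process');
      } else if (msg === 'completed!') {
        // Stop the worker, then hand the final message to the caller.
        worker.terminate().then(() => resolve(msg), reject);
      }
    });
    worker.on('error', reject);
  });
}

runHandshake().then((msg) => console.log(msg)); // completed!
```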
answered Nov 13 '22 at 09:11 by Daniel Lane