I have a CPU-intensive task (looping through some data and evaluating results). I want to make use of multiple cores for this, but my performance is consistently worse than when using just a single core.
I've tried:
I'm measuring the results by counting the total number of iterations I can complete and dividing by the amount of time I spent working on the problem. When using a single core, my results are significantly better.
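The measurement described above (iterations completed divided by time spent) can be sketched as follows. This is a minimal sketch, not the code behind the numbers in this question: measureThroughput and dummyTask are hypothetical names, and the loop only times work done on a single thread.

```javascript
// Hypothetical helper: run `task` repeatedly for roughly `durationMs`
// milliseconds and report the completed iterations per second.
function measureThroughput(task, durationMs) {
  const start = process.hrtime.bigint();
  const budgetNs = BigInt(durationMs) * 1000000n;
  let iterations = 0;
  let elapsedNs = 0n;
  do {
    task();
    iterations++;
    elapsedNs = process.hrtime.bigint() - start;
  } while (elapsedNs < budgetNs);
  const seconds = Number(elapsedNs) / 1e9;
  return { iterations: iterations, seconds: seconds, perSecond: iterations / seconds };
}

// Stand-in for the real CPU-intensive work (an assumption for illustration).
function dummyTask() {
  let sum = 0;
  for (let i = 0; i < 100000; i++) sum += Math.sqrt(i);
  return sum;
}

const stats = measureThroughput(dummyTask, 200);
console.log(stats.iterations + ' iterations in ' + stats.seconds.toFixed(3) +
  ' s = ' + stats.perSecond.toFixed(1) + ' iterations/second');
```

When comparing a single core against several workers, the fair figure is probably the sum of iterations across all workers divided by the same wall-clock window, rather than a per-worker rate.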
Some points of interest:
Any idea as to what is going on here?
Update for threads: I suspect a bug in webworker-threads
Skipping express for now, I think the issue may have to do with my thread loop. What I'm doing is creating threads and then trying to run them continuously while sending data back and forth between them. Even though both of the threads are using up CPU, only thread 0 is returning values. My assumption was that tp.any.emit would generally end up emitting the message to the thread that had been idle the longest, but that does not seem to be the case. My setup looks like this:
Within threadtask.js
thread.on('init', function() {
  // Tell the main thread this worker has been initialised.
  thread.emit('ready');
  thread.on('start', function(data) {
    console.log("THREAD " + thread.id + ": execute task");
    //...
    console.log("THREAD " + thread.id + ": emit result");
    thread.emit('result', otherData);
  });
});
main.js
var Threads = require('webworker-threads');

var tp = Threads.createPool(NUM_THREADS);
tp.load(threadtaskjsFilePath);
var readyCount = 0;
// Start the first round only once every thread has reported 'ready'.
tp.on('ready', function() {
  readyCount++;
  if (readyCount == tp.totalThreads()) {
    console.log('MAIN: Sending first start event');
    tp.all.emit('start', JSON.stringify(data));
  }
});
// Each result triggers the next task.
tp.on('result', function(eresult) {
  var result = JSON.parse(eresult);
  console.log('MAIN: result from thread ' + result.threadId);
  //...
  console.log('MAIN: emit start' + result.threadId);
  tp.any.emit('start' + result.threadId, data);
});
tp.all.emit("init", JSON.stringify(data2));
The output of this disaster:
MAIN: Sending first start event
THREAD 0: execute task
THREAD 1: execute task
THREAD 1: emit result
MAIN: result from thread 1
THREAD 0: emit result
THREAD 0: execute task
THREAD 0: emit result
MAIN: result from thread 0
MAIN: result from thread 0
THREAD 0: execute task
THREAD 0: emit result
THREAD 0: execute task
THREAD 0: emit result
MAIN: result from thread 0
MAIN: result from thread 0
THREAD 0: execute task
THREAD 0: emit result
THREAD 0: execute task
THREAD 0: emit result
MAIN: result from thread 0
MAIN: result from thread 0
I did try another approach as well, where I would emit to all threads but have each thread listen for a message that only it could answer, e.g. thread.on('start' + thread.id, function() { ... }). This doesn't work: when I call tp.all.emit('start' + result.threadId, ... ) in the result handler, the message doesn't get picked up.
MAIN: Sending first start event
THREAD 0: execute task
THREAD 1: execute task
THREAD 1: emit result
THREAD 0: emit result
Nothing more happens after that.
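The dispatch behaviour expected in this update (each new task goes to the thread that has been idle the longest) can be modelled independently of webworker-threads. The following is a minimal sketch under assumptions: IdleDispatcher and its send callback are hypothetical names, the sketch covers only the bookkeeping, and it presumes there is some way to message one specific thread, which has not been checked against the library.

```javascript
// Hypothetical helper: tracks which workers are idle and hands each queued
// task to the worker that has been idle the longest.
class IdleDispatcher {
  // workerIds: ids of workers that start out idle.
  // send(workerId, task): caller-supplied function that delivers a task.
  constructor(workerIds, send) {
    this.idle = workerIds.slice(); // FIFO queue: front = idle the longest
    this.pending = [];             // tasks waiting for a free worker
    this.send = send;
  }

  // Queue a task; it is sent immediately if a worker is idle.
  submit(task) {
    this.pending.push(task);
    this.drain();
  }

  // Call this when a worker reports a result and is free again.
  markIdle(workerId) {
    this.idle.push(workerId);
    this.drain();
  }

  // Pair idle workers with pending tasks until one side runs out.
  drain() {
    while (this.idle.length > 0 && this.pending.length > 0) {
      this.send(this.idle.shift(), this.pending.shift());
    }
  }
}

// Usage with a fake `send` that only records what would be sent.
const sent = [];
const dispatcher = new IdleDispatcher([0, 1], function (workerId, task) {
  sent.push('thread ' + workerId + ' <- ' + task);
});
dispatcher.submit('task A');
dispatcher.submit('task B');
dispatcher.submit('task C'); // stays queued: both threads are busy
dispatcher.markIdle(1);      // thread 1 reports a result and receives task C
console.log(sent.join('\n'));
```

In a real setup, markIdle would be called from the 'result' handler with the id of the thread that just answered, so a busy thread is never handed a second task while another sits idle.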
Update for multiple express servers: I'm getting improvements but smaller than expected
I revisited this solution and had more luck. I think my original measurement may have been flawed. New results:
One thing I find a little odd is that I'm not seeing around 6 iterations/second for 2 servers and 9 for 3. I get that there are some losses for networking, but if I increase my task time to be sufficiently high, I would expect the network losses to be pretty minor.
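The expectation here is linear scaling: n servers should deliver n times the single-server rate. A small sketch of that arithmetic follows. The single-server rate of 3 iterations/second is inferred from the 6 and 9 quoted above, the function names are hypothetical, and the measured value of 4.5 is a made-up placeholder, not a result from this question.

```javascript
// Ideal throughput if `servers` processes scaled perfectly.
function idealRate(singleRate, servers) {
  return singleRate * servers;
}

// Fraction of the ideal rate actually achieved (1.0 = perfect scaling).
function scalingEfficiency(singleRate, servers, measuredRate) {
  return measuredRate / idealRate(singleRate, servers);
}

const singleRate = 3; // iterations/second, inferred from "6 for 2 and 9 for 3"
console.log(idealRate(singleRate, 2)); // 6
console.log(idealRate(singleRate, 3)); // 9

// Placeholder measurement, NOT a result from this question.
const hypotheticalMeasured = 4.5;
console.log(scalingEfficiency(singleRate, 2, hypotheticalMeasured)); // 0.75
```

Tracking the efficiency as the task time grows would show whether the shortfall behaves like a fixed per-request cost (efficiency climbs towards 1) or like contention for a shared resource (efficiency stays flat).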
You shouldn't be pushing your Node.js processes to run multiple threads for performance improvements. Running on a quad-core processor, having 1 express process handling general requests and 3 express processes handling the CPU-intensive requests would probably be the most effective setup, which is why I would suggest that you design your express processes to refrain from using Web Workers and simply block until they produce a result. This will get you down to running each process with a single thread, as per design, most likely yielding the best results.
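A rough sketch of that setup, under stated assumptions: it uses Node's built-in http module instead of express to stay self-contained, and cpuTask, createCpuServer, planLayout and the port numbers are all hypothetical. It only shows a handler that blocks until its result is ready, plus a one-process-per-core layout; it does not start any servers.

```javascript
const http = require('http');

// Stand-in for the real CPU-intensive work (hypothetical).
function cpuTask(iterations) {
  let sum = 0;
  for (let i = 1; i <= iterations; i++) sum += i % 7;
  return sum;
}

// One single-threaded server: the handler simply blocks until the result
// is ready, so this process works on one CPU-heavy request at a time.
function createCpuServer() {
  return http.createServer(function (req, res) {
    const result = cpuTask(1e7);
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ pid: process.pid, result: result }));
  });
}

// Hypothetical layout: one general process plus one CPU process for each
// remaining core (at least one), each on its own port.
function planLayout(coreCount, basePort) {
  const layout = [{ role: 'general', port: basePort }];
  for (let i = 1; i < Math.max(2, coreCount); i++) {
    layout.push({ role: 'cpu', port: basePort + i });
  }
  return layout;
}

// Quad-core example: 1 general process and 3 CPU processes.
console.log(planLayout(4, 3000));
```

Each 'cpu' entry would be started as its own node process that calls createCpuServer().listen(port). How requests get routed to those ports (a proxy, or the client picking a port) is left open here.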
I do not know the intricacies of how the Web Workers package handles synchronization, how it affects the I/O thread pools of Node.js that live in C space, etc., but I believe you would generally want to introduce Web Workers to be able to manage more blocking tasks at the same time without severely affecting other requests that require no threading or system I/O, or that can otherwise be responded to expediently. That doesn't necessarily mean that applying this would yield improved performance for the particular tasks being performed. If you run 4 processes with 4 threads that perform I/O, you might be locking yourself into wasting time continuously switching between thread contexts outside the application space.