Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Node.js async parallel - what consequences are?

There is code,

async.series(tasks, function (err) {
    return callback ({message: 'tasks execution error', error: err});
});

where, tasks is array of functions, each of it peforms HTTP request (using request module) and calling MongoDB API to store the data (to MongoHQ instance).

With my current input, (~200 task to execute), it takes

  [normal mode] collection cycle: 1356.843 sec. (22.61405 mins.)

But simply trying change from series to parallel, it gives magnificent benefit. The almost same amount of tasks run in ~30 secs instead of ~23 mins.

But, knowing that nothing is for free, I'm trying to understand what the consequences of that change? Can I tell that number of open sockets will be much higher, more memory consumption, more hit to DB servers?

Machine that I run the code is only 1GB of RAM Ubuntu, so I so that app hangs there one time, can it be caused by lacking of resources?

like image 275
Alexander Beletsky Avatar asked Aug 03 '13 11:08

Alexander Beletsky


People also ask

What does async parallel do?

Asynchronous operations in parallel The method async. parallel() is used to run multiple asynchronous operations in parallel. The first argument to async. parallel() is a collection of the asynchronous functions to run (an array, object or other iterable).

What is async series and async parallel in NodeJS?

Async is a utility module which provides straight-forward, powerful functions for working with asynchronous JavaScript. Although originally designed for use with Node. js and installable via npm install --save async , it can also be used directly in the browser.

Does async await run in parallel?

In order to run multiple async/await calls in parallel, all we need to do is add the calls to an array, and then pass that array as an argument to Promise. all() .


2 Answers

Your intuition is correct that the parallelism doesn't come for free, but you certainly may be able to pay for it.

Using a load testing module (or collection of modules) like nodeload, you can quantify how this parallel operation is affecting your server to determine if it is acceptable.

Async.parallelLimit can be a good way of limiting server load if you need to, but first it is important to discover if limiting is necessary. Testing explicitly is the best way to discover the limits of your system (eachLimit has a different signature, but could be used as well).

Beyond this, common pitfalls using async.parallel include wanting more complicated control flow than that function offers (which, from your description doesn't seem to apply) and using parallel on too large of a collection naively (which, say, may cause you to bump into your system's file descriptor limit if you are writing many files). With your ~200 request and save operations on 1GB RAM, I would imagine you would be fine as long as you aren't doing much massaging in the event handlers, but if you are experiencing server hangs, parallelLimit could be a good way out.

Again, testing is the best way to figure these things out.

like image 134
Wyatt Avatar answered Oct 01 '22 21:10

Wyatt


I would point out that async.parallel executes multiple functions concurrently not (completely) parallely. It is more like virtual parallelism.

Executing concurrently is like running different programs on a single CPU core, via multitasking/scheduling. True parallel execution would be running different program on each core of multi-core CPU. This is important as node.js has single-threaded architecture.

The best thing about node is that you don't have to worry about I/O. It handles I/O very efficiently.

In your case you are storing data to MongoDB, is mostly I/O. So running them parallely will use up your network bandwidth and if reading/writing from disk then disk bandwidth too. Your server will not hang because of CPU overload.


The consequence of this would be that if you overburden your server, your requests may fail. You may get EMFILE error (Too many open files). Each socket counts as a file. Usually connections are pooled, meaning to establish connection a socket is picked from the pool and when finished return to the pool. You can increase the file descriptor with ulimit -n xxxx.

You may also get socket errors when overburdened like ECONNRESET(Error: socket hang up), ECONNREFUSED or ETIMEDOUT. So handle them with properly. Also check the maximum number of simultaneous connections for mongoDB server too.


Finally the server can hangup because of garbage collection. Garbage collection kicks in after your memory increases to a certain point, then runs periodically after some time. The max heap memory V8 can have is around 1.5 GB, so expect GC to run frequently if its memory is high. Node will crash with process out of memory if asking for more, than that limit. So fix the memory leaks in your program. You can look at these tools.

like image 27
user568109 Avatar answered Oct 01 '22 19:10

user568109