
Node.js: Best way to perform multiple async operations, then do something else?

In the following code I am trying to make multiple (around 10) HTTP requests and RSS parses in one go.

I am using the standard forEach construct on an array of URIs I need to access and parse the result of.

Code:

var articles = [];

feedsToFetch.forEach(function (feedUri)
{   
        feed(feedUri, function(err, feedArticles) 
        {
            if (err)
            {
                throw err;
            }
            else
            {
                articles = articles.concat(feedArticles);
            }
        });
 });

 // Code I want to run once all feedUris have been visited

I understand that for a single async call I should use a callback. However, the only way I can think of using a callback in this example would be to call a function which counts how many times it has been called and only continues once it has been called feedsToFetch.length times, which seems hacky.

So my question is: what is the best way to handle this type of situation in Node.js?

Preferably without any form of blocking! (I still want that blazing fast speed). Is it promises or something else?

Thanks, Danny

asked Oct 09 '14 00:10 by dannybrown


2 Answers

HACK-FREE SOLUTION

Promises are built into JavaScript (standardized in ES2015)

The popular Promise libraries give you an .all() method for this exact use case (waiting for a bunch of async calls to complete, then doing something else). It's the perfect match for your scenario.
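As a rough illustration of the .all() idea, here is a minimal sketch that uses the built-in Promise.all instead of a library. The `feed` function below is only a hypothetical stand-in for the question's callback-style function (it fakes a short delay and returns one made-up article per URI), and `feedAsync`, `fetchAllFeeds` and the example URIs are names invented for this sketch.

```javascript
// Hypothetical stand-in for the question's callback-style `feed` function.
// It "fetches" after a short delay and yields one fake article per URI.
function feed(feedUri, callback) {
  setTimeout(function () {
    if (!feedUri) return callback(new Error('empty feed URI'));
    callback(null, [{ source: feedUri, title: 'article from ' + feedUri }]);
  }, 10);
}

// Wrap the callback API in a Promise so it can be combined with Promise.all.
function feedAsync(feedUri) {
  return new Promise(function (resolve, reject) {
    feed(feedUri, function (err, feedArticles) {
      if (err) return reject(err);
      resolve(feedArticles);
    });
  });
}

// Start every request at once; resolve with one flat array of articles.
function fetchAllFeeds(feedsToFetch) {
  return Promise.all(feedsToFetch.map(feedAsync)).then(function (results) {
    // `results` is an array of arrays, in the same order as `feedsToFetch`
    return results.reduce(function (all, feedArticles) {
      return all.concat(feedArticles);
    }, []);
  });
}

fetchAllFeeds(['a.example/rss', 'b.example/rss'])
  .then(function (articles) {
    // code to run once all feed URIs have been visited
    console.log('fetched ' + articles.length + ' articles');
  })
  .catch(function (err) {
    // the first failing feed rejects the whole batch
    console.error('a feed failed:', err.message);
  });
```

Note that Promise.all rejects as soon as any one feed fails, so a single bad feed discards the results of the others; whether that is acceptable depends on the use case.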

Bluebird also has .map(), which takes an array of values, runs each one through a function that may return a promise, and resolves once all of those promises have resolved.

Here is an example using Bluebird .map():

var Promise = require('bluebird');
var request = Promise.promisifyAll(require('request'));

function processAllFeeds(feedsToFetch) {    
    return Promise.map(feedsToFetch, function(feed){ 
        // I renamed your 'feed' fn to 'processFeed'
        return processFeed(feed) 
    })
    .then(function(articles){
        // 'articles' is now an array w/ results of all 'processFeed' calls
        // do something with all the results...
    })
    .catch(function(e){
        // feed server was down, etc
    })
}

function processFeed(feed) { 
    // use the promisified version of 'get'
    return request.getAsync(feed.url)... 
}

Notice also that you don't need a closure variable here to accumulate the results: the array passed to .then() already contains them.

The Bluebird API Docs are really well written too, with lots of examples, so it makes it easier to pick up.

Once I learned the Promise pattern, it made life so much easier. I can't recommend it enough.

Also, here is a great article about different approaches to dealing with async functions using promises, the async module, and others.

Hope this helps!

answered Sep 27 '22 17:09 by aarosil


No hacks necessary

I would recommend using the async module as it makes these kinds of things a lot easier.

async provides async.eachSeries as an async replacement for arr.forEach and lets you pass a done callback that runs when the iteration is complete. It processes the items in series, one at a time and in array order, just as forEach visits them. It also conveniently bubbles errors to your callback, so you don't need handler logic inside the loop. If you want or need parallel processing, use async.each instead.

There will be no blocking between the async.eachSeries call and the callback.

async.eachSeries(feedsToFetch, function(feedUri, done) {

  // call your async function
  feed(feedUri, function(err, feedArticles) {

    // if there's an error, "bubble" it to the callback
    if (err) return done(err);

    // your operation here;
    articles = articles.concat(feedArticles);

    // this task is done
    done();
  });
}, function(err) {

  // errors generated in the loop above will be accessible here
  if (err) throw err;

  // we're all done!
  console.log("all done!");
});

Alternatively, you could build an array of async tasks and pass them to async.series. It runs the tasks in series (not in parallel) and calls the final callback once every task has completed, or as soon as one of them reports an error. Each task receives a callback that it must call when it finishes. The only reason to use this over async.eachSeries would be if you preferred the familiar arr.forEach syntax.

// create an array of async tasks
var tasks = [];

feedsToFetch.forEach(function (feedUri) {

  // add each task to the task array
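  tasks.push(function(done) {

    // your operations
    feed(feedUri, function(err, feedArticles) {

      // report errors to async.series instead of throwing
      if (err) return done(err);
      articles = articles.concat(feedArticles);

      // tell async.series this task is finished
      done();
    });
  });
});

// call async.series with the task array and callback
async.series(tasks, function(err) {
  if (err) throw err;
  console.log("done !");
});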

Or you can Roll Your Own™

Perhaps you're feeling extra ambitious or maybe you don't want to rely upon the async dependency. Maybe you're just bored like I was. Anyway, I purposely copied the API of async.eachSeries to make it easy to understand how this works.

Once we remove the comments here, we have just 9 lines of code that can be reused for any array we want to process asynchronously! It will not modify the original array, errors can be sent to "short circuit" the iteration, and a separate callback can be used. It will also work on empty arrays. Quite a bit of functionality for just 9 lines :)

// void asyncForEach(Array arr, Function iterator, Function callback)
//   * iterator(item, done) - done can be called with an err to shortcut to callback
//   * callback(err)        - err receives the error if an iterator sent one, otherwise null
function asyncForEach(arr, iterator, callback) {

  // create a cloned queue of arr
  var queue = arr.slice(0);

  // create a recursive iterator
  function next(err) {

    // if there's an error, bubble to callback
    if (err) return callback(err);

    // if the queue is empty, call the callback with no error
    if (queue.length === 0) return callback(null);

    // call the iterator with the next item in the queue
    // we pass `next` here so the iterator can let us know when to move on to the next item
    iterator(queue.shift(), next);
  }

  // start the loop;
  next();
}
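To check the claims above (empty arrays work, an error short-circuits the iteration, the original array is left untouched), here is a small sketch. It repeats asyncForEach with the comments stripped so that it runs on its own; the inputs and the "failed on b" error are made up for illustration.

```javascript
// asyncForEach from above with the comments removed (9 lines)
function asyncForEach(arr, iterator, callback) {
  var queue = arr.slice(0);
  function next(err) {
    if (err) return callback(err);
    if (queue.length === 0) return callback(null);
    iterator(queue.shift(), next);
  }
  next();
}

// Empty array: the iterator never runs and the callback gets a null error.
asyncForEach([], function () {}, function (err) {
  console.log('empty array finished, err =', err);
});

// Short circuit: the iterator reports an error on "b", so "c" is never visited.
var input = ['a', 'b', 'c'];
var visited = [];
asyncForEach(input, function (item, done) {
  visited.push(item);
  // extra setTimeout arguments are passed through to `done`
  setTimeout(done, 0, item === 'b' ? new Error('failed on b') : null);
}, function (err) {
  console.log('stopped with:', err.message);
  console.log('visited:', visited);          // only the items before the error
  console.log('input untouched:', input);    // the clone was consumed, not `input`
});
```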

Now let's create a sample async function to use with it. We'll fake the delay with a setTimeout of 500 ms here.

// void sampleAsync(String uri, Function done)
//   * done receives message string after 500 ms
function sampleAsync(uri, done) {

  // fake delay of 500 ms
  setTimeout(function() {

    // our operation
    // <= "foo"
    // => "async foo !"
    var message = ["async", uri, "!"].join(" ");

    // call done with our result
    done(message);
  }, 500);
}

Ok, let's see how they work !

var tasks = ["cat", "hat", "wat"];

asyncForEach(tasks, function(uri, done) {
  sampleAsync(uri, function(message) {
    console.log(message);
    done();
  });
}, function() {
  console.log("done");
});

Output (500 ms delay before each output)

async cat !
async hat !
async wat !
done
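
The version above handles one item at a time, so the total time is the sum of all delays. For the parallel case the question asks about, the counting idea can be tucked away in a helper with the same call shape. This is only a sketch: `asyncForEachParallel` is a hypothetical name, not part of the async module, and the 500 ms timer again stands in for real work.

```javascript
// Hypothetical parallel counterpart to asyncForEach (same call shape).
// Starts every iterator immediately and calls `callback` once, either
// with the first error or with null after all items have finished.
function asyncForEachParallel(arr, iterator, callback) {
  var remaining = arr.length;
  var finished = false;

  // nothing to do: report completion on the next tick of the timer queue
  if (remaining === 0) return setTimeout(callback, 0, null);

  arr.forEach(function (item) {
    iterator(item, function (err) {
      if (finished) return; // ignore results that arrive after an error
      if (err) {
        finished = true;
        return callback(err);
      }
      remaining -= 1;
      if (remaining === 0) {
        finished = true;
        callback(null);
      }
    });
  });
}

var started = Date.now();
asyncForEachParallel(["cat", "hat", "wat"], function (uri, done) {
  // fake delay of 500 ms, as in sampleAsync above
  setTimeout(function () {
    console.log("async " + uri + " !");
    done();
  }, 500);
}, function (err) {
  if (err) throw err;
  console.log("done after ~" + (Date.now() - started) + " ms");
});
```

Because all three timers start together, the whole run should take about as long as the slowest item rather than the sum of the delays. Results are not collected here; an iterator that needs them in order would have to store each one by index.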

answered Sep 27 '22 17:09 by Mulan