 

Limit Q promise concurrency in Node.js

Is there any way to limit the number of concurrent Q promises executed at once in Node.js?

I am building a web scraper which must request and parse 3000+ pages. Without throttling, some of the requests I make aren't responded to in time, so the connection resets and the needed response (the HTML code) becomes unavailable.

To counteract this, I found that limiting the number of concurrent requests makes the problem go away.


I have tried the following methods but to no avail:

  • Concurrency limit in Q promises - node
  • How can I limit Q promise concurrency?
  • https://gist.github.com/gaearon/7930162
  • https://github.com/ForbesLindesay/throat

I need to request an array of URLs, making only one request at a time, and when all URLs in the array have completed, return the results in an array.

function processWebsite() {
  // computed by this stage
  var urls = [u1, u2, u3, u4, u5, u6, u7, u8, u9];

  var promises = throttle(urls, 1, myfunction);

  // myfunction returns a Q promise and takes a considerable
  // amount of time to resolve (approximately 2-5 minutes)

  Q.all(promises).then(function (results) {
    // work with the results of the promises array
  });
}
user3438286 asked Nov 18 '14 11:11

People also ask

How do you handle concurrency in Node.js?

Node.js uses an event loop to maintain concurrency and perform non-blocking I/O operations. As soon as Node.js starts, it initializes the event loop. The event loop works on a queue (called the event queue) and performs tasks in FIFO (First In, First Out) order.

How many promises can Node.js handle?

Assuming we have the processing power and that our promises can run in parallel, there is a hard limit of just over 2 million promises.

Is Promise.all multithreaded?

Often Promise.all() is thought of as running in parallel, but this isn't the case. Parallel means doing many things at the same time on multiple threads. JavaScript, however, is single-threaded, with one call stack and one memory heap.

Does Promise.allSettled run in parallel?

Promise.allSettled(promises) is a helper function that runs promises concurrently and aggregates the settled statuses (either fulfilled or rejected) into a result array.
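A quick sketch of the result shape it produces:

```javascript
// Unlike Promise.all, allSettled never rejects: each entry records
// whether its promise fulfilled or rejected.
Promise.allSettled([
  Promise.resolve(42),
  Promise.reject(new Error("boom")),
]).then((results) => {
  console.log(results[0].status, results[0].value);          // fulfilled 42
  console.log(results[1].status, results[1].reason.message); // rejected boom
});
```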


2 Answers

I'd do this, which iterates over each URL, building a chain of promises where each runs when the previous one finishes, and resolves with an array of the request results.

return urls.reduce(function (acc, url) {
    return acc.then(function (results) {
        return myfunction(url).then(function (requestResult) {
            return results.concat(requestResult);
        });
    });
}, Q.resolve([]));

You could turn that into a helper too:

var results = map(urls, myfunction);

function map(items, fn) {
    return items.reduce(function (acc, item) {
        return acc.then(function (results) {
            return fn(item).then(function (result) {
                return results.concat(result);
            });
        });
    }, Q.resolve([]));
}

Note that the Bluebird promise library has a helper to simplify this kind of thing:

return Bluebird.map(urls, myfunction, {concurrency: 1});
loganfsmyth answered Sep 24 '22 02:09


Here is my stab at making a throttled map function for Q.

function qMap(items, worker, concurrent) {
    var result = Q.defer();
    var work = [];      // promises returned by the worker, by item index
    var working = 0;    // number of workers currently running
    var done = 0;       // number of workers that have finished

    concurrent = parseInt(concurrent, 10) || 1;

    // find the index of the first item that has not been started yet
    function getNextIndex() {
        var i;
        for (i = 0; i < items.length; i++) {
            if (typeof work[i] === "undefined") return i;
        }
    }
    function doneWorking() {
        working--;
        done++;
        // report progress as a percentage with one decimal place
        result.notify( +((100 * done / items.length).toFixed(1)) );
        // start the next item; once nothing is left and all are done, resolve
        if (!startWorking() && done === items.length) {
            result.resolve(work);
        }
    }
    function startWorking() {
        var index = getNextIndex();
        if (typeof index !== "undefined" && working < concurrent) {
            working++;
            work[index] = worker(items[index]).finally(doneWorking);
            return true;
        }
    }
    // fill the initial window of `concurrent` workers
    while (startWorking());
    return result.promise;
}

It accepts

  • an array of items to work on (URLs, in your case),
  • a worker (which must be a function that accepts an item and returns a promise)
  • and a maximum value of concurrent items to work on at any given time.

It returns

  • a promise, which
  • resolves to an array of settled promises when all workers have finished.

It never rejects; you must inspect the individual promises to determine the overall state of the operation.

In your case you would use it like this, for example with 15 concurrent requests:

// myfunction returns a Q promise and takes a considerable 
// amount of time to resolve (approximately 2-5 minutes)

qMap(urls, myfunction, 15)
.progress(function (percentDone) {
    console.log("progress: " + percentDone);
})
.done(function (urlPromises) {
    console.log("all done: " + urlPromises);
});
Tomalak answered Sep 24 '22 02:09