 

How do I make 25 requests at a time with HTTP::Async in Perl?

I'm making a lot of HTTP requests and chose HTTP::Async to do the job. I have over 1,000 requests to make, and if I simply do the following (see the code below), many requests time out before they get processed, because it can take tens of minutes for processing to reach them:

use HTTP::Async;
use HTTP::Request;

my $async = HTTP::Async->new;

for my $url (@urls) {
    $async->add(HTTP::Request->new(GET => $url));
}
while (my $resp = $async->wait_for_next_response) {
    # use $resp
}

So I decided to make 25 requests at a time, but I can't think of a way to express that in code.

I tried the following:

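while (@urls) {
    # queue up the next batch of (at most) 25 requests
    L25:
    for (1..25) {
        my $url = shift @urls;
        if (!defined($url)) {
            last L25;
        }
        $async->add(HTTP::Request->new(GET => $url));
    }
    # block until every request in this batch has been answered
    while (my $resp = $async->wait_for_next_response) {
        # use $resp
    }
}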

This, however, doesn't work well because it's too slow now: it waits until all 25 requests have been processed before adding another 25. So if only 2 requests are still pending, the other 23 slots sit idle, and I have to wait for every request in the batch to finish before the next batch of 25 is added.

How can I improve this logic so that $async keeps working on requests while I process the responses, but also make sure the requests don't time out?
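To put rough numbers on the idle-slot problem, here is a small timing simulation. It is only a sketch: the request durations are made up, and a concurrency limit of 2 stands in for 25. `batch_time` models the batched loop above (start a batch, wait for all of it), while `window_time` models a sliding window that starts the next request as soon as any slot frees up.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical per-request durations in seconds (illustration only).
my @durations = (5, 1, 1, 5);
my $limit     = 2;    # stands in for the 25 concurrent requests

# Batch: start $limit requests, wait for ALL of them, then start the next batch.
sub batch_time {
    my ($limit, @d) = @_;
    my $t = 0;
    while (my @batch = splice(@d, 0, $limit)) {
        $t += max(@batch);    # a batch lasts as long as its slowest request
    }
    return $t;
}

# Sliding window: whenever any slot frees up, start the next request in it.
sub window_time {
    my ($limit, @d) = @_;
    my @slot_free_at = (0) x $limit;    # time at which each slot becomes free
    for my $dur (@d) {
        # pick the slot that frees up earliest and run the request there
        my ($i) = sort { $slot_free_at[$a] <=> $slot_free_at[$b] } 0 .. $#slot_free_at;
        $slot_free_at[$i] += $dur;
    }
    return max(@slot_free_at);
}

printf "batch: %d s, sliding window: %d s\n",
    batch_time($limit, @durations), window_time($limit, @durations);
```

With these durations, batching takes 10 s because each batch waits for its slowest request, while the sliding window finishes in 7 s because the slot freed by the short requests is reused immediately.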

bodacydo asked Jun 23 '12 20:06


2 Answers

You're close, you just need to combine the two approaches! :-)

Untested, so treat it as pseudocode. In particular, I am not sure whether total_count is the right method to use; the documentation doesn't say. You could also just keep an $active_requests counter that you increment when adding a request and decrement when you get a response.

while (1) {

   # if there aren't already 25 requests "active", then add more
   while (@urls and $async->total_count < 25) {
       my $url = shift @urls;
       $async->add( ... );
   }

   # deal with a finished request right away, then go back to the top
   # of the loop so the freed slot is refilled; we wait up to a second
   # just so we don't spin in the main loop too fast.
   if (my $response = $async->wait_for_next_response(1)) {
      # use $response
   }

   # finish the main loop when there's no more work
   last unless ($async->total_count or @urls);

}
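The same rolling-window loop can be exercised without any network access. In this sketch, `FakeAsync` is a hypothetical stand-in for HTTP::Async (it is not part of any library): it takes plain URL strings instead of `HTTP::Request` objects, "finishes" requests in the order they were added, and takes no timeout argument. It uses the explicit `$active` counter mentioned above instead of `total_count`, and handles one response per iteration so that each freed slot is refilled immediately.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTTP::Async: it only implements the two methods
# the loop below relies on, and completes requests in FIFO order.
package FakeAsync;
sub new { my ($class) = @_; return bless { queue => [], max_seen => 0 }, $class }
sub add {
    my ($self, $url) = @_;
    push @{ $self->{queue} }, $url;
    my $n = @{ $self->{queue} };    # number of requests currently in flight
    $self->{max_seen} = $n if $n > $self->{max_seen};
    return;
}
# Returns the next finished "response" (here just the URL), or undef if idle.
sub wait_for_next_response { my ($self) = @_; return shift @{ $self->{queue} } }

package main;

use constant MAX_ACTIVE => 25;

my @urls   = map { "http://example.com/$_" } 1 .. 1000;    # hypothetical URLs
my $async  = FakeAsync->new;
my $active = 0;    # requests added but not yet answered
my @done;

while (@urls or $active) {
    # top up: keep MAX_ACTIVE requests in flight while there is work left
    while (@urls and $active < MAX_ACTIVE) {
        $async->add(shift @urls);
        $active++;
    }
    # handle ONE response, then loop back to refill the freed slot
    if (defined(my $resp = $async->wait_for_next_response)) {
        $active--;
        push @done, $resp;    # "use $resp" goes here
    }
}

print scalar(@done), " responses, at most $async->{max_seen} in flight\n";
```

To adapt this to the real module, one would presumably construct `HTTP::Async->new` instead, pass `HTTP::Request->new(GET => $url)` to `add`, and give `wait_for_next_response` a timeout such as 1; if it returns undef because nothing finished in time, `$active` is correctly left unchanged.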
Ask Bjørn Hansen answered Nov 15 '22 05:11


If you can't call wait_for_next_response often enough because you're in the middle of executing other code, the simplest solution is to make the code interruptible by moving it to a separate thread of execution. But if you're going to start using threads, why use HTTP::Async at all?

use threads;
use Thread::Queue::Any 1.03;
use LWP::UserAgent;
use HTTP::Request;

use constant NUM_WORKERS => 25;

my $req_q = Thread::Queue::Any->new();
my $res_q = Thread::Queue::Any->new();

# start NUM_WORKERS threads; each pulls requests off $req_q until it gets undef
my @workers;
for (1..NUM_WORKERS) {
   push @workers, async {
      my $ua = LWP::UserAgent->new();
      while (my $req = $req_q->dequeue()) {
         $res_q->enqueue( $ua->request($req) );
      }
   };    
}

for my $url (@urls) {
   $req_q->enqueue( HTTP::Request->new( GET => $url ) );
}

# one undef per worker makes each worker's dequeue loop exit
$req_q->enqueue(undef) for @workers;

for (1..@urls) {
   my $res = $res_q->dequeue();
   ...
}

$_->join() for @workers;
ikegami answered Nov 15 '22 03:11