Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Perl LWP::UserAgent hangs for 120 seconds seemingly randomly against a given server

I noticed a delay of approximately 120 seconds between handling the response_data and response_done events in WWW::Mechanize with a given https web site. I checked with a normal web browser and do not experience this slowness so I suspect there is something I must do wrong.

Here is what I did to trace the events (for some reason use LWP::Debug qw(+) did not do anything):

use WWW::Mechanize;
use Time::HiRes qw(gettimeofday);
use IO::Handle;

my $mech = WWW::Mechanize->new(
  timeout     => 3,
  autocheck   => 1,       # check success of each query
  stack_depth => 0,       # no keeping history
  keep_alive  => 50,      # connection pool
);

$mech->agent_alias( 'Windows IE 6' );
open my $debugfile, '>traffic.txt';
$debugfile->autoflush(1);

$mech->add_handler( request_send => sub {
    my $cur_time = gettimeofday();
    my $req = shift;
    print $debugfile "\n$cur_time === BEGIN HTTP REQUEST ===\n";
    print $debugfile $req->dump();
    print $debugfile "\n$cur_time ===   END HTTP REQUEST ===\n";
    return
  }
);
$mech->add_handler( response_header => sub {
    my $cur_time = gettimeofday();
    my $res = shift;
    print $debugfile "\n$cur_time === GOT RESPONSE HDRS ===\n";
    print $debugfile $res->dump();
    return
  }
);
$mech->add_handler( response_data => sub {
    my $cur_time = gettimeofday();
    my $res = shift;
    my $content_length = length($res->content);
    print $debugfile "$cur_time === Got response data chunk resp size = $content_length ===\n";
    return
  }
);
$mech->add_handler( response_done => sub {
    my $cur_time = gettimeofday();
    my $res = shift;
    print $debugfile "\n$cur_time === BEGIN HTTP RESPONSE ===\n";
    print $debugfile $res->dump();
    print $debugfile "\n===   END HTTP RESPONSE ===\n";
    return
  }
);

And here is a excerpt of the traces (URLs and cookies are obfuscated):

1347463214.24724 === BEGIN HTTP REQUEST ===
GET https://...
Accept-Encoding: gzip
Referer: https://...
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Cookie: ...
Cookie2: $Version="1"

(no content)

1347463214.24724 ===   END HTTP REQUEST ===

1347463216.13134 === GOT RESPONSE HDRS ===
HTTP/1.1 200 OK
Date: Wed, 12 Sep 2012 15:20:08 GMT
Accept-Ranges: bytes
...
Server: Lotus-Domino
Content-Length: 377806
Content-Type: application/octet-stream
Last-Modified: Fri, 07 Sep 2012 06:25:33 GMT
Client-Peer: ...
Client-Response-Num: 1
Client-SSL-Cert-Issuer: ...
Client-SSL-Cert-Subject: ...
Client-SSL-Cipher: DES-CBC3-SHA
Client-SSL-Socket-Class: IO::Socket::SSL

(no content)
1347463216.48305 === Got response data chunk resp size = 4096 ===

1347463337.98131 === BEGIN HTTP RESPONSE ===
HTTP/1.1 200 OK
Date: Wed, 12 Sep 2012 15:20:08 GMT
Accept-Ranges: bytes
...
Server: Lotus-Domino
Content-Length: 377806
Content-Type: application/octet-stream
Last-Modified: Fri, 07 Sep 2012 06:25:33 GMT
Client-Date: Wed, 12 Sep 2012 15:22:17 GMT
Client-Peer: ...
Client-Response-Num: 1
Client-SSL-Cert-Issuer: ...
Client-SSL-Cert-Subject: ...
Client-SSL-Cipher: DES-CBC3-SHA
Client-SSL-Socket-Class: IO::Socket::SSL

PK\3\4\24\0\6\0\10\0\0\0!\0\x88\xBC\21Xi\2\0\0\x84\22\0\0\23\0\10\2[Content_Types].xml \xA2...
(+ 377294 more bytes not shown)

===   END HTTP RESPONSE ===

During the “Got response data chunk” and “BEGIN HTTP RESPONSE” messages, you can see a 121.5 seconds gap. I have the feeling that sometimes LWP::UserAgent hangs for two minutes after having received the full amount of data.

Do you have any clue where that could come from?

EDIT here is a screenshot in Wireshark : I get the FIN/ACK message after 120 seconds…

Wireshark Excerpt

Thanks

like image 743
Benoit Avatar asked Jan 23 '26 20:01

Benoit


2 Answers

I think it is likely that your transaction is actually taking that long. The documentation for LWP::UserAgent says this

[The response_data handler] needs to return a TRUE value to be called again for subsequent chunks for the same request

so, because your handler returns nothing, you are tracing only the first returned data packet

According to your output the first 4KB of data arrives in 2.2 seconds, or roughly 2KB per second. The entire data is 369KB long so you would expect to receive 92 more data packets, and at 2KB per second it would take three minutes to transmit. You get your response in two minutes so I think your times are reasonable

like image 50
Borodin Avatar answered Jan 26 '26 15:01

Borodin


Thanks to Borodin's answer I found a way to fix it:

I modified the response_data event handler sub this way:

if($res->header('Content-Length') == length($res->content)) {
    die "OK"; # Got whole data, not waiting for server to end the communication channel.
}
return 1; # In other cases make sure the handler is called for subsequent chunks

And then if X-Died header is equal to OK then ignore errors in the caller.

like image 21
Benoit Avatar answered Jan 26 '26 15:01

Benoit