Fetching the content of a URL gets stuck when using Cro or HTTP::UserAgent

Question

I want to get the content of https://translate.google.cn. However, Cro::HTTP::Client and HTTP::UserAgent both get stuck, while WWW fetches the content, and I don't know why. If I change $url to https://perl6.org, all three modules work fine:

my $url = "https://translate.google.cn";
use Cro::HTTP::Client;
my $resp = await Cro::HTTP::Client.new(
    headers => [
        User-agent => 'Cro'
    ]
).get($url);
say await $resp.body-text();

use HTTP::UserAgent;
my $ua = HTTP::UserAgent.new;
$ua.timeout = 30;
my $response = $ua.get($url);

if $response.is-success {
    say $response.content;
} else {
    die $response.status-line;
}

use WWW;
say get($url)

Am I missing something? Thanks for any suggestions.

ugexe · Accepted Answer

For me, HTTP::UserAgent works and Cro::HTTP::Client gets stuck. If you wish to debug things further, both modules have a debug option:

perl6 -MHTTP::UserAgent -e 'my $ua = HTTP::UserAgent.new(:debug); say $ua.get("https://translate.google.cn").content'

CRO_TRACE=1 perl6 -MCro::HTTP::Client -e 'my $ua = Cro::HTTP::Client.new(); say $ua.get("https://translate.google.cn").result.body-text.result'

WWW also works for me. It is surprising that it works for you, since it is backed by HTTP::UserAgent (which does not work for you). Here is its get routine, which shows how it uses HTTP::UserAgent:

sub get ($url, *%headers) is export(:DEFAULT, :extras) {
    CATCH { .fail }
    %headers<User-Agent> //= 'Rakudo WWW';
    with HTTP::UserAgent.new.get: $url, |%headers {
        .is-success or fail .&err;
        .decoded-content
    }
}

shalomb · Answer

This could be down to HTTP/2 on the problematic HTTPS sites. In fact, what you are describing is pretty much what I raised in https://github.com/croservices/cro-http/issues/45.

A workaround until a fix is in is to try making requests using HTTP/1.1:

Cro::HTTP::Client.get('https://translate.google.cn', :http<1.1>);
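Since the symptom here is a request that never returns, it may also help to bound the wait so the script fails instead of hanging. The following is only a minimal sketch: `with-timeout` is a hypothetical helper (it is not part of Cro or HTTP::UserAgent), built on core Raku's `Promise.anyof` and `Promise.in`. The 30-second limit in the commented usage is an arbitrary assumption.

```raku
# Hypothetical helper, not part of any module: wait for $work, but give up
# after $seconds instead of blocking forever.
sub with-timeout(Promise $work, Real $seconds) {
    # anyof is kept as soon as either promise finishes (kept or broken).
    await Promise.anyof($work, Promise.in($seconds));
    given $work.status {
        when Kept   { $work.result }                         # finished in time
        when Broken { fail $work.cause }                     # the work itself failed
        default     { fail "no result after $seconds seconds" }  # still pending
    }
}

# Local demonstration that needs no network access:
say with-timeout(start { sleep 0.1; 'done' }, 2);

# Assumed usage with the HTTP/1.1 request shown above:
#   use Cro::HTTP::Client;
#   my $req = Cro::HTTP::Client.get('https://translate.google.cn', :http<1.1>);
#   with with-timeout($req, 30) -> $resp { say await $resp.body-text }
#   else { note "request did not complete" }
```

Note that the helper only stops waiting; it does not cancel the underlying request, which may keep running in the background until the program exits.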

Tags:

web-scraping

raku

cro

chenyf
