Sites not accepting wget user agent header

When I run this command:

wget --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0"  http://yahoo.com 

...I get this result (with nothing else in the file):

<!-- hw147.fp.gq1.yahoo.com uncompressed/chunked Wed Jun 19 03:42:44 UTC 2013 --> 

But when I run wget http://yahoo.com with no --user-agent option, I get the full page.

The user agent is the same header that my current browser sends. Why does this happen? Is there a way to make sure the user agent doesn't get blocked when using wget?

asked Jun 19 '13 at 03:06 by Joe Mornin


People also ask

How do I pass a header in wget?

wget lets you send an HTTP request with custom HTTP headers. To supply a custom header, use the "--header" option; you can repeat it as many times as you want in a single run. If you would like to permanently set the default HTTP request header you want to use with wget, you can use ~/.
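A minimal sketch of repeating --header in one run. The function name fetch_with_headers and the Accept-Language value are illustrative assumptions, not something from the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a URL with several custom request headers.
# Each --header option adds one header line; wget accepts the option repeatedly.
fetch_with_headers() {
  local url="$1"
  wget \
    --header="Accept: text/html" \
    --header="Accept-Language: en-US,en;q=0.5" \
    -O - "$url"   # -O - writes the downloaded document to stdout
}
```

Usage: fetch_with_headers http://yahoo.com > yahoo.html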

What is a wget user agent?

“User-Agent” is a header field that the browser sends to the server it wants to access. If a server refuses requests coming from wget, you can try sending a different user agent: look up the string you need in an online list of user agents and run the command: wget --user-agent="User Agent Here" "[URL]"
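One way to apply this only when needed is to try wget's default user agent first and fall back to a browser string if the request fails. This is a sketch under assumptions: the function name fetch_with_ua_fallback is hypothetical, "fails" simply means wget returned a non-zero exit status, and the UA string is the Firefox 21 one from the question, used purely as an example.

```shell
#!/usr/bin/env bash
# Example browser UA string (taken from the question; not a recommendation).
BROWSER_UA='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0'

# Hypothetical helper: try wget's default User-Agent first and, only if that
# request fails (non-zero exit status), retry with the browser-like string.
fetch_with_ua_fallback() {
  local url="$1" out="$2"
  # -q: no progress output; -O FILE: write the document to FILE.
  wget -q -O "$out" "$url" && return 0
  # -U is the short form of --user-agent.
  wget -q -U "$BROWSER_UA" -O "$out" "$url"
}
```

Usage: fetch_with_ua_fallback http://example.com page.html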

What is a user agent header?

The User-Agent request header is a characteristic string that lets servers and network peers identify the application, operating system, vendor, and/or version of the requesting user agent.

Why do all user agents start with Mozilla?

Internet Explorer spoofed Netscape by starting its User-Agent with Mozilla/, because web servers were routinely browser sniffing and serving pages with frames (a feature supported by both Netscape and IE, but not by other browsers of the era) to Netscape only.


1 Answer

It seems the Yahoo server applies some heuristic based on the User-Agent when the Accept header is set to */*. Sending

Accept: text/html

instead did the trick for me.

For example:

wget  --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0"  http://yahoo.com 

Note: if you don't set an Accept header yourself, wget automatically sends Accept: */*, which means "give me anything you have".
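The same fix can be wrapped in a small script that also flags a stub response like the one in the question. This is a sketch, not part of the original answer: the function name fetch_as_browser is hypothetical, and treating a file without an <html tag as "not a full page" is an assumed heuristic.

```shell
#!/usr/bin/env bash
# Example browser UA string from the question.
BROWSER_UA='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0'

# Hypothetical helper: send the browser User-Agent together with an explicit
# Accept header, since wget otherwise sends "Accept: */*".
fetch_as_browser() {
  local url="$1" out="$2"
  wget -q \
    --header="Accept: text/html" \
    --user-agent="$BROWSER_UA" \
    -O "$out" "$url" || return   # propagate wget's exit status on failure
  # Assumed heuristic: a full page should contain an <html tag, while the stub
  # in the question was only a single HTML comment.
  if ! grep -qi '<html' "$out"; then
    echo "warning: $out does not look like a full HTML page" >&2
    return 1
  fi
}
```

Usage: fetch_as_browser http://yahoo.com yahoo.html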

answered Oct 03 '22 at 12:10 by Filip