
How to work with user-agent to download a webpage using Wget

Tags: wget

I am trying to download this page using Wget. Here is the page link:

http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&rt=nc&item=250972882769&si=a8iGAIchyvEbn7KveYFZ5QbEE7o%3D&print=all&category=31387

And here is my command:

wget -O ebay.html --user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" "http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&rt=nc&item=250972882769&si=a8iGAIchyvEbn7KveYFZ5QbEE7o%3D&print=all&category=31387"

When I open the page in a browser, it works fine. When I use Wget, it downloads a different page, not the original one. I think the problem is the user agent. What's the solution?

Quazi Marufur Rahman asked Jan 15 '12 18:01




1 Answer

The problem isn't the user agent; it's a missing cookie or cookies. The solution is:

  1. Retrieve the normal product page with wget --save-cookies=ebay-cookies
  2. Fish the "Print" link URL out of that HTML file. (I did this by hand; you should obviously write a script to do it.)
  3. Retrieve the "Print" URL with wget --load-cookies=ebay-cookies

I tried it with a random product page; it worked.
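The three steps above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the answer: the helper names extract_print_url and fetch_print_page and the file name product.html are made up for illustration, the "Print" link is assumed to be an absolute href containing print=all (as in the URL from the question), and --keep-session-cookies is added because wget does not save session cookies by default.

```shell
#!/bin/sh
# Sketch only: function names, file names, and the grep pattern are assumptions.

# Print the first href in the given HTML file that contains "print=all",
# with HTML-escaped ampersands (&amp;) turned back into "&".
extract_print_url() {
    grep -o 'href="[^"]*print=all[^"]*"' "$1" \
        | head -n 1 \
        | sed -e 's/^href="//' -e 's/"$//' -e 's/&amp;/\&/g'
}

# Step 1: fetch the normal product page and save its cookies.
# Step 2: pull the "Print" link out of the saved HTML.
# Step 3: fetch the "Print" page, sending the saved cookies back.
fetch_print_page() {
    product_url=$1
    wget -O product.html --save-cookies=ebay-cookies --keep-session-cookies \
        "$product_url" || return 1
    print_url=$(extract_print_url product.html)
    if [ -z "$print_url" ]; then
        echo "no print link found in product.html" >&2
        return 1
    fi
    wget -O ebay.html --load-cookies=ebay-cookies "$print_url"
}

# Usage (hypothetical): fetch_print_page "<normal product page URL>"
```

If the page markup differs from what the grep pattern assumes (for example, a relative link or single-quoted attributes), the extraction step would need adjusting; a real HTML parser would be more robust than grep here.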

Kyle Jones answered Sep 25 '22 13:09