
Checking whether a link is dead or not using Python without downloading the webpage

Tags: python, urllib2

For those who know wget: it has an option, --spider, which lets one check whether a link is broken without actually downloading the webpage. I would like to do the same thing in Python. My problem is that I have a list of 100,000 links that I want to check at most once a day and at least once a week. In any case, this will generate a lot of unnecessary traffic.

As far as I understand from the urllib2.urlopen() documentation, it does not download the page but only the meta-information. Is this correct? Or is there some other way to do this in a nice manner?

Best,
Troels

Troels asked Jul 12 '10 15:07



1 Answer

You should use a HEAD request for this: it asks the web server for the response headers only, without the body. See "How do you send a HEAD HTTP request in Python 2?"
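A minimal sketch of the HEAD approach, written for Python 3's urllib.request rather than the urllib2 module from the question (Request has accepted a method argument since Python 3.3). The function name link_is_alive, the 10-second timeout, and the rule "any status below 400 counts as alive" are assumptions made for illustration, not something the question or answer specifies.

```python
import http.client
import urllib.error
import urllib.request


def link_is_alive(url, timeout=10):
    """Return True if a HEAD request to `url` succeeds with a status below 400.

    `link_is_alive` and the default timeout are hypothetical choices for this
    sketch. Only the status line and headers are requested, not the body.
    """
    try:
        # method="HEAD" asks the server for headers only.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTPError (4xx/5xx) is a subclass of URLError; ValueError covers
        # malformed URLs; OSError covers timeouts and connection failures.
        return False
```

Calling link_is_alive(url) for each entry in the list gives a True/False result per link. Some servers do not handle HEAD properly, so a link reported as dead this way may be worth re-checking with an ordinary GET before it is discarded.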

Jochen Ritzel answered Nov 02 '22 10:11
