For those who know wget, it has an option --spider, which allows one to check whether a link is broken or not without actually downloading the webpage. I would like to do the same thing in Python. My problem is that I have a list of 100,000 links that I want to check at most once a day and at least once a week. In any case, this will generate a lot of unnecessary traffic.
As far as I understand from the urllib2.urlopen() documentation, it does not download the page but only the meta-information. Is this correct? Or is there some other way to do this in a nice manner?
Best,
Troels
You should use a HEAD request for this; it asks the web server for the headers without the body. See How do you send a HEAD HTTP request in Python 2?
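A minimal sketch of how that could look in Python 2 using httplib (assuming Python 2, since the question mentions urllib2); the URL in the usage line is just a placeholder:

    import httplib
    from urlparse import urlparse

    def is_link_alive(url):
        # Send a HEAD request so only the headers are transferred, not the body.
        parsed = urlparse(url)
        conn = httplib.HTTPConnection(parsed.netloc)
        conn.request("HEAD", parsed.path or "/")
        response = conn.getresponse()
        conn.close()
        # Treat 2xx/3xx status codes as "alive", 4xx/5xx as broken.
        return response.status < 400

    # Placeholder usage:
    print is_link_alive("http://example.com/some/page")

Note this sketch only handles plain HTTP; for https links you would use httplib.HTTPSConnection instead, and some servers answer HEAD requests incorrectly, so falling back to GET for failures may be worthwhile.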