I have a list of urls (1000+) which have been stored for over a year now. I want to run through and verify them all to see if they still exist. What is the best / quickest way to check them all and return a list of ones which do not return a site?
To check whether a string is a syntactically valid URL, you can use the validators module in Python. When you pass the string to the url() method of that module, it returns True if the string is a valid URL and a ValidationFailure(func=url, …) object if it is not. Note that this only checks the format of the URL, not whether the site is actually reachable.
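A minimal sketch of that syntax-only check, using the third-party validators package (pip install validators); it says nothing about whether the site still exists:

import validators

result = validators.url("https://www.example.com")
if result is True:
    print("well-formed URL")
else:
    # invalid input returns a falsy failure object (ValidationFailure in
    # older versions of the package) instead of True
    print("not a valid URL")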
Alternatively, visit Website Planet, enter your website's URL in the field, and press the Check button. Website Planet will show whether your website is online or not.
This is kind of slow, but you can use something like this to check whether a URL is alive (Python 2):

import urllib2

def url_exists(url):
    try:
        urllib2.urlopen(url)
        return True   # URL exists
    except ValueError:
        return False  # URL is not well formatted
    except urllib2.URLError:
        return False  # URL doesn't seem to be alive
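urllib2 only exists on Python 2; on Python 3 the same idea lives in urllib.request and urllib.error. A rough equivalent sketch:

import urllib.request
import urllib.error

def url_exists(url):
    try:
        urllib.request.urlopen(url, timeout=10)
        return True   # URL responded
    except ValueError:
        return False  # URL is not well formatted
    except urllib.error.URLError:
        return False  # URL doesn't seem to be alive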
Quicker than urllib2, you can use httplib:

import httplib
import socket

try:
    conn = httplib.HTTPConnection('google.com')
    conn.connect()
except (httplib.HTTPException, socket.error):
    # connect() raises socket.error for unreachable hosts
    print "not connected"
You can also do a DNS lookup (note this only checks whether the hostname resolves, so it is not a reliable way to tell whether a website is down):
import socket

try:
    socket.gethostbyname('www.google.com')
except socket.gaierror:
    print "does not exist"