Python: verify that a URL goes to a page

Tags:

python

I have a list of URLs (1000+) that have been stored for over a year now. I want to go through and verify them all to see if they still exist. What is the best/quickest way to check them all and return a list of the ones that no longer return a site?

John asked Oct 28 '10 09:10


1 Answer

This is fairly slow, but you can use something like this to check whether a URL is alive (urllib2 in Python 2; urllib.request and urllib.error in Python 3):

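import urllib.error
import urllib.request

def url_is_alive(url):
    try:
        urllib.request.urlopen(url, timeout=10)
        return True          # URL exists and responded
    except ValueError:
        return False         # URL not well formatted
    except urllib.error.URLError:
        return False         # URL doesn't seem to be alive (includes HTTP errors such as 404)
    except OSError:
        return False         # timeouts and other socket-level errors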
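Checking 1000+ URLs one after another is slow mainly because each request spends most of its time waiting on the network. A minimal sketch, assuming Python 3 and the standard library's concurrent.futures thread pool, runs the checks in parallel and returns the URLs that failed. The names is_alive and find_dead_urls, the 10-second timeout and the 20 workers are illustrative choices, not part of the original answer.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def is_alive(url, timeout=10):
    """Hypothetical helper: a Python 3 version of the urlopen check."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except ValueError:
        return False  # malformed URL (e.g. no scheme)
    except OSError:
        return False  # URLError/HTTPError (e.g. 404), refused connections, timeouts


def find_dead_urls(urls, check=is_alive, max_workers=20):
    """Run `check` on every URL in a thread pool and return the ones that failed.

    The returned list keeps the order of `urls`.
    """
    urls = list(urls)  # accept any iterable; it is iterated twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check, urls))  # map() preserves input order
    return [url for url, ok in zip(urls, results) if not ok]
```

Usage would be along the lines of dead = find_dead_urls(stored_urls), where stored_urls is your list. A larger max_workers finishes sooner, but many simultaneous requests to the same server may run into rate limiting.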

For something quicker than urllib2, you can use httplib (http.client in Python 3):

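import http.client

conn = http.client.HTTPConnection('google.com', timeout=10)
try:
    conn.connect()  # opens a TCP connection only; no HTTP request is sent
except OSError:     # connect() raises socket errors (DNS failure, refused, timeout), not HTTPException
    print("not connected")
finally:
    conn.close()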
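connect() only proves that something is listening on the host's port; it says nothing about whether a particular page still exists. A middle ground, sketched here assuming Python 3's http.client, is to send a HEAD request so the server returns a status code without the page body. The helper name head_status is hypothetical, redirects are not followed, and some servers answer HEAD differently from GET (for example with 405), so a non-2xx/3xx status is better treated as "recheck" than as proof the page is gone.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Send a HEAD request; return the HTTP status code, or None if unreachable.

    Hypothetical helper: redirects (301/302) are returned as-is, not followed.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_cls = http.client.HTTPConnection
    else:
        return None  # not an HTTP(S) URL
    if not parts.hostname:
        return None  # nothing to connect to
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None  # DNS failure, refused connection, timeout, malformed response
    finally:
        conn.close()
```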

You can also do a DNS lookup, though it only tells you whether the domain name still resolves, not whether the site itself still exists:

import socket

try:
    socket.gethostbyname('www.google.com')  # raises if the name cannot be resolved
except socket.gaierror:
    print("does not exist")
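A list of 1000+ URLs often contains many URLs on the same host, so a DNS check only needs to resolve each hostname once. A sketch, assuming Python 3: unresolvable_urls is a hypothetical name, and the resolve parameter exists only so a different lookup function can be plugged in (for example in tests). A hostname that resolves can still serve a dead page, so this only catches domains that have disappeared entirely.

```python
import socket
from urllib.parse import urlsplit


def unresolvable_urls(urls, resolve=socket.gethostbyname):
    """Return the URLs whose hostname is missing or does not resolve.

    Each distinct hostname is looked up only once, however many URLs share it.
    """
    cache = {}  # hostname -> True if it resolved
    dead = []
    for url in urls:
        try:
            host = urlsplit(url).hostname
        except ValueError:
            host = None  # e.g. a malformed IPv6 literal
        if host is None:
            dead.append(url)  # no hostname at all, e.g. a malformed entry
            continue
        if host not in cache:
            try:
                resolve(host)
                cache[host] = True
            except (socket.gaierror, UnicodeError):
                cache[host] = False  # name not found or not encodable
        if not cache[host]:
            dead.append(url)
    return dead
```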
mouad answered Oct 26 '22 18:10