
Python check if website exists

I want to check whether a certain website exists. This is what I'm doing:

    import urllib2

    user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    link = "http://www.abc.com"
    req = urllib2.Request(link, headers=headers)
    page = urllib2.urlopen(req).read()  # ERROR 402 generated here!

If the page doesn't exist (error 402, or any other error), what can I do in the page = ... line to make sure that the page I'm reading does exist?

James Hallen asked May 27 '13 18:05




2 Answers

You can use a HEAD request instead of GET. It downloads only the headers, not the content. You can then check the response status.

For Python 2.7.x, you can use httplib:

    import httplib

    c = httplib.HTTPConnection('www.example.com')
    # Request the root path; an empty string is not a valid request target
    c.request("HEAD", '/')
    if c.getresponse().status == 200:
        print('web site exists')

or urllib2:

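    import urllib2

    try:
        urllib2.urlopen('http://www.example.com/some_page')
    except urllib2.HTTPError as e:
        # The server answered, but with an error status (4xx or 5xx)
        print(e.code)
    except urllib2.URLError as e:
        # The server could not be reached at all
        print(e.args)

Or, for 2.7 and 3.x, you can install requests: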

    import requests

    response = requests.get('http://www.example.com')
    if response.status_code == 200:
        print('Web site exists')
    else:
        print('Web site does not exist')
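For Python 3, where httplib and urllib2 became http.client and urllib.request, the same HEAD idea can be sketched with the standard library alone. This is a minimal sketch, not part of the original answer: the function name head_status and its opener parameter are hypothetical, introduced here so the logic can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=5.0, opener=urllib.request.urlopen):
    """Send a HEAD request; return the status code, or None if unreachable.

    `opener` is a hypothetical injection point (defaults to urlopen) so the
    function can be tested with a stand-in instead of a real connection.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with a 4xx/5xx status.
        return error.code
    except OSError:
        # URLError is a subclass of OSError; this also covers raw timeouts,
        # DNS failures, and refused connections.
        return None


if __name__ == "__main__":
    status = head_status("http://www.example.com")
    if status is not None and status < 400:
        print("Web site exists")
    else:
        print("Web site does not exist or is unreachable")
```

Returning None for an unreachable host keeps "no answer at all" separate from "the server answered with an error". Some servers reject HEAD (for example with 405), so falling back to a normal GET in that case may be worth considering.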
Adem Öztaş answered Sep 17 '22 12:09


It's better to check that the status code is < 400, as was done here. Here is what the status codes mean (taken from Wikipedia):

  • 1xx - informational
  • 2xx - success
  • 3xx - redirection
  • 4xx - client error
  • 5xx - server error
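The classes above, together with the < 400 rule, can be sketched as a small helper. The names status_class and looks_reachable are hypothetical, chosen only for illustration:

```python
def status_class(code):
    """Map an HTTP status code to its class name, based on the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


def looks_reachable(code):
    """Treat any 1xx, 2xx, or 3xx status as 'the page exists'."""
    return 100 <= code < 400


if __name__ == "__main__":
    for code in (200, 301, 402, 404, 503):
        print(code, status_class(code), looks_reachable(code))
```

Checking < 400 rather than == 200 means that redirects and other non-error responses still count as an existing page.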

If you want to check whether a page exists without downloading the whole page, you should use a HEAD request:

    import httplib2

    h = httplib2.Http()
    resp = h.request("http://www.google.com", 'HEAD')
    assert int(resp[0]['status']) < 400

Taken from this answer.

If you want to download the whole page, just make a normal request and check the status code. Example using requests:

    import requests

    response = requests.get('http://google.com')
    assert response.status_code < 400

See also similar topics:

  • Python script to see if a web page exists without downloading the whole page?
  • Checking whether a link is dead or not using Python without downloading the webpage
  • How do you send a HEAD HTTP request in Python 2?
  • Making HTTP HEAD request with urllib2 from Python 2

Hope that helps.

alecxe answered Sep 19 '22 12:09