Python check if website exists

Tags: python, html, urlopen

I want to check whether a certain website exists. This is what I'm doing:

    import urllib2

    user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    link = "http://www.abc.com"
    req = urllib2.Request(link, headers=headers)
    page = urllib2.urlopen(req).read()  # ERROR 402 generated here!

If the page doesn't exist (error 402, or any other error), what can I do on the `page = ...` line to make sure that the page I'm reading actually exists?


asked May 27 '13 18:05

James Hallen

2 Answers

You can use a HEAD request instead of GET. It downloads only the headers, not the body, and you can then check the response status code.

For Python 2.7.x, you can use httplib:


    import httplib

    c = httplib.HTTPConnection('www.example.com')
    c.request("HEAD", '/')
    if c.getresponse().status == 200:
        print('web site exists')
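In Python 3 the httplib module was renamed http.client. A minimal sketch of the same HEAD check for Python 3, assuming the host is passed without a scheme (e.g. "www.example.com", optionally with ":port") and treating any connection failure as "does not exist"; the function name site_exists and the 10-second timeout are illustrative choices, not part of the original answer.

```python
import http.client


def site_exists(host: str, path: str = "/", timeout: float = 10.0) -> bool:
    """Send a HEAD request to host + path and report whether it returned 200.

    Hypothetical helper: `host` is a bare host name such as "www.example.com"
    (optionally "host:port"), with no "http://" prefix.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # DNS lookup failures, refused connections and timeouts are OSError
        # subclasses; malformed responses raise HTTPException.
        return False
    finally:
        conn.close()
```

For example, `site_exists("www.example.com")` returns True or False. Like the httplib snippet, this does not follow redirects, so a site that answers with 301 is reported as not existing.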

or urllib2:


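    import urllib2

    try:
        urllib2.urlopen('http://www.example.com/some_page')
    except urllib2.HTTPError as e:
        # The server answered, but with an error status (4xx/5xx)
        print(e.code)
    except urllib2.URLError as e:
        # No HTTP response at all, e.g. DNS lookup failure or refused connection
        print(e.args)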
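In Python 3, urllib2 was split into urllib.request and urllib.error. A possible port of the snippet above, assuming you want the status code back rather than printed; get_status is a hypothetical name, and it sends HEAD (via the `method` argument of `Request`) so the body is not downloaded.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def get_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for `url`, or None if no response was received.

    urlopen follows redirects, so the status is that of the final page.
    """
    req = Request(url, method="HEAD")
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        # HTTPError subclasses URLError, so it must be caught first.
        # The server answered, but with a 4xx/5xx code.
        return e.code
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts
        # are both OSError subclasses: there is no status to report.
        return None
```

A return value of None means the site could not be reached at all, which is usually what "the website does not exist" means; an int means the server exists and you can apply whatever status rule you prefer.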

Or, for both Python 2.7 and 3.x, you can install requests:


    import requests

    response = requests.get('http://www.example.com')
    if response.status_code == 200:
        print('Web site exists')
    else:
        print('Web site does not exist')

answered Sep 17 '22 12:09

Adem Öztaş

It's better to check that the status code is < 400, as was done here. Here is what the status code classes mean (taken from Wikipedia):

1xx - informational
2xx - success
3xx - redirection
4xx - client error
5xx - server error
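Since the class is just the first digit of the code, the list above and the "< 400" rule can be expressed directly. A small sketch; status_class and counts_as_existing are illustrative names, not from the answer.

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to its class, following the list above."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 keeps only the first digit (e.g. 404 -> 4).
    return classes.get(code // 100, "unknown")


def counts_as_existing(code: int) -> bool:
    """The '< 400' rule: 1xx, 2xx and 3xx responses count as existing."""
    return code < 400
```

For example, `status_class(404)` gives "client error" and `counts_as_existing(301)` gives True.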

If you want to check whether a page exists without downloading the whole page, use a HEAD request:


    import httplib2

    h = httplib2.Http()
    resp = h.request("http://www.google.com", 'HEAD')
    assert int(resp[0]['status']) < 400

Taken from this answer.

If you want to download the whole page, just make a normal request and check the status code. Example using requests:


    import requests

    response = requests.get('http://google.com')
    assert response.status_code < 400

alecxe
