Python - Get Header information from URL

Tags:

python-3.x

I've been searching all around for a Python 3.x code sample to get HTTP Header information.

Something as simple as get_headers equivalent in PHP cannot be found in Python easily. Or maybe I am not sure how to best wrap my head around it.

In essence, I would like to write something that tells me whether a URL exists or not, something along the lines of:

h = get_headers(url)
if(h[0] == 200)
{
   print("Bingo!")
}

So far, I tried

h = http.client.HTTPResponse('http://docs.python.org/')

But I always got an error.

asked Feb 19 '13 04:02

2 Answers

To get an HTTP response code in python-3.x, use the urllib.request module:

>>> import urllib.request
>>> response =  urllib.request.urlopen(url)
>>> response.getcode()
200
>>> if response.getcode() == 200:
...     print('Bingo')
... 
Bingo

The returned HTTPResponse Object will give you access to all of the headers, as well. For example:

>>> response.getheader('Server')
'Apache/2.2.16 (Debian)'
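
For something closer to PHP's get_headers, the status code and all headers can be collected in one call. This is only a minimal sketch and not part of the original answer: the helper name get_headers, the use of a HEAD request, and the 10-second timeout are assumptions made for illustration.

```python
import urllib.request


def get_headers(url, timeout=10):
    """Return (status_code, headers_dict) for url using a HEAD request.

    Hypothetical helper. urlopen() raises urllib.error.HTTPError for
    4xx/5xx responses and urllib.error.URLError if the server cannot
    be reached, so callers may want to catch those.
    """
    # HEAD asks the server for the headers only, without the body.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # getheaders() returns a list of (name, value) tuples; dict()
        # keeps only the last value if a header name is repeated.
        return response.status, dict(response.getheaders())
```

Calling get_headers('http://docs.python.org/') would return the status code together with a dictionary of headers such as 'Server' or 'Content-Type', depending on what the server sends.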

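If the server responds with an error status (for example 404), urllib.request.urlopen() raises an HTTPError exception. If the server cannot be reached at all, a URLError is raised instead. You can handle HTTPError to get the response code: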

import urllib.error
import urllib.request

url = 'http://docs.python.org/'  # URL taken from the question

try:
    response = urllib.request.urlopen(url)
    if response.getcode() == 200:
        print('Bingo')
    else:
        print('The response code was not 200, but: {}'.format(
            response.getcode()))
except urllib.error.HTTPError as e:
    # Raised for HTTP error statuses such as 404 or 500.
    print('''An error occurred: {}
The response code was {}'''.format(e, e.getcode()))
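
The same pattern can be wrapped into a reusable check for the "does this URL exist?" use case from the question. The sketch below is an illustration only: the function name url_exists, the HEAD method, the timeout, and the choice to treat only 2xx statuses as "exists" are all assumptions.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10):
    """Return True if url answers with a 2xx status, otherwise False."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # The server answered, but with an error status such as 404.
        return False
    except OSError:
        # No usable HTTP response. URLError (DNS failure, refused
        # connection) and socket timeouts are both OSError subclasses.
        return False
```

One caveat: some servers reject HEAD requests (for example with 405) even though the page exists, so a fallback to GET may be needed in practice.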

answered Sep 30 '22 15:09

For Python 2.x

urllib, urllib2 or httplib can be used here. Note, however, that urllib and urllib2 use httplib. Therefore, if you plan to run this check a lot (thousands of times), it would be better to use httplib directly. Additional documentation and examples are here.

Example code:

import httplib
try:
    h = httplib.HTTPConnection("www.google.com")
    h.connect()
except Exception as ex:
    print "Could not connect to page."

For Python 3.x

A similar story to urllib (or urllib2) and httplib from Python 2.x applies to the urllib.request and http.client libraries in Python 3.x: urllib.request is built on top of http.client. Again, http.client should be quicker. For more documentation and examples look here.

Example code:

import http.client

try:
    conn = http.client.HTTPConnection("www.google.com")
    conn.connect()
except Exception as ex:
    print("Could not connect to page.")

If you want to check the status code, replace

conn.connect()

with

conn.request("GET", "/index.html")  # Could also use "HEAD" instead of "GET".
res = conn.getresponse()
if res.status == 200 or res.status == 302:  # Specify codes here.
    print("Page Found!")
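
Putting the pieces together for a full URL rather than a hard-coded host might look like the sketch below. The function name head_status and the timeout value are assumptions, not part of the answer above. Note that http.client does not follow redirects on its own, which is why the snippet above also accepts 302.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Return (status, reason) for url via a single HEAD request."""
    parts = urlsplit(url)
    # Pick the connection class that matches the URL scheme.
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(
            parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(
            parts.hostname, parts.port, timeout=timeout)
    try:
        # An empty path must be sent as "/"; keep any query string.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        res = conn.getresponse()
        return res.status, res.reason
    finally:
        conn.close()
```

The returned status can then be compared against whichever codes should count as "found", for example 200 and 302 as in the snippet above.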

Note that in both examples, if you would like to catch only the exception raised when the host name cannot be resolved, rather than all exceptions, catch socket.gaierror instead (see the socket documentation).
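
As a rough illustration of that distinction, the following sketch separates name resolution failures from other connection problems. The function name check_host and its return strings are made up for this example. socket.gaierror is a subclass of OSError, so it has to be caught first.

```python
import http.client
import socket


def check_host(host, timeout=10):
    """Return 'ok', 'unknown host', or 'connection failed' for host."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.connect()
        return "ok"
    except socket.gaierror:
        # Name resolution failed: the host name does not exist
        # (or DNS is unavailable).
        return "unknown host"
    except OSError:
        # The name resolved, but the connection itself failed
        # (refused, timed out, unreachable network, ...).
        return "connection failed"
    finally:
        conn.close()
```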

answered Sep 30 '22 16:09

Akyidrian

Adib

Johnsyweb
