Python - Get Header information from URL

I've been searching all around for a Python 3.x code sample to get HTTP Header information.

Something as simple as get_headers equivalent in PHP cannot be found in Python easily. Or maybe I am not sure how to best wrap my head around it.

In essence, I would like to write something that checks whether a URL exists, along the lines of:

h = get_headers(url)
if(h[0] == 200)
{
   print("Bingo!")
}

So far, I tried

h = http.client.HTTPResponse('http://docs.python.org/')

But always got an error

Adib asked Feb 19 '13 04:02



2 Answers

To get an HTTP response code in Python 3.x, use the urllib.request module:

>>> import urllib.request
>>> url = 'http://docs.python.org/'
>>> response = urllib.request.urlopen(url)
>>> response.getcode()
200
>>> if response.getcode() == 200:
...     print('Bingo')
... 
Bingo

The returned HTTPResponse object also gives you access to all of the headers. For example:

>>> response.getheader('Server')
'Apache/2.2.16 (Debian)'
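
As a rough equivalent of PHP's get_headers, the same response object can be used to collect the status code and every header in one go. The sketch below is only an illustration: the function name get_headers and the timeout parameter are my own choices, not part of urllib, and it assumes the server answers HEAD requests (some servers do not).

```python
import urllib.request


def get_headers(url, timeout=10):
    """Return (status_code, headers_dict) for url.

    Hypothetical helper, named after PHP's get_headers for illustration.
    """
    # method="HEAD" asks the server for the headers only, without the body.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # getheaders() returns a list of (name, value) tuples; converting to
        # a dict keeps only the last value when a header name is repeated.
        return response.status, dict(response.getheaders())
```

Calling get_headers('http://docs.python.org/') would return the status code together with a dictionary of header names and values. As with a plain urlopen() call, an error status from the server still raises HTTPError.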

If the server responds with an error status (such as 404), urllib.request.urlopen() raises an HTTPError exception. You can handle this to get the response code:

import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen(url)
    if response.getcode() == 200:
        print('Bingo')
    else:
        print('The response code was not 200, but: {}'.format(
            response.getcode()))
except urllib.error.HTTPError as e:
    # Raised when the server answers with an error status (4xx or 5xx).
    print('''An error occurred: {}
The response code was {}'''.format(e, e.getcode()))
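
One case the handler above does not cover is a URL whose host cannot be reached at all (unknown domain, refused connection). In that situation urlopen() raises urllib.error.URLError rather than HTTPError. A minimal sketch that separates the two cases follows; the function name status_of and its None return value are assumptions made for this example.

```python
import urllib.error
import urllib.request


def status_of(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived.

    Hypothetical helper for illustration only.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        return e.code
    except urllib.error.URLError:
        # No HTTP response at all: DNS failure, refused connection, etc.
        return None
```

HTTPError is a subclass of URLError, so it has to be caught first. With this helper, the check from the question becomes: if status_of(url) == 200: print('Bingo').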
Johnsyweb answered Sep 30 '22 15:09


For Python 2.x

urllib, urllib2 or httplib can be used here. Note, however, that urllib and urllib2 are built on top of httplib. If you plan to run this check many times (thousands of times), it is therefore better to use httplib directly. Additional documentation and examples are here.

Example code:

import httplib
try:
    h = httplib.HTTPConnection("www.google.com")
    h.connect()
except Exception as ex:
    print "Could not connect to page."

For Python 3.x

A similar story applies in Python 3.x, where urllib2 became urllib.request and httplib was renamed http.client: urllib.request is built on top of http.client. Again, http.client should be quicker. For more documentation and examples look here.

Example code:

import http.client

try:
    conn = http.client.HTTPConnection("www.google.com")
    conn.connect()    
except Exception as ex:
    print("Could not connect to page.")

If you want to check the status code as well, replace

conn.connect()

with

conn.request("GET", "/index.html")  # Could also use "HEAD" instead of "GET".
res = conn.getresponse()
if res.status == 200 or res.status == 302:  # Specify codes here.
    print("Page Found!")

Note that in both examples, if you want to catch only the exception raised when the host name cannot be resolved, rather than every exception, catch socket.gaierror instead (see the socket documentation).
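
Putting the Python 3.x pieces together, the check could be wrapped up roughly as follows. This is a sketch under a few assumptions: the function name url_exists and the ok_statuses and timeout parameters are made up for the example, a HEAD request is used so that no body is downloaded, and only socket.gaierror is handled, so other connection errors still propagate.

```python
import http.client
import socket
from urllib.parse import urlsplit


def url_exists(url, ok_statuses=(200, 302), timeout=10):
    """Return True if a HEAD request to url yields one of ok_statuses.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Pick the connection class that matches the URL scheme.
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    # http.client expects the path (and query string), not the full URL.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status in ok_statuses
    except socket.gaierror:
        # The host name could not be resolved.
        return False
    finally:
        conn.close()
```

For example, url_exists("http://www.google.com/index.html") performs the same check as the snippet above. Unlike urllib.request, http.client does not follow redirects, which is why 302 is listed among the accepted codes.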

Akyidrian answered Sep 30 '22 16:09