 

Get a header with Python and convert in JSON (requests - urllib2 - json)

I’m trying to get the headers from a website and encode them as JSON so I can write them to a file. I’ve tried two different ways without success.

FIRST with urllib2 and json

import urllib2
import json
host = ("https://www.python.org/")
header = urllib2.urlopen(host).info()
json_header = json.dumps(header)
print json_header

This way, I get the error:

TypeError: is not JSON serializable

So I tried to work around this by converting the object to a string first (header = str(header)). That way json.dumps(header) works, but the output is weird:

"Date: Wed, 02 Jul 2014 13:33:37 GMT\r\nServer: nginx\r\nContent-Type: text/html; charset=utf-8\r\nX-Frame-Options: SAMEORIGIN\r\nContent-Length: 45682\r\nAccept-Ranges: bytes\r\nVia: 1.1 varnish\r\nAge: 1263\r\nX-Served-By: cache-fra1220-FRA\r\nX-Cache: HIT\r\nX-Cache-Hits: 2\r\nVary: Cookie\r\nStrict-Transport-Security: max-age=63072000; includeSubDomains\r\nConnection: close\r\n"
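One way around the TypeError, sketched here for Python 3 (where urllib2 became urllib.request): the object returned by info() is a message object rather than a dict, but it has an items() method that yields (name, value) pairs, and a dict built from those pairs is something json can serialize. The helper names headers_to_json and fetch_headers_json are hypothetical, and the demo parses a few of the header lines above with email.message_from_string instead of making a network request.

```python
import email
import json
import urllib.request


def headers_to_json(message):
    """Serialize any object with an items() method (e.g. an HTTPMessage) to JSON.

    Repeated headers such as Set-Cookie collapse to their last value,
    because a plain dict holds only one value per key.
    """
    return json.dumps(dict(message.items()))


def fetch_headers_json(url):
    """Fetch url and return its response headers as a JSON string (needs network)."""
    with urllib.request.urlopen(url) as response:
        return headers_to_json(response.info())


# Offline demo: parse a raw header block like the one printed above.
raw = (
    "Server: nginx\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "X-Frame-Options: SAMEORIGIN\r\n"
    "\r\n"
)
message = email.message_from_string(raw)
print(headers_to_json(message))
```

Calling fetch_headers_json("https://www.python.org/") would do the same for a live response.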

SECOND with requests

import requests
r = requests.get("https://www.python.org/")
rh = r.headers
print rh

{'content-length': '45682', 'via': '1.1 varnish', 'x-cache': 'HIT', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=63072000; includeSubDomains', 'vary': 'Cookie', 'server': 'nginx', 'x-served-by': 'cache-fra1226-FRA', 'x-cache-hits': '14', 'date': 'Wed, 02 Jul 2014 13:39:33 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'text/html; charset=utf-8', 'age': '1619'}

This output looks more like JSON, but it’s still not right (note the single quotes ‘ ‘ instead of double quotes “ “, and other things like = and ;). Evidently there’s something (or a lot) I’m not doing the right way. I’ve tried reading the documentation of these modules, but I can’t figure out how to solve this problem. Thank you for your help.
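The single quotes come from how Python prints a dict (its repr), not from the data itself; = and ; are simply characters inside the header values and are perfectly valid in JSON strings. A small Python 3 sketch contrasting the two representations, using a plain dict holding two of the values above:

```python
import json

# A plain dict with two of the header values shown above.
headers = {
    "content-type": "text/html; charset=utf-8",
    "strict-transport-security": "max-age=63072000; includeSubDomains",
}

# print() on a dict shows its Python repr, which uses single quotes.
print(repr(headers))

# json.dumps() produces JSON text: double quotes, and characters such as
# "=" and ";" stay untouched inside the string values.
as_json = json.dumps(headers)
print(as_json)

# Round trip: the JSON text parses back to an equal dict.
print(json.loads(as_json) == headers)  # True
```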

The One Electronic asked Jul 02 '14 13:07




2 Answers

If you are only interested in the headers, make a HEAD request, convert the CaseInsensitiveDict into a plain dict, and then convert that to JSON:

import requests
import json
r = requests.head('https://www.python.org/')
rh = dict(r.headers)
json.dumps(rh)
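The question also asks about writing the result to a file; json.dump writes straight to an open file object, so no intermediate string is needed. A minimal Python 3 sketch, where save_headers, load_headers and the file name are hypothetical, and a plain dict stands in for r.headers so the demo runs offline:

```python
import json
import os
import tempfile


def save_headers(headers, path):
    """Write a header mapping to path as pretty-printed JSON.

    dict() turns a requests CaseInsensitiveDict (or any mapping) into a
    plain dict, which the json module knows how to serialize.
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dict(headers), f, indent=2, sort_keys=True)


def load_headers(path):
    """Read the JSON file back into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Offline demo with a plain dict standing in for r.headers.
demo_path = os.path.join(tempfile.mkdtemp(), "headers.json")
save_headers({"Server": "nginx", "Age": "1263"}, demo_path)
print(load_headers(demo_path))
```

With requests, the call would be save_headers(requests.head("https://www.python.org/").headers, "headers.json"). The indent and sort_keys arguments only make the file easier to read and diff; drop them for compact output.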
salmanwahed answered Oct 21 '22 17:10


There are several ways to encode the headers as JSON, but my first thought would be to convert the headers attribute to an actual dictionary instead of accessing it as a requests.structures.CaseInsensitiveDict:

import requests, json
r = requests.get("https://www.python.org/")
rh = json.dumps(r.headers.__dict__['_store'])
print rh

{"content-length": ["content-length", "45474"], "via": ["via", "1.1 varnish"], "x-cache": ["x-cache", "HIT"], "accept-ranges": ["accept-ranges", "bytes"], "strict-transport-security": ["strict-transport-security", "max-age=63072000; includeSubDomains"], "vary": ["vary", "Cookie"], "server": ["server", "nginx"], "x-served-by": ["x-served-by", "cache-iad2132-IAD"], "x-cache-hits": ["x-cache-hits", "1"], "date": ["date", "Wed, 02 Jul 2014 14:13:37 GMT"], "x-frame-options": ["x-frame-options", "SAMEORIGIN"], "content-type": ["content-type", "text/html; charset=utf-8"], "age": ["age", "1483"]}

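Depending on exactly which headers you need, you can then access them individually; this gives you all the information contained in the headers, albeit in a slightly different format (each value is a pair of the original header name and its value). Note that _store is a private attribute of CaseInsensitiveDict, so it may change between versions of requests.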
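If you want the same {lowercased name: [original name, value]} layout without touching the private _store attribute, it can be rebuilt from the public items() method, which on a requests CaseInsensitiveDict yields header names in their original case. A Python 3 sketch under that assumption, with a hypothetical helper name and a plain dict standing in for r.headers:

```python
import json


def headers_as_pairs_json(headers):
    """Mimic the _store layout using only public methods.

    Produces {lowercased name: [original name, value]}; JSON has no
    tuples, so each pair is encoded as a two-element list.
    """
    pairs = {name.lower(): [name, value] for name, value in headers.items()}
    return json.dumps(pairs)


# Offline demo with a plain dict standing in for r.headers.
print(headers_as_pairs_json({"Content-Type": "text/html; charset=utf-8", "Age": "1483"}))
```

With requests installed, headers_as_pairs_json(r.headers) would be the drop-in equivalent of the _store line above.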

If you prefer a different format, you can also convert your headers to a dictionary:

import requests, json
r = requests.get("https://www.python.org/")
print json.dumps(dict(r.headers))

{"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-at50-ATL", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:15 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "951"}

Slater Victoroff answered Oct 21 '22 15:10