Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get JSON from webpage into Python script

Tags:

python

json

People also ask

How can I get JSON data from a website?

The first step in this process is to choose a web scraper for your project. We obviously recommend ParseHub. Not only is it free to use, but it also works with all kinds of websites. With ParseHub, web scraping is as simple as clicking on the data you want and downloading it as an excel sheet or JSON file.

How do I import JSON data into Python?

It's pretty easy to load a JSON object in Python. Python has a built-in package called json, which can be used to work with JSON data. It's done by using the JSON module, which provides us with a lot of methods which among loads() and load() methods are gonna help us to read the JSON file.

How do I extract data from a JSON file in Python?

So first thing you need to import the 'json' module into the file. Then create a simple json object string in python and assign it to a variable. Now we will use the loads() function from 'json' module to load the json data from the variable. We store the json data as a string in python with quotes notation.


Get data from the URL and then call json.loads e.g.

Python3 example:

import urllib.request, json 
with urllib.request.urlopen("http://maps.googleapis.com/maps/api/geocode/json?address=google") as url:
    data = json.loads(url.read().decode())
    print(data)

Python2 example:

import urllib, json
url = "http://maps.googleapis.com/maps/api/geocode/json?address=google"
response = urllib.urlopen(url)
data = json.loads(response.read())
print data

The output would result in something like this:

{
"results" : [
    {
    "address_components" : [
        {
            "long_name" : "Charleston and Huff",
            "short_name" : "Charleston and Huff",
            "types" : [ "establishment", "point_of_interest" ]
        },
        {
            "long_name" : "Mountain View",
            "short_name" : "Mountain View",
            "types" : [ "locality", "political" ]
        },
        {
...

I'll take a guess that you actually want to get data from the URL:

jsonurl = urlopen(url)
text = json.loads(jsonurl.read()) # <-- read from it

Or, check out JSON decoder in the requests library.

import requests
r = requests.get('someurl')
print r.json() # if response type was set to JSON, then you'll automatically have a JSON response here...

This gets a dictionary in JSON format from a webpage with Python 2.X and Python 3.X:

#!/usr/bin/env python

try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen

import json


def get_jsonparsed_data(url):
    """
    Receive the content of ``url``, parse it as JSON and return the object.

    Parameters
    ----------
    url : str

    Returns
    -------
    dict
    """
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)


url = ("http://maps.googleapis.com/maps/api/geocode/json?"
       "address=googleplex&sensor=false")
print(get_jsonparsed_data(url))

See also: Read and write example for JSON


I have found this to be the easiest and most efficient way to get JSON from a webpage when using Python 3:

import json,urllib.request
data = urllib.request.urlopen("https://api.github.com/users?since=100").read()
output = json.loads(data)
print (output)

you need import requests and use from json() method :

source = requests.get("url").json()
print(source)

Of course, this method also works:

import json,urllib.request
data = urllib.request.urlopen("url").read()
output = json.loads(data)
print (output)

json.loads will decode it into a Python object using this table, for example a JSON object will become a Python dict.


All that the call to urlopen() does (according to the docs) is return a file-like object. Once you have that, you need to call its read() method to actually pull the JSON data across the network.

Something like:

jsonurl = urlopen(url)

text = json.loads(jsonurl.read())
print text

In Python 2, json.load() will work instead of json.loads()

import json
import urllib

url = 'https://api.github.com/users?since=100'
output = json.load(urllib.urlopen(url))
print(output)

Unfortunately, that doesn't work in Python 3. json.load is just a wrapper around json.loads that calls read() for a file-like object. json.loads requires a string object and the output of urllib.urlopen(url).read() is a bytes object. So one has to get the file encoding in order to make it work in Python 3.

In this example we query the headers for the encoding and fall back to utf-8 if we don't get one. The headers object is different between Python 2 and 3 so it has to be done different ways. Using requests would avoid all this, but sometimes you need to stick to the standard library.

import json
from six.moves.urllib.request import urlopen

DEFAULT_ENCODING = 'utf-8'
url = 'https://api.github.com/users?since=100'
urlResponse = urlopen(url)

if hasattr(urlResponse.headers, 'get_content_charset'):
    encoding = urlResponse.headers.get_content_charset(DEFAULT_ENCODING)
else:
    encoding = urlResponse.headers.getparam('charset') or DEFAULT_ENCODING

output = json.loads(urlResponse.read().decode(encoding))
print(output)