
Print all the keys of a JSON file in Python

Tags:

python

json

key

I have a folder containing around 20000 JSON files. I want to find all the unique keys in each JSON file and then take the union of those keys across all files. However, I am stuck on the first step: I cannot even list the keys of a single JSON file.

This is the code I have written so far:

from pprint import pprint
import json
json_data=open("/Users/akira/out/1.json")
jdata = json.load(json_data)

for key, value in jdata:
   pprint("Key:")
   pprint(key)

It is giving me an error as follows:

Traceback (most recent call last):
 File "/Users/akira/PycharmProjects/csci572/linkedbased.py",     line 8, in <module>
   for key, value in jdata:
 ValueError: need more than 1 value to unpack

My JSON is nested. How can I get all the keys, including the nested ones?

{
"a": "Offer",
"inLanguage": "et",
"availabl": {
    "a": "Place",
    "address": {
        "a": "PostalAddress",
        "name": "Oklahoma"
    }
},
"description": "Smith and Wesson 686 357 magnum 6 inch barrel wood handle great condition shoots great.",
"priceCurrency": "USD",
"geonames_address": [
    {
        "a": "PopulatedPlace",
        "hasIdentifier": {
            "a": "Identifier",
            "label": "4552707",
            "hasType": "http://dig.isi.edu/gazetteer/data/SKOS/IdentifierTypes/GeonamesId"
        },
        "hasPreferredName": {
            "a": "Name",
            "label": "Tahlequah"
        },
        "uri": "http://dig.isi.edu/gazetteer/data/geonames/place/4552707",
        "fallsWithinState1stDiv": {
            "a": "State1stDiv",
            "uri": "http://dig.isi.edu/gazetteer/data/geonames/place/State1stDiv/US_OK",
            "hasName": {
                "a": "Name",
                "label": "Oklahoma"
            }
        },
        "score": 0.5,
        "fallsWithinCountry": {
            "a": "Country",
            "uri": "http://dig.isi.edu/gazetteer/data/geonames/place/Country/US",
            "hasName": {
                "a": "Name",
                "label": "United States"
            }
        },
        "fallsWithinCountyProvince2ndDiv": {
            "a": "CountyProvince2ndDiv",
            "uri": "http://dig.isi.edu/gazetteer/data/geonames/place/CountyProvince2ndDiv/US_OK_021"
        },
        "geo": {
            "lat": 35.91537,
            "lon": -94.96996
        }
    }
],
"price": 750,
"title": "For Sale: Smith &amp; Wesson 686",
"publisher": {
    "a": "Organization",
    "name": "armslist.com",
    "uri": "http://dig.isi.edu/weapons/data/organization/armslist"
},
"uri": "http://dig.isi.edu/weapons/data/page/13AD9516F01012C5F89E8AADAE5D7E1E2BA97FF9/1433463841000/processed",
"seller": {
    "a": "PersonOrOrganization",
    "description": "Private Party"
} //, ...
}
akira asked Nov 01 '15 19:11


2 Answers

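Iterating over a dict directly yields only its keys, so each key cannot be unpacked into key, value. Instead of for key, value in jdata:, use for key, value in jdata.items(): like this:

for key, value in jdata.items():
    pprint("Key:")
    pprint(key)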

Take a look at the docs for dict:

items():

Return a new view of the dictionary’s items ((key, value) pairs).
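To make the difference concrete, here is a minimal sketch using a small stand-in dict (the name jdata and its two entries are borrowed from the question, not a full copy of the file). In the question's data the key "a" is a one-character string, which is most likely why the unpacking failed with "need more than 1 value to unpack".

```python
# Small stand-in for the parsed JSON from the question.
jdata = {"a": "Offer", "price": 750}

# Iterating a dict directly yields its keys only, so writing
# "for key, value in jdata" tries to unpack each key string.
keys_only = [key for key in jdata]

# items() yields (key, value) tuples, which unpack cleanly.
pairs = [(key, value) for key, value in jdata.items()]

print(keys_only)
print(pairs)
```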

EDIT: If you want all of the nested keys and not just the top-level ones, you could take a recursive approach like those suggested in another answer:

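def get_keys(dl, keys_list):
    # Recursively collect every key found in nested dicts and lists.
    # Plain loops are used because map() is lazy in Python 3 and
    # would never run the recursive calls.
    if isinstance(dl, dict):
        keys_list += dl.keys()
        for value in dl.values():
            get_keys(value, keys_list)
    elif isinstance(dl, list):
        for item in dl:
            get_keys(item, keys_list)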

keys = []
get_keys(jdata, keys)

print(keys)
# [u'a', u'inLanguage', u'description', u'priceCurrency', u'geonames_address', u'price', u'title', u'availabl', u'uri', u'seller', u'publisher', u'a', u'hasIdentifier', u'hasPreferredName', u'uri', u'fallsWithinState1stDiv', u'score', u'fallsWithinCountry', u'fallsWithinCountyProvince2ndDiv', u'geo', u'a', u'hasType', u'label', u'a', u'label', u'a', u'uri', u'hasName', u'a', u'label', u'a', u'uri', u'hasName', u'a', u'label', u'a', u'uri', u'lat', u'lon', u'a', u'address', u'a', u'name', u'a', u'description', u'a', u'name', u'uri']

print(list(set(keys)))    # unique list of keys
# [u'inLanguage', u'fallsWithinState1stDiv', u'label', u'hasName', u'title', u'hasPreferredName', u'lon', u'seller', u'score', u'description', u'price', u'address', u'lat', u'fallsWithinCountyProvince2ndDiv', u'geo', u'a', u'publisher', u'hasIdentifier', u'name', u'priceCurrency', u'geonames_address', u'hasType', u'availabl', u'uri', u'fallsWithinCountry']
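For the original goal of taking the union over the whole folder, the same recursion can be pointed at a set and run once per file. The following is only a sketch for Python 3: collect_keys and union_of_keys are hypothetical names, and it assumes every file in the folder ends in .json, is UTF-8 encoded, and parses as valid JSON.

```python
import json
from pathlib import Path


def collect_keys(node, found):
    """Add every key in a nested dict/list structure to the set `found`."""
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            collect_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_keys(item, found)


def union_of_keys(folder):
    """Return the set of keys seen in any *.json file directly inside `folder`."""
    all_keys = set()
    for path in Path(folder).glob("*.json"):
        # Assumption: files are UTF-8 encoded and contain valid JSON.
        with path.open(encoding="utf-8") as handle:
            collect_keys(json.load(handle), all_keys)
    return all_keys
```

Calling sorted(union_of_keys("/Users/akira/out")) would then give one sorted list of every key name found in that folder. Because a set is used, duplicates are dropped as the files are read, so memory stays proportional to the number of distinct keys rather than to the number of files.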
Mike Covington answered Oct 12 '22 11:10


Iterating over jdata directly gives you only its keys. To get key, value pairs, loop over dict.items() or dict.iteritems() instead.

So, it should be either

for key, value in jdata.items():

OR

for key, value in jdata.iteritems():

for Python 3 and Python 2 respectively.

For the difference between the two, see the answers to this question: What is the difference between dict.items() and dict.iteritems()?

If you only need the keys of the dictionary, you can iterate over dict.keys() (or dict.iterkeys() in Python 2). Iterating over the dictionary itself also yields its keys.
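As a small illustration in Python 3, where keys() returns a set-like view, the top-level keys of two dicts can be combined directly with the | operator. The two dicts below are made-up stand-ins, and this covers top-level keys only, not nested ones.

```python
# Two made-up stand-ins for parsed JSON documents.
first = {"a": "Offer", "price": 750}
second = {"a": "Place", "name": "Oklahoma"}

# Iterate over the keys only.
for key in first.keys():
    print(key)

# In Python 3, keys() views support set operations such as union.
top_level_union = first.keys() | second.keys()
print(sorted(top_level_union))
```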

Vipul answered Oct 12 '22 10:10