I have a folder containing around 20,000 JSON files. I want to find all the unique keys in each JSON file and then take the union of all the keys. However, I got stuck at the very first step: so far I am only trying to get the keys of a single JSON file.
I have written the following code so far:
from pprint import pprint
import json
json_data=open("/Users/akira/out/1.json")
jdata = json.load(json_data)
for key, value in jdata:
    pprint("Key:")
    pprint(key)
It is giving me an error as follows:
Traceback (most recent call last):
  File "/Users/akira/PycharmProjects/csci572/linkedbased.py", line 8, in <module>
    for key, value in jdata:
ValueError: need more than 1 value to unpack
My JSON is nested. Please suggest how I can get all the keys.
{
"a": "Offer",
"inLanguage": "et",
"availabl": {
"a": "Place",
"address": {
"a": "PostalAddress",
"name": "Oklahoma"
}
},
"description": "Smith and Wesson 686 357 magnum 6 inch barrel wood handle great condition shoots great.",
"priceCurrency": "USD",
"geonames_address": [
{
"a": "PopulatedPlace",
"hasIdentifier": {
"a": "Identifier",
"label": "4552707",
"hasType": "http://dig.isi.edu/gazetteer/data/SKOS/IdentifierTypes/GeonamesId"
},
"hasPreferredName": {
"a": "Name",
"label": "Tahlequah"
},
"uri": "http://dig.isi.edu/gazetteer/data/geonames/place/4552707",
"fallsWithinState1stDiv": {
"a": "State1stDiv",
"uri": "http://dig.isi.edu/gazetteer/data/geonames/place/State1stDiv/US_OK",
"hasName": {
"a": "Name",
"label": "Oklahoma"
}
},
"score": 0.5,
"fallsWithinCountry": {
"a": "Country",
"uri": "http://dig.isi.edu/gazetteer/data/geonames/place/Country/US",
"hasName": {
"a": "Name",
"label": "United States"
}
},
"fallsWithinCountyProvince2ndDiv": {
"a": "CountyProvince2ndDiv",
"uri": "http://dig.isi.edu/gazetteer/data/geonames/place/CountyProvince2ndDiv/US_OK_021"
},
"geo": {
"lat": 35.91537,
"lon": -94.96996
}
}
],
"price": 750,
"title": "For Sale: Smith & Wesson 686",
"publisher": {
"a": "Organization",
"name": "armslist.com",
"uri": "http://dig.isi.edu/weapons/data/organization/armslist"
},
"uri": "http://dig.isi.edu/weapons/data/page/13AD9516F01012C5F89E8AADAE5D7E1E2BA97FF9/1433463841000/processed",
"seller": {
"a": "PersonOrOrganization",
"description": "Private Party"
} //, ...
}
Iterating over a dict directly yields only its keys, so there is no second value to unpack. Instead of
for key, value in jdata:
use
for key, value in jdata.items():
like this:
for key, value in jdata.items():
    pprint("Key:")
    pprint(key)
Take a look at the docs for dict:
items():
Return a new view of the dictionary’s items ((key, value) pairs).
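As a rough illustration of the difference, here is a small sketch. The sample dict is a hypothetical two-key excerpt of the JSON from the question, and the exact wording of the error message depends on the Python version, so it is printed rather than assumed:

```python
# Hypothetical two-key excerpt of the question's JSON, already parsed into a dict.
sample = {"a": "Offer", "inLanguage": "et"}

# Plain iteration yields keys only, so unpacking each key into two names fails.
unpack_failed = False
try:
    for key, value in sample:
        pass
except ValueError as exc:
    unpack_failed = True
    print("unpacking failed:", exc)

# items() yields (key, value) pairs, which unpack cleanly.
pairs = []
for key, value in sample.items():
    pairs.append((key, value))
print(pairs)
```

Note that the loop over the bare dict tries to unpack each key string character by character, which is why the one-character key "a" triggers the error seen in the question.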
EDIT: If you want to get all of the nested keys and not just the top level ones, you could take an approach like those suggested in another answer like so:
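def get_keys(dl, keys_list):
    # Recursively collect every key found in nested dicts and lists.
    if isinstance(dl, dict):
        keys_list += dl.keys()
        # Explicit loops instead of map(): map() is lazy in Python 3,
        # so the recursive calls would never run there.
        for value in dl.values():
            get_keys(value, keys_list)
    elif isinstance(dl, list):
        for item in dl:
            get_keys(item, keys_list)

keys = []
get_keys(jdata, keys)
print(keys)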
# [u'a', u'inLanguage', u'description', u'priceCurrency', u'geonames_address', u'price', u'title', u'availabl', u'uri', u'seller', u'publisher', u'a', u'hasIdentifier', u'hasPreferredName', u'uri', u'fallsWithinState1stDiv', u'score', u'fallsWithinCountry', u'fallsWithinCountyProvince2ndDiv', u'geo', u'a', u'hasType', u'label', u'a', u'label', u'a', u'uri', u'hasName', u'a', u'label', u'a', u'uri', u'hasName', u'a', u'label', u'a', u'uri', u'lat', u'lon', u'a', u'address', u'a', u'name', u'a', u'description', u'a', u'name', u'uri']
print(list(set(keys))) # unique list of keys
# [u'inLanguage', u'fallsWithinState1stDiv', u'label', u'hasName', u'title', u'hasPreferredName', u'lon', u'seller', u'score', u'description', u'price', u'address', u'lat', u'fallsWithinCountyProvince2ndDiv', u'geo', u'a', u'publisher', u'hasIdentifier', u'name', u'priceCurrency', u'geonames_address', u'hasType', u'availabl', u'uri', u'fallsWithinCountry']
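To get from a single file to the union over the whole folder, which is what the question ultimately asks for, one possible sketch is below. It assumes Python 3, that the files are UTF-8 encoded and sit directly in one folder with a .json extension, and that every file parses without errors. The names collect_keys and union_of_keys are made up for this example; a set is used instead of a list so duplicates never accumulate.

```python
import glob
import json
import os


def collect_keys(node, found):
    """Add every key in a nested dict/list structure to the set `found`."""
    if isinstance(node, dict):
        found.update(node.keys())
        for value in node.values():
            collect_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_keys(item, found)
    return found


def union_of_keys(folder):
    """Return the set of all keys seen in any *.json file directly inside `folder`."""
    all_keys = set()
    for path in glob.glob(os.path.join(folder, "*.json")):
        # Assumption: files are UTF-8 text; adjust the encoding if yours differ.
        with open(path, encoding="utf-8") as handle:
            collect_keys(json.load(handle), all_keys)
    return all_keys
```

With the folder from the question, this would be called as print(sorted(union_of_keys("/Users/akira/out"))). Only one parsed file is held in memory at a time, which should be fine for around 20,000 files.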
You should iterate over either dict.items() or dict.iteritems() instead of over jdata itself in for key, value in jdata.
So, it should be either
for key, value in jdata.items():
OR
for key, value in jdata.iteritems():
for Python 3 and Python 2 respectively.
See answers on this question to know the difference between the two: What is the difference between dict.items() and dict.iteritems()?
If you only need to iterate over the keys of the dictionary, you can use dict.keys() or, in Python 2 only, dict.iterkeys().
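For key-only iteration, looping over the dict itself behaves the same in Python 2 and Python 3, so it may be the simplest option. A minimal sketch, using a hypothetical excerpt of the question's JSON:

```python
# Hypothetical excerpt of the question's JSON, already parsed into a dict.
jdata = {"a": "Offer", "price": 750, "seller": {"a": "PersonOrOrganization"}}

# Iterating over the dict itself yields its keys, just like jdata.keys().
top_level_keys = [key for key in jdata]
print(sorted(top_level_keys))
```

This only sees the top-level keys; the key "a" nested inside "seller" is not visited, so nested data still needs a recursive walk.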