I need to parse a json file which unfortunately for me, does not follow the prototype. I have two issues with the data, but i've already found a workaround for it so i'll just mention it at the end, maybe someone can help there as well.
So i need to parse entries like this:
"Test":{
"entry":{
"Type":"Something"
},
"entry":{
"Type":"Something_Else"
}
}, ...
The json default parser updates the dictionary and therfore uses only the last entry. I HAVE to somehow store the other one as well, and i have no idea how to do this. I also HAVE to store the keys in the several dictionaries in the same order they appear in the file, thats why i am using an OrderedDict to do so. it works fine, so if there is any way to expand this with the duplicate entries i'd be grateful.
My second issue is that this very same json file contains entries like that:
"Test":{
{
"Type":"Something"
}
}
Json.load() function raises an exception when it reaches that line in the json file. The only way i worked around this was to manually remove the inner brackets myself.
Thanks in advance
You can use JSONDecoder.object_pairs_hook
to customize how JSONDecoder
decodes objects. This hook function will be passed a list of (key, value)
pairs that you usually do some processing on, and then turn into a dict
.
However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value)
pairs when you decode your JSON:
from json import JSONDecoder
def parse_object_pairs(pairs):
return pairs
data = """
{"foo": {"baz": 42}, "foo": 7}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print obj
Output:
[(u'foo', [(u'baz', 42)]), (u'foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key]
would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder
def make_unique(key, dct):
counter = 0
unique_key = key
while unique_key in dct:
counter += 1
unique_key = '{}_{}'.format(key, counter)
return unique_key
def parse_object_pairs(pairs):
dct = OrderedDict()
for key, value in pairs:
if key in dct:
key = make_unique(key, dct)
dct[key] = value
return dct
data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print obj
Output:
OrderedDict([(u'foo', OrderedDict([(u'baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique
function is responsible for returning a collision-free key. In this example it just suffixes the key with _n
where n
is an incremental counter - just adapt it to your needs.
Because the object_pairs_hook
receives the pairs exactly in the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict
, I included that as well.
Thanks a lot @Lukas Graf, i got it working as well by implementing my own version of the hook function
def dict_raise_on_duplicates(ordered_pairs):
count=0
d=collections.OrderedDict()
for k,v in ordered_pairs:
if k in d:
d[k+'_dupl_'+str(count)]=v
count+=1
else:
d[k]=v
return d
Only thing remaining is to automatically get rid of the double brackets and i am done :D Thanks again
If you would prefer to convert those duplicated keys into an array, instead of having separate copies, this could do the work:
def dict_raise_on_duplicates(ordered_pairs):
"""Convert duplicate keys to JSON array."""
d = {}
for k, v in ordered_pairs:
if k in d:
if type(d[k]) is list:
d[k].append(v)
else:
d[k] = [d[k],v]
else:
d[k] = v
return d
And then you just use:
dict = json.loads(yourString, object_pairs_hook=dict_raise_on_duplicates)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With