I often use pandas groupby to generate stacked tables, and then I often want to output the resulting nested relations to JSON. Is there any way to extract a nested JSON field from the stacked table it produces?
Let's say I have a df like:
year office candidate amount
2010 mayor joe smith 100.00
2010 mayor jay gould 12.00
2010 govnr pati mara 500.00
2010 govnr jess rapp 50.00
2010 govnr jess rapp 30.00
I can do:
grouped = df.groupby(['year', 'office', 'candidate']).sum()
print grouped
amount
year office candidate
2010 mayor joe smith 100
jay gould 12
govnr pati mara 500
jess rapp 80
Beautiful! Of course, what I'd really like to do is get nested JSON via a command along the lines of grouped.to_json(). But that feature isn't available. Any workarounds?
So, what I really want is something like:
{"2010": {"mayor": [
{"joe smith": 100},
{"jay gould": 12}
]
},
{"govnr": [
{"pati mara":500},
{"jess rapp": 80}
]
}
}
Don
I don't think there is anything built-in to pandas to create a nested dictionary from the data. Below is some code that should work in general for a grouped result with a MultiIndex, like the one above, using a defaultdict.
The nesting code iterates through each level of the MultiIndex, adding layers to the dictionary until the deepest layer is assigned the value.
In [99]: from collections import defaultdict
In [100]: results = defaultdict(lambda: defaultdict(dict))
In [101]: for index, value in grouped.itertuples():
...: for i, key in enumerate(index):
...: if i == 0:
...: nested = results[key]
...: elif i == len(index) - 1:
...: nested[key] = value
...: else:
...: nested = nested[key]
In [102]: results
Out[102]: defaultdict(<function <lambda> at 0x7ff17c76d1b8>, {2010: defaultdict(<type 'dict'>, {'govnr': {'pati mara': 500.0, 'jess rapp': 80.0}, 'mayor': {'joe smith': 100.0, 'jay gould': 12.0}})})
In [105]: import json
In [106]: print json.dumps(results, indent=4)
{
"2010": {
"govnr": {
"pati mara": 500.0,
"jess rapp": 80.0
},
"mayor": {
"joe smith": 100.0,
"jay gould": 12.0
}
}
}
I had a look at the solution above and found that it only works for three levels of nesting. This solution will work for any number of levels.
import json

levels = len(grouped.index.levels)
# One working dict per index level; dicts[-1] is the outermost (year) level.
dicts = [{} for i in range(levels)]
last_index = None

for index, value in grouped.itertuples():
    if not last_index:
        last_index = index

    # Compare with the previous index tuple and start fresh sub-dicts
    # below the first level where the keys differ.
    for (ii, (i, j)) in enumerate(zip(index, last_index)):
        if not i == j:
            ii = levels - ii - 1
            dicts[:ii] = [{} for _ in dicts[:ii]]
            break

    # Build up from the innermost key to the outermost.
    for i, key in enumerate(reversed(index)):
        dicts[i][key] = value
        value = dicts[i]

    last_index = index

result = json.dumps(dicts[-1])
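For the grouped example in the question, dicts[-1] should end up with the same nesting as the first answer's output. A quick sketch of what that looks like (key order may vary):

print(json.dumps(dicts[-1], indent=4))
# {"2010": {"mayor": {"joe smith": 100.0, "jay gould": 12.0},
#           "govnr": {"pati mara": 500.0, "jess rapp": 80.0}}}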
Here is a generic recursive solution for this problem:
def df_to_dict(df):
    if df.ndim == 1:
        # Base case: a Series, no more index levels to unpack.
        return df.to_dict()
    ret = {}
    for key in df.index.get_level_values(0):
        sub_df = df.xs(key)
        ret[key] = df_to_dict(sub_df)
    return ret
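A minimal usage sketch, assuming grouped is the single-column DataFrame produced by the groupby in the question. Note that with a DataFrame, the leaves keep the column name as a key:

import json

nested = df_to_dict(grouped)
# e.g. {2010: {'mayor': {'joe smith': {'amount': 100.0},
#                        'jay gould': {'amount': 12.0}}, ...}}
print(json.dumps(nested, indent=4))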
I'm aware this is an old question, but I came across the same issue recently. Here's my solution. I borrowed a lot of stuff from chrisb's example (Thank you!).
This has the advantage that you can pass a lambda to extract the final value from whatever enumerable you want, and the same goes for each grouping key.
from collections import defaultdict
def dict_from_enumerable(enumerable, final_value, *groups):
    d = defaultdict(lambda: defaultdict(dict))
    group_count = len(groups)
    for item in enumerable:
        nested = d
        item_result = final_value(item) if callable(final_value) else item.get(final_value)
        for i, group in enumerate(groups, start=1):
            group_val = str(group(item) if callable(group) else item.get(group))
            if i == group_count:
                nested[group_val] = item_result
            else:
                nested = nested[group_val]
    return d
In the question, you'd call this function like:
dict_from_enumerable(grouped.itertuples(), 'amount', 'year', 'office', 'candidate')
The first argument can be an array of data as well, not even requiring pandas.
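For example, a hypothetical plain-Python input (pre-aggregated, since the helper overwrites rather than sums) could look like this:

rows = [
    {'year': 2010, 'office': 'mayor', 'candidate': 'joe smith', 'amount': 100.0},
    {'year': 2010, 'office': 'mayor', 'candidate': 'jay gould', 'amount': 12.0},
    {'year': 2010, 'office': 'govnr', 'candidate': 'pati mara', 'amount': 500.0},
    {'year': 2010, 'office': 'govnr', 'candidate': 'jess rapp', 'amount': 80.0},
]

nested = dict_from_enumerable(rows, 'amount', 'year', 'office', 'candidate')
# nested['2010']['mayor'] == {'joe smith': 100.0, 'jay gould': 12.0}
# (group keys are str()'d inside the helper, so the year becomes '2010')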