I have a 4-column DataFrame:
Subject_id  Subject  Time   Score
Subject_1   Math     Day    1
Subject_1   Math     Night  2
Subject_1   Music    Day    3
Subject_1   Music    Night  4
Subject_2   Math     Day    5
Subject_2   Math     Night  6
Subject_2   Music    Day    7
Subject_2   Music    Night  8
I want to group these columns hierarchically and convert them into a nested dictionary, as follows:
result = {
    'Subject_1': {
        'Math': {
            'Day': 1,
            'Night': 2
        },
        'Music': {
            'Day': 3,
            'Night': 4
        }
    },
    'Subject_2': {
        'Math': {
            'Day': 5,
            'Night': 6
        },
        'Music': {
            'Day': 7,
            'Night': 8
        }
    }
}
With one column fewer, I managed to get the desired result using pivot:
df.pivot('Subject_id', 'Subject', 'Score').to_dict('index')
But if I try one more column (a dictionary one level deeper):
df.pivot('Subject_id', 'Subject', 'Time', 'Score').to_dict('index')
I get the error:
TypeError: pivot() takes at most 4 arguments (5 given)
I have similarly tried using groupby with a lambda function on 3 columns:
(df.groupby('Subject_id')
   .apply(lambda x: dict(zip(x['Subject'], x['Score'])))
   .to_dict())
But I cannot get the desired result with 4 columns.
Is there a way I can give an arbitrary number of columns and convert them into a hierarchical dictionary, that is, group by several fields in a specific order of hierarchy?
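Here's one way (keyword arguments are used because newer pandas versions no longer accept positional arguments to pivot):
In [86]: {k: g.pivot(index='Subject', columns='Time', values='Score').to_dict('index')
          for k, g in df.groupby('Subject_id')}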
Out[86]:
{'Subject_1': {'Math': {'Day': 1, 'Night': 2},
'Music': {'Day': 3, 'Night': 4}},
'Subject_2': {'Math': {'Day': 5, 'Night': 6},
'Music': {'Day': 7, 'Night': 8}}}
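The same groupby idea can be extended to any number of key columns by recursing one level per column. This is only a rough sketch: the helper name nest and its parameters are made up for illustration, and the sample frame just rebuilds the data from the question.

```python
import pandas as pd

def nest(frame, keys, value):
    """Hypothetical helper: group `frame` by each column in `keys`, in order."""
    if len(keys) == 1:
        # Innermost level: map the last key column directly to the value column.
        return dict(zip(frame[keys[0]], frame[value]))
    # Otherwise group by the first key and recurse on the remaining keys.
    return {k: nest(g, keys[1:], value) for k, g in frame.groupby(keys[0])}

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Subject_id": ["Subject_1"] * 4 + ["Subject_2"] * 4,
    "Subject": ["Math", "Math", "Music", "Music"] * 2,
    "Time": ["Day", "Night"] * 4,
    "Score": list(range(1, 9)),
})

result = nest(df, ["Subject_id", "Subject", "Time"], "Score")
```

Passing a shorter or longer list of key columns changes the depth of the result. Note that if the key columns do not uniquely identify a row, later rows silently overwrite earlier ones at the innermost level.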
Here is a defaultdict approach.
from collections import defaultdict

def rec_dd():
    return defaultdict(rec_dd)

dd = rec_dd()  # defaultdict for arbitrary depth
# Map (Subject_id, Subject, Time) tuples to their Score
tuple_d = df.set_index(['Subject_id', 'Subject', 'Time']).to_dict()["Score"]
for k, v in tuple_d.items():
    dd[k[0]][k[1]][k[2]] = v
defaultdict(<function __main__.rec_dd>,
{'Subject_1': defaultdict(<function __main__.rec_dd>,
{'Math': defaultdict(<function __main__.rec_dd>,
{'Day': 1, 'Night': 2}),
'Music': defaultdict(<function __main__.rec_dd>,
{'Day': 3, 'Night': 4})}),
'Subject_2': defaultdict(<function __main__.rec_dd>,
{'Math': defaultdict(<function __main__.rec_dd>,
{'Day': 5, 'Night': 6}),
'Music': defaultdict(<function __main__.rec_dd>,
{'Day': 7, 'Night': 8})})})
The rec_dd function is taken from @AndrewClark's answer to "defaultdict of defaultdict, nested".
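The loop above hardcodes three levels (k[0], k[1], k[2]). To handle an arbitrary number of key columns, one option is to walk down the nested defaultdict one key at a time. A rough sketch follows; the helper name nest_by and its parameters are my own, not part of pandas, and the sample frame rebuilds the data from the question.

```python
from collections import defaultdict
import pandas as pd

def rec_dd():
    return defaultdict(rec_dd)

def nest_by(frame, keys, value):
    """Hypothetical helper: nest `value` under each column in `keys`, in order."""
    dd = rec_dd()
    rows = frame[keys + [value]].itertuples(index=False, name=None)
    for *path, last, val in rows:
        node = dd
        for key in path:      # descend through all but the last key
            node = node[key]  # missing levels are created on access
        node[last] = val      # the last key holds the value itself
    return dd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Subject_id": ["Subject_1"] * 4 + ["Subject_2"] * 4,
    "Subject": ["Math", "Math", "Music", "Music"] * 2,
    "Time": ["Day", "Night"] * 4,
    "Score": list(range(1, 9)),
})

nested = nest_by(df, ["Subject_id", "Subject", "Time"], "Score")
```

The order of the keys list determines the hierarchy, so swapping "Subject" and "Time" would nest subjects under each time of day instead.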
If you don't want a defaultdict, you can try the following:
import json
d = json.loads(json.dumps(dd))
{'Subject_1': {'Math': {'Day': 1, 'Night': 2},
'Music': {'Day': 3, 'Night': 4}},
'Subject_2': {'Math': {'Day': 5, 'Night': 6},
'Music': {'Day': 7, 'Night': 8}}}
The method for turning a defaultdict into a dict is taken from @Meow's answer to "Python: convert defaultdict to dict".
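One caveat with the JSON round trip: JSON object keys are always strings, so any non-string keys (integers, for example) come back as strings, and values that json cannot serialize will raise an error. As an alternative, here is a small recursive conversion. The function name to_plain_dict is made up for this sketch, and the tiny defaultdict below just stands in for dd.

```python
from collections import defaultdict

def rec_dd():
    return defaultdict(rec_dd)

def to_plain_dict(obj):
    """Hypothetical helper: recursively turn nested defaultdicts into plain dicts."""
    if isinstance(obj, dict):  # defaultdict is a subclass of dict
        return {k: to_plain_dict(v) for k, v in obj.items()}
    return obj                 # leaf values are returned unchanged

# Small stand-in for the nested defaultdict built earlier.
sample = rec_dd()
sample["Subject_1"]["Math"]["Day"] = 1
sample["Subject_1"]["Math"]["Night"] = 2
sample[2]["Music"]["Day"] = 7  # an integer key, kept as an integer

plain = to_plain_dict(sample)
```

Unlike the defaultdict, the resulting plain dict raises KeyError on a missing key instead of silently creating a new empty level.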