I have a 4-column DataFrame:
Subject_id    Subject         Time  Score    
Subject_1        Math          Day      1     
Subject_1        Math        Night      2                           
Subject_1       Music          Day      3
Subject_1       Music        Night      4
Subject_2        Math          Day      5       
Subject_2        Math        Night      6                              
Subject_2       Music          Day      7
Subject_2       Music        Night      8
I want to group these columns hierarchically and convert them into a dictionary as follows:
result = {
    'Subject_1': {
        'Math': {
            'Day': 1,
            'Night': 2 
        },
        'Music': {
            'Day': 3,
            'Night': 4
        }
    },
    'Subject_2': {
        'Math': {
            'Day': 5,
            'Night': 6
        },
        'Music': {
            'Day': 7,
            'Night': 8
        }
    }
}
With one column fewer, I managed to get the desired result using pivot:
df.pivot('Subject_id', 'Subject', 'Score').to_dict('index')
But if I try one more column (a dictionary one level deeper):
df.pivot('Subject_id', 'Subject', 'Time', 'Score').to_dict('index')
I get the error:
TypeError: pivot() takes at most 4 arguments (5 given)
I have similarly tried using groupby with a lambda function on 3 columns:
(df.groupby('Subject_id')
   .apply(lambda x: dict(zip(x['Subject'], x['Score'])))
   .to_dict())
But I cannot get the desired result with 4 columns.
Is there a way I can give an arbitrary number of columns and convert them into a hierarchical dictionary?
Like grouping by several fields in a specific order of hierarchy.
Here's one way: group by the outermost key, then pivot each group.
In [86]: {k: g.pivot(index='Subject', columns='Time', values='Score').to_dict('index')
          for k, g in df.groupby('Subject_id')}
Out[86]:
{'Subject_1': {'Math': {'Day': 1, 'Night': 2},
  'Music': {'Day': 3, 'Night': 4}},
 'Subject_2': {'Math': {'Day': 5, 'Night': 6},
  'Music': {'Day': 7, 'Night': 8}}}
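For an arbitrary number of key columns, the same groupby idea can be applied recursively. The following is only a minimal sketch: the helper name `nest` and its parameters are made up for illustration, and it assumes each combination of key values occurs at most once (otherwise later rows would silently overwrite earlier ones).

```python
import pandas as pd

# Sample data matching the question.
df = pd.DataFrame({
    'Subject_id': ['Subject_1'] * 4 + ['Subject_2'] * 4,
    'Subject': ['Math', 'Math', 'Music', 'Music'] * 2,
    'Time': ['Day', 'Night'] * 4,
    'Score': [1, 2, 3, 4, 5, 6, 7, 8],
})

def nest(frame, keys, value):
    """Hypothetical helper: nest `frame` by each column in `keys`, in order.

    The innermost dictionaries map the last key column to `value`.
    """
    if len(keys) == 1:
        # Last level: pair the remaining key column with the value column.
        return dict(zip(frame[keys[0]], frame[value]))
    head, rest = keys[0], keys[1:]
    # Group by the first key, then recurse into each group with the rest.
    return {k: nest(g, rest, value) for k, g in frame.groupby(head)}

result = nest(df, ['Subject_id', 'Subject', 'Time'], 'Score')
```

Changing the order of `keys` changes the hierarchy; for example, `nest(df, ['Time', 'Subject', 'Subject_id'], 'Score')` would put Day and Night at the top level.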
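A defaultdict approach:
from collections import defaultdict

def rec_dd():
    # Each missing key creates another defaultdict of the same kind.
    return defaultdict(rec_dd)
dd = rec_dd()  # defaultdict for arbitrary depth
# Keys of tuple_d are (Subject_id, Subject, Time) tuples; values are the scores.
tuple_d = df.set_index(['Subject_id', 'Subject', 'Time']).to_dict()["Score"]
for k, v in tuple_d.items():
    dd[k[0]][k[1]][k[2]] = v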
defaultdict(<function __main__.rec_dd>,
        {'Subject_1': defaultdict(<function __main__.rec_dd>,
                     {'Math': defaultdict(<function __main__.rec_dd>,
                                  {'Day': 1, 'Night': 2}),
                      'Music': defaultdict(<function __main__.rec_dd>,
                                  {'Day': 3, 'Night': 4})}),
         'Subject_2': defaultdict(<function __main__.rec_dd>,
                     {'Math': defaultdict(<function __main__.rec_dd>,
                                  {'Day': 5, 'Night': 6}),
                      'Music': defaultdict(<function __main__.rec_dd>,
                                  {'Day': 7, 'Night': 8})})})
The rec_dd function is taken from @AndrewClark's answer to "defaultdict of defaultdict, nested".
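The loop above hard-codes three levels. A possible generalization to any depth walks down the key tuple instead of indexing it by position. This is a sketch only: `nest_from_tuples` is a hypothetical name, the literal `tuple_d` stands in for the output of `set_index(...).to_dict()["Score"]`, and every key is assumed to be a non-empty tuple.

```python
from collections import defaultdict

def rec_dd():
    return defaultdict(rec_dd)

def nest_from_tuples(tuple_d):
    """Hypothetical helper: build a nested defaultdict from {key_tuple: value}."""
    dd = rec_dd()
    for keys, value in tuple_d.items():
        node = dd
        # Descend through every key except the last, creating levels as needed.
        for key in keys[:-1]:
            node = node[key]
        # The last key holds the value itself.
        node[keys[-1]] = value
    return dd

# Stand-in for df.set_index([...]).to_dict()["Score"] (assumed shape).
tuple_d = {
    ('Subject_1', 'Math', 'Day'): 1,
    ('Subject_1', 'Math', 'Night'): 2,
    ('Subject_1', 'Music', 'Day'): 3,
    ('Subject_1', 'Music', 'Night'): 4,
    ('Subject_2', 'Math', 'Day'): 5,
    ('Subject_2', 'Math', 'Night'): 6,
    ('Subject_2', 'Music', 'Day'): 7,
    ('Subject_2', 'Music', 'Night'): 8,
}
dd = nest_from_tuples(tuple_d)
```

Note that indexing with a single column would give scalar keys rather than tuples, so this sketch only applies when at least two columns go into the index.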
If you don't want a defaultdict, you can round-trip it through JSON (note that this turns any non-string keys into strings):
import json
d = json.loads(json.dumps(dd))
{'Subject_1': {'Math': {'Day': 1, 'Night': 2},
  'Music': {'Day': 3, 'Night': 4}},
 'Subject_2': {'Math': {'Day': 5, 'Night': 6},
  'Music': {'Day': 7, 'Night': 8}}}
The method to turn a defaultdict into a dict is taken from @Meow's answer to "Python: convert defaultdict to dict".
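If the keys or values are not JSON-friendly, a recursive conversion is one alternative to the JSON round trip. The sketch below uses a hypothetical helper name, `to_plain_dict`, and a tiny made-up example with an integer key to show the difference.

```python
import json
from collections import defaultdict

def rec_dd():
    return defaultdict(rec_dd)

def to_plain_dict(obj):
    """Hypothetical helper: recursively copy nested (default)dicts into plain dicts."""
    if isinstance(obj, dict):
        return {k: to_plain_dict(v) for k, v in obj.items()}
    # Leaves (scores and other non-dict values) are returned unchanged.
    return obj

# Small illustrative example with an integer key (not from the question's data).
dd = rec_dd()
dd[1]['Day'] = 10

plain = to_plain_dict(dd)               # integer key 1 is preserved
via_json = json.loads(json.dumps(dd))   # key 1 becomes the string '1'
```

Here `plain` keeps the original key types, while `via_json` has string keys, because JSON object keys are always strings.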