
Pandas dataframe to hierarchical dictionary

I have a 4-column DataFrame:

Subject_id    Subject         Time  Score    
Subject_1        Math          Day      1     
Subject_1        Math        Night      2                           
Subject_1       Music          Day      3
Subject_1       Music        Night      4
Subject_2        Math          Day      5       
Subject_2        Math        Night      6                              
Subject_2       Music          Day      7
Subject_2       Music        Night      8

I want to group these columns hierarchically and convert them into a dictionary as follows:

result = {
    'Subject_1': {
        'Math': {
            'Day': 1,
            'Night': 2 
        },
        'Music': {
            'Day': 3,
            'Night': 4
        }
    },
    'Subject_2': {
        'Math': {
            'Day': 5,
            'Night': 6
        },
        'Music': {
            'Day': 7,
            'Night': 8
        }
    }
}

I managed to use pivot with one column fewer and got the desired result:

df.pivot('Subject_id', 'Subject', 'Score').to_dict('index')

But if I try one more column (a dictionary one level deeper):

df.pivot('Subject_id', 'Subject', 'Time', 'Score').to_dict('index')

I get the error:

TypeError: pivot() takes at most 4 arguments (5 given)

I have similarly tried using groupby with a lambda function with 3 columns:

(df.groupby('Subject_id')
   .apply(lambda x: dict(zip(x['Subject'], x['Score'])))
   .to_dict())

But I cannot get the desired result with 4 columns.

Is there a way to pass an arbitrary number of columns and convert them into a hierarchical dictionary, that is, to group by several fields in a specific order of hierarchy?

Sembei Norimaki asked Jan 15 '18 16:01


2 Answers

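Here's one way (keyword arguments are used because recent pandas versions no longer accept positional arguments to pivot):

In [86]: {k: g.pivot(index='Subject', columns='Time', values='Score').to_dict('index')
          for k, g in df.groupby('Subject_id')}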
Out[86]:
{'Subject_1': {'Math': {'Day': 1, 'Night': 2},
  'Music': {'Day': 3, 'Night': 4}},
 'Subject_2': {'Math': {'Day': 5, 'Night': 6},
  'Music': {'Day': 7, 'Night': 8}}}
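
The comprehension above handles exactly three key columns. As a minimal sketch of how the same groupby idea might be extended to any number of columns, the helper below recurses over a list of key columns. The name `nest` and its parameters are hypothetical (not part of pandas), and the sketch assumes each combination of keys occurs only once; with duplicates, later rows overwrite earlier ones.

```python
import pandas as pd

def nest(frame, keys, value):
    """Hypothetical helper: nest `frame` by each column in `keys`, in order."""
    if len(keys) == 1:
        # Innermost level: map the last key column straight to the value column.
        return dict(zip(frame[keys[0]], frame[value]))
    # Group by the first key, then recurse on the remaining keys per group.
    return {k: nest(g, keys[1:], value) for k, g in frame.groupby(keys[0])}

# Sample data mirroring the DataFrame in the question.
df = pd.DataFrame({
    'Subject_id': ['Subject_1'] * 4 + ['Subject_2'] * 4,
    'Subject': ['Math', 'Math', 'Music', 'Music'] * 2,
    'Time': ['Day', 'Night'] * 4,
    'Score': [1, 2, 3, 4, 5, 6, 7, 8],
})

result = nest(df, ['Subject_id', 'Subject', 'Time'], 'Score')
```

Changing the order of the list passed as `keys` changes the order of the hierarchy, and a shorter or longer list gives a shallower or deeper dictionary.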
Zero answered Sep 16 '22 21:09


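A defaultdict approach:

from collections import defaultdict

def rec_dd():
    # Each missing key creates another nested defaultdict
    return defaultdict(rec_dd)

dd = rec_dd()  # defaultdict for arbitrary depth
# Keys of tuple_d are (Subject_id, Subject, Time) tuples; values are the scores
tuple_d = df.set_index(['Subject_id', 'Subject', 'Time']).to_dict()["Score"]

for k, v in tuple_d.items():
    dd[k[0]][k[1]][k[2]] = v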

defaultdict(<function __main__.rec_dd>,
        {'Subject_1': defaultdict(<function __main__.rec_dd>,
                     {'Math': defaultdict(<function __main__.rec_dd>,
                                  {'Day': 1, 'Night': 2}),
                      'Music': defaultdict(<function __main__.rec_dd>,
                                  {'Day': 3, 'Night': 4})}),
         'Subject_2': defaultdict(<function __main__.rec_dd>,
                     {'Math': defaultdict(<function __main__.rec_dd>,
                                  {'Day': 5, 'Night': 6}),
                      'Music': defaultdict(<function __main__.rec_dd>,
                                  {'Day': 7, 'Night': 8})})})

The function rec_dd is taken from @AndrewClark's answer to "defaultdict of defaultdict, nested".

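If you don't want a defaultdict, you can convert it to a plain dict with a JSON round trip (this assumes all keys are strings and all values are JSON-serializable):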

import json
d = json.loads(json.dumps(dd))

{'Subject_1': {'Math': {'Day': 1, 'Night': 2},
  'Music': {'Day': 3, 'Night': 4}},
 'Subject_2': {'Math': {'Day': 5, 'Night': 6},
  'Music': {'Day': 7, 'Night': 8}}}

The method for turning a defaultdict into a dict is taken from @Meow's answer to "Python: convert defaultdict to dict".
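
The loop in this answer hard-codes three key levels (k[0], k[1], k[2]). As a rough sketch of how it could be generalized to key tuples of any length while building plain dicts directly (so no JSON round trip is needed), something like the following might work. The function name `nest_tuples` is hypothetical, and the sample `tuple_d` below is written out by hand in the same shape as the one produced by `set_index(...).to_dict()["Score"]` above.

```python
def nest_tuples(flat):
    """Hypothetical helper: turn {(k1, ..., kn): value} into nested plain dicts."""
    nested = {}
    for keys, value in flat.items():
        level = nested
        # Walk (or create) one dict per key, except for the last key.
        for key in keys[:-1]:
            level = level.setdefault(key, {})
        # The last key holds the value itself.
        level[keys[-1]] = value
    return nested

# Hand-written sample with the same shape as tuple_d in the answer.
tuple_d = {
    ('Subject_1', 'Math', 'Day'): 1,
    ('Subject_1', 'Math', 'Night'): 2,
    ('Subject_1', 'Music', 'Day'): 3,
    ('Subject_1', 'Music', 'Night'): 4,
    ('Subject_2', 'Math', 'Day'): 5,
    ('Subject_2', 'Math', 'Night'): 6,
    ('Subject_2', 'Music', 'Day'): 7,
    ('Subject_2', 'Music', 'Night'): 8,
}

d = nest_tuples(tuple_d)
```

Because the depth comes from the length of each key tuple, adding or removing columns in the `set_index` call would not require changing the loop.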

Tai answered Sep 19 '22 21:09
