Convert pandas DataFrame to a nested dict

Tags: python, pandas

I'm looking for a generic way of turning a DataFrame into a nested dictionary.

This is a sample data frame:

      name  v1   v2  v3
    0    A  A1  A11   1
    1    A  A2  A12   2
    2    B  B1  B12   3
    3    C  C1  C11   4
    4    B  B2  B21   5
    5    A  A2  A21   6

The number of columns may differ, and so may the column names.

The desired output looks like this:

    {
        'A': {
            'A1': {'A11': 1},
            'A2': {'A12': 2, 'A21': 6}},
        'B': {
            'B1': {'B12': 3}},
        'C': {
            'C1': {'C11': 4}}
    }

What is the best way to achieve this?

The closest I got was with the zip function, but I haven't managed to make it work for more than one level (two columns).

asked by haki on Nov 05 '13 at 20:11


2 Answers

I don't understand why there isn't a B2 in your dict. I'm also not sure what you want to happen in the case of repeated column values (every one except the last, I mean.) Assuming the first is an oversight, we could use recursion:

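    def recur_dictify(frame):
        # Base case: one column left, so return its value(s) as the leaf.
        if len(frame.columns) == 1:
            if frame.values.size == 1:
                return frame.values[0][0]
            return frame.values.squeeze()
        # Group on the first column and recurse on the remaining ones.
        grouped = frame.groupby(frame.columns[0])
        # The positional slice drops the column that was just grouped on.
        d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
        return d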

which produces

    >>> df
      name  v1   v2  v3
    0    A  A1  A11   1
    1    A  A2  A12   2
    2    B  B1  B12   3
    3    C  C1  C11   4
    4    B  B2  B21   5
    5    A  A2  A21   6
    >>> pprint.pprint(recur_dictify(df))
    {'A': {'A1': {'A11': 1}, 'A2': {'A12': 2, 'A21': 6}},
     'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}},
     'C': {'C1': {'C11': 4}}}

It might be simpler to use a non-pandas approach, though:

    def retro_dictify(frame):
        d = {}
        for row in frame.values:
            here = d
            for elem in row[:-2]:
                if elem not in here:
                    here[elem] = {}
                here = here[elem]
            here[row[-2]] = row[-1]
        return d
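
As a rough check of the two approaches in this answer, the sketch below runs both on the question's sample frame and then on a frame with one extra row that repeats an existing path. The extra row (`A, A1, A11, 7`) and the variable names are made up for illustration, and the recursive version uses `.iloc` because `.ix` is not available in current pandas releases.

```python
import pandas as pd


def recur_dictify(frame):
    # Recursive approach: group on the first column, recurse on the rest.
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0])
    return {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}


def retro_dictify(frame):
    # Row-by-row approach: walk down the nested dicts for each row.
    d = {}
    for row in frame.values:
        here = d
        for elem in row[:-2]:
            if elem not in here:
                here[elem] = {}
            here = here[elem]
        here[row[-2]] = row[-1]
    return d


# Sample frame from the question.
df = pd.DataFrame({
    'name': ['A', 'A', 'B', 'C', 'B', 'A'],
    'v1': ['A1', 'A2', 'B1', 'C1', 'B2', 'A2'],
    'v2': ['A11', 'A12', 'B12', 'C11', 'B21', 'A21'],
    'v3': [1, 2, 3, 4, 5, 6],
})

# Every path to a leaf is unique here, so the two results can be compared.
same_on_sample = recur_dictify(df) == retro_dictify(df)

# Hypothetical extra row that repeats the path A -> A1 -> A11.
extra = pd.DataFrame({'name': ['A'], 'v1': ['A1'], 'v2': ['A11'], 'v3': [7]})
dup = pd.concat([df, extra], ignore_index=True)

# The recursive version keeps both values as a NumPy array at the leaf,
# while the row-by-row version keeps only the last value it saw.
recursive_leaf = recur_dictify(dup)['A']['A1']['A11']
iterative_leaf = retro_dictify(dup)['A']['A1']['A11']
```

So the two functions agree on the sample data but differ once a path repeats, which is the case the question leaves unspecified.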
answered by DSM on Oct 13 '22 at 23:10


You can reconstruct your dictionary as easily as follows:

    >>> result = {}
    >>> for lst in df.values:
    ...     leaf = result
    ...     for path in lst[:-2]:
    ...         leaf = leaf.setdefault(path, {})
    ...     leaf.setdefault(lst[-2], list()).append(lst[-1])
    ...
    >>> result
    {'A': {'A1': {'A11': [1]}, 'A2': {'A21': [6], 'A12': [2]}}, 'C': {'C1': {'C11': [4]}}, 'B': {'B1': {'B12': [3]}, 'B2': {'B21': [5]}}}

If you're sure your leaves won't overlap, replace the last line

    ...     leaf.setdefault(lst[-2], list()).append(lst[-1])

with

    ...     leaf[lst[-2]] = lst[-1]

to get the output you desired:

    >>> result
    {'A': {'A1': {'A11': 1}, 'A2': {'A21': 6, 'A12': 2}}, 'C': {'C1': {'C11': 4}}, 'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}}}

Sample data used for tests:

    import pandas as pd

    data = {'name': ['A', 'A', 'B', 'C', 'B', 'A'],
            'v1': ['A1', 'A2', 'B1', 'C1', 'B2', 'A2'],
            'v2': ['A11', 'A12', 'B12', 'C11', 'B21', 'A21'],
            'v3': [1, 2, 3, 4, 5, 6]}
    df = pd.DataFrame.from_dict(data)
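
To make the difference between the two last-line variants concrete, here is a small sketch that wraps the loop in a function and runs it on data where two rows share the same path. The function name `nest_rows`, its `collect` flag, and the overlapping rows are assumptions made for this illustration and do not appear in the question.

```python
import pandas as pd


def nest_rows(frame, collect=True):
    """Hypothetical wrapper around the setdefault loop shown above.

    collect=True appends leaf values to a list (safe for overlaps);
    collect=False assigns the value directly (the last row wins).
    """
    result = {}
    for lst in frame.values:
        leaf = result
        # All columns except the last two are intermediate levels.
        for path in lst[:-2]:
            leaf = leaf.setdefault(path, {})
        if collect:
            leaf.setdefault(lst[-2], []).append(lst[-1])
        else:
            leaf[lst[-2]] = lst[-1]
    return result


# Made-up frame in which the first two rows share the path A -> A1 -> A11.
overlap = pd.DataFrame({
    'name': ['A', 'A', 'B'],
    'v1': ['A1', 'A1', 'B1'],
    'v2': ['A11', 'A11', 'B12'],
    'v3': [1, 2, 3],
})

collected = nest_rows(overlap, collect=True)
overwritten = nest_rows(overlap, collect=False)
```

With overlapping leaves the list variant keeps every value, while direct assignment silently keeps only the last one, so the simpler form is only safe when each path is unique.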
answered by alko on Oct 13 '22 at 23:10