Convert pandas DataFrame to a nested dict

Tags: python, pandas

I'm looking for a generic way of turning a DataFrame into a nested dictionary.

This is a sample DataFrame:

      name  v1   v2  v3
    0    A  A1  A11   1
    1    A  A2  A12   2
    2    B  B1  B12   3
    3    C  C1  C11   4
    4    B  B2  B21   5
    5    A  A2  A21   6

The number of columns may differ, and so may the column names.

The desired output looks like this:

    {
        'A': {
            'A1': {'A11': 1},
            'A2': {'A12': 2, 'A21': 6}},
        'B': {
            'B1': {'B12': 3}},
        'C': {
            'C1': {'C11': 4}}
    }

What is the best way to achieve this?

The closest I got was with the zip function, but I haven't managed to make it work for more than one level (two columns).


asked Nov 05 '13 20:11

haki

2 Answers

I don't understand why there isn't a B2 in your dict. I'm also not sure what you want to happen in the case of repeated column values (every one except the last, I mean.) Assuming the first is an oversight, we could use recursion:

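    def recur_dictify(frame):
        # Base case: one column left, so its value(s) become the leaf.
        if len(frame.columns) == 1:
            if frame.values.size == 1:
                return frame.values[0][0]
            return frame.values.squeeze()
        # Group on the first column and recurse on the remaining ones.
        # .iloc replaces .ix, which was removed in pandas 1.0.
        grouped = frame.groupby(frame.columns[0])
        d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
        return d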

which produces

    >>> df
      name  v1   v2  v3
    0    A  A1  A11   1
    1    A  A2  A12   2
    2    B  B1  B12   3
    3    C  C1  C11   4
    4    B  B2  B21   5
    5    A  A2  A21   6
    >>> pprint.pprint(recur_dictify(df))
    {'A': {'A1': {'A11': 1}, 'A2': {'A12': 2, 'A21': 6}},
     'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}},
     'C': {'C1': {'C11': 4}}}
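To see what the recursive version does when the key columns repeat, here is a small sketch. It assumes pandas 1.0 or later (so it uses `.iloc`, since `.ix` is no longer available), and the `dup` frame is made up for illustration. When a key path occurs more than once, the leaf comes back as a NumPy array of all the values rather than a single value.

```python
import pandas as pd


def recur_dictify(frame):
    # Same recursion as in the answer, written with .iloc for pandas >= 1.0.
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0])
    return {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}


# Hypothetical frame in which the key pair ('A', 'A2') appears twice.
dup = pd.DataFrame({
    'name': ['A', 'A', 'A'],
    'v1': ['A1', 'A2', 'A2'],
    'v3': [1, 2, 6],
})

nested = recur_dictify(dup)
print(nested)
```

Here `nested['A']['A1']` is a scalar, while `nested['A']['A2']` is an array holding both 2 and 6. If mixed leaf types are a problem, the base case is the place to normalise them.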

It might be simpler to use a non-pandas approach, though:

    def retro_dictify(frame):
        d = {}
        for row in frame.values:
            here = d
            for elem in row[:-2]:
                if elem not in here:
                    here[elem] = {}
                here = here[elem]
            here[row[-2]] = row[-1]
        return d
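As a quick check that the loop-based version is generic in the number of columns, the sketch below runs it on the question's sample data and on a three-column subset. The names `four_cols` and `three_cols` are only illustrative. Note that with fewer key columns the pair `('A', 'A2')` occurs twice, and the later row silently overwrites the earlier one.

```python
import pandas as pd


def retro_dictify(frame):
    # All but the last two values of a row are path keys, the second-to-last
    # is the leaf key, and the last is the leaf value.
    d = {}
    for row in frame.values:
        here = d
        for elem in row[:-2]:
            if elem not in here:
                here[elem] = {}
            here = here[elem]
        here[row[-2]] = row[-1]
    return d


df = pd.DataFrame({
    'name': ['A', 'A', 'B', 'C', 'B', 'A'],
    'v1': ['A1', 'A2', 'B1', 'C1', 'B2', 'A2'],
    'v2': ['A11', 'A12', 'B12', 'C11', 'B21', 'A21'],
    'v3': [1, 2, 3, 4, 5, 6],
})

# Four columns give three levels of nesting.
four_cols = retro_dictify(df)

# Three columns give two levels; ('A', 'A2') repeats, so 6 replaces 2.
three_cols = retro_dictify(df[['name', 'v1', 'v3']])

print(four_cols)
print(three_cols)
```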


answered Oct 13 '22 23:10

DSM

You can reconstruct your dictionary as easily as follows:

    >>> result = {}
    >>> for lst in df.values:
    ...     leaf = result
    ...     for path in lst[:-2]:
    ...         leaf = leaf.setdefault(path, {})
    ...     leaf.setdefault(lst[-2], list()).append(lst[-1])
    ...
    >>> result
    {'A': {'A1': {'A11': [1]}, 'A2': {'A21': [6], 'A12': [2]}},
     'C': {'C1': {'C11': [4]}},
     'B': {'B1': {'B12': [3]}, 'B2': {'B21': [5]}}}

If you're sure your leaves won't overlap, replace the last line

    ...     leaf.setdefault(lst[-2], list()).append(lst[-1])

with

    ...     leaf[lst[-2]] = lst[-1]

to get the output you desired:

    >>> result
    {'A': {'A1': {'A11': 1}, 'A2': {'A21': 6, 'A12': 2}},
     'C': {'C1': {'C11': 4}},
     'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}}}
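The two variants (collect leaves into lists, or overwrite them) could be folded into one function. The following is only a sketch: the name `nest_rows` and the `collect` flag are made up here and are not part of pandas. It works on any iterable of rows, so `df.values` can be passed in directly.

```python
def nest_rows(rows, collect=False):
    """Build a nested dict from rows shaped (key, ..., key, value).

    collect=False: a repeated key path keeps only the last value.
    collect=True: every value for a key path is appended to a list.
    """
    result = {}
    for *path, leaf_key, value in rows:
        leaf = result
        for key in path:
            leaf = leaf.setdefault(key, {})
        if collect:
            leaf.setdefault(leaf_key, []).append(value)
        else:
            leaf[leaf_key] = value
    return result


# Hypothetical rows: the path ('A', 'A2', 'A12') occurs twice.
rows = [
    ('A', 'A1', 'A11', 1),
    ('A', 'A2', 'A12', 2),
    ('A', 'A2', 'A12', 7),
]

overwritten = nest_rows(rows)
collected = nest_rows(rows, collect=True)

print(overwritten)
print(collected)
```

With `collect=True` every leaf is a list, even when it holds a single value, which keeps the result shape predictable when some key paths repeat and others do not.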

Sample data used for tests:

    import pandas as pd

    data = {'name': ['A','A','B','C','B','A'],
            'v1': ['A1','A2','B1','C1','B2','A2'],
            'v2': ['A11','A12','B12','C11','B21','A21'],
            'v3': [1,2,3,4,5,6]}
    df = pd.DataFrame.from_dict(data)

answered Oct 13 '22 23:10

alko
