I'm looking for a generic way of turning a DataFrame into a nested dictionary.
This is a sample data frame:
      name  v1   v2  v3
    0    A  A1  A11   1
    1    A  A2  A12   2
    2    B  B1  B12   3
    3    C  C1  C11   4
    4    B  B2  B21   5
    5    A  A2  A21   6
The number of columns may differ, and so may the column names.
The desired output looks like this:
    {
        'A': {'A1': {'A11': 1},
              'A2': {'A12': 2, 'A21': 6}},
        'B': {'B1': {'B12': 3}},
        'C': {'C1': {'C11': 4}}
    }
What is the best way to achieve this?
The closest I got was with the zip function, but I haven't managed to make it work for more than one level (two columns).
To go the other way, from a list of nested dictionaries to a DataFrame, first extract the rows of data from the list. Then loop over them, appending each row to a new list that starts out empty. Finally, pass that list to the pandas DataFrame constructor to create the DataFrame.
You can convert a dictionary to a pandas DataFrame with the statement df = pd.DataFrame.from_dict(my_dict).
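As a hedged illustration of the from_dict call mentioned above, the sketch below builds a DataFrame from a small nested dictionary. The dictionary contents and the name df_from_dict are made up for this example; passing orient='index' makes the outer keys become row labels and the inner keys become columns.

```python
import pandas as pd

# Hypothetical nested dictionary, used only for illustration.
# With orient='index', outer keys become row labels and
# inner keys become column names.
my_dict = {
    'A': {'x': 1, 'y': 2},
    'B': {'x': 3, 'y': 4},
}

df_from_dict = pd.DataFrame.from_dict(my_dict, orient='index')
print(df_from_dict)
```

Without orient='index', the outer keys would be treated as column names instead.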
Adding elements to a nested dictionary: one way is to add values one by one, Nested_dict[dict][key] = 'value'. Another way is to add a whole inner dictionary in one go, Nested_dict[dict] = {'key': 'value'}.
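A minimal sketch of both ways of adding to a nested dictionary, plus a setdefault variant for the case where the outer key may not exist yet. The name nested_example and its contents are hypothetical.

```python
# Hypothetical nested dictionary, used only for illustration.
nested_example = {'A': {'A1': 1}}

# Add a single value to an inner dictionary that already exists.
nested_example['A']['A2'] = 2

# Add a whole inner dictionary in one assignment.
nested_example['B'] = {'B1': 3}

# setdefault creates the inner dictionary only if the key is missing,
# which avoids a KeyError when the outer key is not there yet.
nested_example.setdefault('C', {})['C1'] = 4

print(nested_example)
```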
To create a nested dictionary, you can pass inner dictionaries as keyword arguments to the dict() constructor. You can also use dict() together with zip() to combine separate lists of keys and values obtained dynamically at runtime.
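The following sketch shows both ideas on made-up data: zip() pairs up keys and values for one level, and dict() with keyword arguments wraps the result in an outer level. Note that the keyword form only works when the outer keys are valid Python identifiers.

```python
# Hypothetical leaf keys and values, e.g. collected at runtime.
leaf_keys = ['A11', 'A12']
leaf_values = [1, 2]

# zip() pairs each key with its value; dict() builds one level from the pairs.
inner = dict(zip(leaf_keys, leaf_values))

# Keyword arguments become the outer keys (they must be valid identifiers).
nested_kw = dict(A=inner, B=dict(B1=3))

print(nested_kw)
```

This handles one level per call, which is why zip() alone does not generalize to an arbitrary number of columns.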
I don't understand why there isn't a B2 in your dict. I'm also not sure what you want to happen in the case of repeated column values (every one except the last, I mean). Assuming the first is an oversight, we could use recursion:
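    def recur_dictify(frame):
        # Base case: one column left, so return its value(s) as the leaf.
        if len(frame.columns) == 1:
            if frame.values.size == 1:
                return frame.values[0][0]
            return frame.values.squeeze()
        # Group on the first column and recurse on the remaining columns.
        # (.iloc replaces .ix, which was removed in pandas 1.0.)
        grouped = frame.groupby(frame.columns[0])
        d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
        return d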
which produces
    >>> df
      name  v1   v2  v3
    0    A  A1  A11   1
    1    A  A2  A12   2
    2    B  B1  B12   3
    3    C  C1  C11   4
    4    B  B2  B21   5
    5    A  A2  A21   6
    >>> pprint.pprint(recur_dictify(df))
    {'A': {'A1': {'A11': 1}, 'A2': {'A12': 2, 'A21': 6}},
     'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}},
     'C': {'C1': {'C11': 4}}}
It might be simpler to use a non-pandas approach, though:
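    def retro_dictify(frame):
        d = {}
        for row in frame.values:
            here = d
            # Walk (and create) one nested level per column, except the last two.
            for elem in row[:-2]:
                if elem not in here:
                    here[elem] = {}
                here = here[elem]
            # The second-to-last column is the leaf key, the last is its value.
            here[row[-2]] = row[-1]
        return d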
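As a variation on the same row-walking idea (not part of the answer above), here is a sketch that uses collections.defaultdict so that missing levels are created on first access. The helper names tree and dictify_rows are hypothetical, and the rows are written as plain tuples mirroring the sample data so the block runs without pandas; with a DataFrame, df.itertuples(index=False, name=None) should yield tuples of this shape.

```python
from collections import defaultdict


def tree():
    """Return a dict that creates nested dicts of the same kind on first access."""
    return defaultdict(tree)


def dictify_rows(rows):
    """Nest each row: all items but the last two are path keys,
    the second-to-last is the leaf key, and the last is the value."""
    root = tree()
    for row in rows:
        *path, leaf_key, value = row
        node = root
        for key in path:
            node = node[key]  # missing levels are created automatically
        node[leaf_key] = value  # a repeated leaf key overwrites the old value
    return root


# Plain tuples mirroring the sample data frame from the question.
sample_rows = [
    ('A', 'A1', 'A11', 1),
    ('A', 'A2', 'A12', 2),
    ('B', 'B1', 'B12', 3),
    ('C', 'C1', 'C11', 4),
    ('B', 'B2', 'B21', 5),
    ('A', 'A2', 'A21', 6),
]

nested_tree = dictify_rows(sample_rows)
print(nested_tree['A']['A2'])
```

One trade-off: merely looking up a missing key on the result creates an empty entry, so it may be worth converting to plain dicts before handing the structure to other code.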
You can reconstruct your dictionary as easily as follows:
    >>> result = {}
    >>> for lst in df.values:
    ...     leaf = result
    ...     for path in lst[:-2]:
    ...         leaf = leaf.setdefault(path, {})
    ...     leaf.setdefault(lst[-2], list()).append(lst[-1])
    ...
    >>> result
    {'A': {'A1': {'A11': [1]}, 'A2': {'A21': [6], 'A12': [2]}}, 'C': {'C1': {'C11': [4]}}, 'B': {'B1': {'B12': [3]}, 'B2': {'B21': [5]}}}
If you're sure your leaves won't overlap, replace the last line
... leaf.setdefault(lst[-2], list()).append(lst[-1])
with
... leaf[lst[-2]] = lst[-1]
to get the output you desired:
    >>> result
    {'A': {'A1': {'A11': 1}, 'A2': {'A21': 6, 'A12': 2}}, 'C': {'C1': {'C11': 4}}, 'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}}}
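To make the difference between the two variants concrete, here is a sketch that wraps the loop in a function with a switch. The name nest_rows, the collect parameter, and the two overlapping rows are assumptions made for this example.

```python
def nest_rows(rows, collect=False):
    """Build a nested dict from rows of (key, ..., key, leaf_key, value).

    collect=True gathers values that share a leaf into a list;
    collect=False keeps only the last value seen for each leaf.
    """
    result = {}
    for row in rows:
        *path, leaf_key, value = row
        leaf = result
        for key in path:
            leaf = leaf.setdefault(key, {})
        if collect:
            leaf.setdefault(leaf_key, []).append(value)
        else:
            leaf[leaf_key] = value
    return result


# Two hypothetical rows that end at the same leaf.
overlapping = [('A', 'A1', 'A11', 1), ('A', 'A1', 'A11', 7)]

collected = nest_rows(overlapping, collect=True)
overwritten = nest_rows(overlapping)
print(collected)
print(overwritten)
```

With overlapping leaves, the list-collecting variant keeps both values, while plain assignment silently keeps only the last one.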
Sample data used for tests:
    import pandas as pd

    data = {'name': ['A', 'A', 'B', 'C', 'B', 'A'],
            'v1': ['A1', 'A2', 'B1', 'C1', 'B2', 'A2'],
            'v2': ['A11', 'A12', 'B12', 'C11', 'B21', 'A21'],
            'v3': [1, 2, 3, 4, 5, 6]}
    df = pd.DataFrame.from_dict(data)