
Nested dictionary to multiindex dataframe where dictionary keys are column labels

Say I have a dictionary that looks like this:

dictionary = {'A': {'a': [1, 2, 3, 4, 5],
                    'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6],
                    'b': [7, 8, 9, 1, 2]}}

and I want a dataframe that looks something like this:

   A     B
   a  b  a  b
0  1  6  2  7
1  2  7  3  8
2  3  8  4  9
3  4  9  5  1
4  5  1  6  2

Is there a convenient way to do this? If I try:

In [99]: DataFrame(dictionary)
Out[99]:
                 A                B
a  [1, 2, 3, 4, 5]  [2, 3, 4, 5, 6]
b  [6, 7, 8, 9, 1]  [7, 8, 9, 1, 2]

I get a dataframe where each element is a list. What I need is a MultiIndex on the columns, where each level corresponds to the keys in the nested dict and each row corresponds to one element of the lists, as shown above. I think I can work out a very crude solution, but I'm hoping there is something simpler.

Asked by pbreach on Jul 28 '14 at 03:07




1 Answer

Pandas wants the MultiIndex values as tuples, not nested dicts. The simplest thing is to convert your dictionary to the right format before trying to pass it to DataFrame:

>>> import pandas
>>> # Build (outer, inner) tuple keys; on Python 2, use .iteritems() instead of .items()
>>> reform = {(outerKey, innerKey): values
...           for outerKey, innerDict in dictionary.items()
...           for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
 ('A', 'b'): [6, 7, 8, 9, 1],
 ('B', 'a'): [2, 3, 4, 5, 6],
 ('B', 'b'): [7, 8, 9, 1, 2]}
>>> pandas.DataFrame(reform)
   A     B
   a  b  a  b
0  1  6  2  7
1  2  7  3  8
2  3  8  4  9
3  4  9  5  1
4  5  1  6  2

[5 rows x 4 columns]
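For anyone who wants to run this as a script, here is a minimal self-contained sketch, assuming Python 3 and a reasonably recent pandas. It repeats the tuple-key idea, checks that the columns really are a MultiIndex, and shows selecting by the outer key. The `pd.concat` variant at the end is not part of the answer above; it is an alternative I believe gives the same layout, and the names `df` and `df_concat` are just illustrative.

import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# Flatten the nested dict so each key is an (outer, inner) tuple.
reform = {(outer, inner): values
          for outer, inner_dict in dictionary.items()
          for inner, values in inner_dict.items()}

# Tuple keys become a two-level MultiIndex on the columns.
df = pd.DataFrame(reform)
print(isinstance(df.columns, pd.MultiIndex))

# Selecting the outer key 'A' gives a sub-frame with columns 'a' and 'b'.
print(df['A'])

# Alternative sketch (an assumption, not from the answer): build one frame
# per outer key and concatenate side by side; the dict keys passed to
# concat become the outer column level.
df_concat = pd.concat({outer: pd.DataFrame(inner)
                       for outer, inner in dictionary.items()}, axis=1)
print(df_concat)

The dict comprehension is probably the lighter option for a small dictionary; the `concat` form may be handier if the inner pieces are already DataFrames.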
Answered by BrenBarn on Sep 28 '22 at 09:09