Suppose I have a nested dictionary 'user_dict' with structure: <ul> <li> Level 1: UserId (Long Integer)</li> <li> Level 2: Category (String)</li> <li> Level 3: Assorted Attributes (floats, ints, etc..)</li> </ul> For example, an entry of this dictionary would be: <pre class="prettyprint"><code>user_dict[12] = { "Category 1": {"att_1": 1, "att_2": "whatever"}, "Category 2": {"att_1": 23, "att_2": "another"}} </code></pre> each item in <code>user_dict</code> has the same structure and <code>user_dict</code> contains a large number of items which I want to feed to a pandas DataFrame, constructing the series from the attributes. In this case a hierarchical index would be useful for the purpose. Specifically, my question is whether there exists a way to to help the DataFrame constructor understand that the series should be built from the values of the "level 3" in the dictionary? If I try something like: <pre class="prettyprint"><code>df = pandas.DataFrame(users_summary) </code></pre> The items in "level 1" (the UserId's) are taken as columns, which is the opposite of what I want to achieve (have UserId's as index). I know I could construct the series after iterating over the dictionary entries, but if there is a more direct way this would be very useful. A similar question would be asking whether it is possible to construct a pandas DataFrame from json objects listed in a file.

<code>pd.concat</code> accepts a dictionary. With this in mind, it is possible to improve upon the currently accepted answer in terms of simplicity and performance by use a dictionary comprehension to build a dictionary mapping keys to sub-frames. <pre class="prettyprint"><code>pd.concat({k: pd.DataFrame(v).T for k, v in user_dict.items()}, axis=0) </code></pre> Or, <pre class="prettyprint"><code>pd.concat({ k: pd.DataFrame.from_dict(v, 'index') for k, v in user_dict.items() }, axis=0) </code></pre> <pre class="prettyprint"><code> att_1 att_2 12 Category 1 1 whatever Category 2 23 another 15 Category 1 10 foo Category 2 30 bar </code></pre>

So I used to use a for loop for iterating through the dictionary as well, but one thing I've found that works much faster is to convert to a panel and then to a dataframe. Say you have a dictionary d <pre class="prettyprint"><code>import pandas as pd d {'RAY Index': {datetime.date(2014, 11, 3): {'PX_LAST': 1199.46, 'PX_OPEN': 1200.14}, datetime.date(2014, 11, 4): {'PX_LAST': 1195.323, 'PX_OPEN': 1197.69}, datetime.date(2014, 11, 5): {'PX_LAST': 1200.936, 'PX_OPEN': 1195.32}, datetime.date(2014, 11, 6): {'PX_LAST': 1206.061, 'PX_OPEN': 1200.62}}, 'SPX Index': {datetime.date(2014, 11, 3): {'PX_LAST': 2017.81, 'PX_OPEN': 2018.21}, datetime.date(2014, 11, 4): {'PX_LAST': 2012.1, 'PX_OPEN': 2015.81}, datetime.date(2014, 11, 5): {'PX_LAST': 2023.57, 'PX_OPEN': 2015.29}, datetime.date(2014, 11, 6): {'PX_LAST': 2031.21, 'PX_OPEN': 2023.33}}} </code></pre> The command <pre class="prettyprint"><code>pd.Panel(d) <class 'pandas.core.panel.Panel'> Dimensions: 2 (items) x 2 (major_axis) x 4 (minor_axis) Items axis: RAY Index to SPX Index Major_axis axis: PX_LAST to PX_OPEN Minor_axis axis: 2014-11-03 to 2014-11-06 </code></pre> where pd.Panel(d)[item] yields a dataframe <pre class="prettyprint"><code>pd.Panel(d)['SPX Index'] 2014-11-03 2014-11-04 2014-11-05 2014-11-06 PX_LAST 2017.81 2012.10 2023.57 2031.21 PX_OPEN 2018.21 2015.81 2015.29 2023.33 </code></pre> You can then hit the command to_frame() to turn it into a dataframe. I use reset_index as well to turn the major and minor axis into columns rather than have them as indices. <pre class="prettyprint"><code>pd.Panel(d).to_frame().reset_index() major minor RAY Index SPX Index PX_LAST 2014-11-03 1199.460 2017.81 PX_LAST 2014-11-04 1195.323 2012.10 PX_LAST 2014-11-05 1200.936 2023.57 PX_LAST 2014-11-06 1206.061 2031.21 PX_OPEN 2014-11-03 1200.140 2018.21 PX_OPEN 2014-11-04 1197.690 2015.81 PX_OPEN 2014-11-05 1195.320 2015.29 PX_OPEN 2014-11-06 1200.620 2023.33 </code></pre> Finally, if you don't like the way the frame looks you can use the transpose function of panel to change the appearance before calling to_frame() see documentation here http://pandas.pydata.org/pandas-docs/dev/generated/pandas.Panel.transpose.html Just as an example <pre class="prettyprint"><code>pd.Panel(d).transpose(2,0,1).to_frame().reset_index() major minor 2014-11-03 2014-11-04 2014-11-05 2014-11-06 RAY Index PX_LAST 1199.46 1195.323 1200.936 1206.061 RAY Index PX_OPEN 1200.14 1197.690 1195.320 1200.620 SPX Index PX_LAST 2017.81 2012.100 2023.570 2031.210 SPX Index PX_OPEN 2018.21 2015.810 2015.290 2023.330 </code></pre> Hope this helps.

Building on verified answer, for me this worked best: <pre class="prettyprint"><code>ab = pd.concat({k: pd.DataFrame(v).T for k, v in data.items()}, axis=0) ab.T </code></pre>

Construct pandas DataFrame from items in nested dictionary

Tags:

python

pandas

dataframe

multi-index

Suppose I have a nested dictionary 'user_dict' with structure:

Level 1: UserId (Long Integer)
Level 2: Category (String)
Level 3: Assorted Attributes (floats, ints, etc..)

For example, an entry of this dictionary would be:

user_dict[12] = {
    "Category 1": {"att_1": 1, 
                   "att_2": "whatever"},
    "Category 2": {"att_1": 23, 
                   "att_2": "another"}}

each item in user_dict has the same structure and user_dict contains a large number of items which I want to feed to a pandas DataFrame, constructing the series from the attributes. In this case a hierarchical index would be useful for the purpose.

Specifically, my question is whether there exists a way to to help the DataFrame constructor understand that the series should be built from the values of the "level 3" in the dictionary?

If I try something like:

df = pandas.DataFrame(users_summary)

The items in "level 1" (the UserId's) are taken as columns, which is the opposite of what I want to achieve (have UserId's as index).

I know I could construct the series after iterating over the dictionary entries, but if there is a more direct way this would be very useful. A similar question would be asking whether it is possible to construct a pandas DataFrame from json objects listed in a file.

601

asked Nov 26 '12 23:11

vladimir montealegre

7 Answers

A pandas MultiIndex consists of a list of tuples. So the most natural approach would be to reshape your input dict so that its keys are tuples corresponding to the multi-index values you require. Then you can just construct your dataframe using pd.DataFrame.from_dict, using the option orient='index':

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

pd.DataFrame.from_dict({(i,j): user_dict[i][j] 
                           for i in user_dict.keys() 
                           for j in user_dict[i].keys()},
                       orient='index')


               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar

An alternative approach would be to build your dataframe up by concatenating the component dataframes:

user_ids = []
frames = []

for user_id, d in user_dict.iteritems():
    user_ids.append(user_id)
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

pd.concat(frames, keys=user_ids)

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar

184

answered Oct 27 '22 11:10

Wouter Overmeire

pd.concat accepts a dictionary. With this in mind, it is possible to improve upon the currently accepted answer in terms of simplicity and performance by use a dictionary comprehension to build a dictionary mapping keys to sub-frames.

pd.concat({k: pd.DataFrame(v).T for k, v in user_dict.items()}, axis=0)

Or,

pd.concat({
        k: pd.DataFrame.from_dict(v, 'index') for k, v in user_dict.items()
    }, 
    axis=0)

              att_1     att_2
12 Category 1     1  whatever
   Category 2    23   another
15 Category 1    10       foo
   Category 2    30       bar

answered Oct 27 '22 11:10

cs95

So I used to use a for loop for iterating through the dictionary as well, but one thing I've found that works much faster is to convert to a panel and then to a dataframe. Say you have a dictionary d

import pandas as pd
d
{'RAY Index': {datetime.date(2014, 11, 3): {'PX_LAST': 1199.46,
'PX_OPEN': 1200.14},
datetime.date(2014, 11, 4): {'PX_LAST': 1195.323, 'PX_OPEN': 1197.69},
datetime.date(2014, 11, 5): {'PX_LAST': 1200.936, 'PX_OPEN': 1195.32},
datetime.date(2014, 11, 6): {'PX_LAST': 1206.061, 'PX_OPEN': 1200.62}},
'SPX Index': {datetime.date(2014, 11, 3): {'PX_LAST': 2017.81,
'PX_OPEN': 2018.21},
datetime.date(2014, 11, 4): {'PX_LAST': 2012.1, 'PX_OPEN': 2015.81},
datetime.date(2014, 11, 5): {'PX_LAST': 2023.57, 'PX_OPEN': 2015.29},
datetime.date(2014, 11, 6): {'PX_LAST': 2031.21, 'PX_OPEN': 2023.33}}}

The command

pd.Panel(d)
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 2 (major_axis) x 4 (minor_axis)
Items axis: RAY Index to SPX Index
Major_axis axis: PX_LAST to PX_OPEN
Minor_axis axis: 2014-11-03 to 2014-11-06

where pd.Panel(d)[item] yields a dataframe

pd.Panel(d)['SPX Index']
2014-11-03  2014-11-04  2014-11-05 2014-11-06
PX_LAST 2017.81 2012.10 2023.57 2031.21
PX_OPEN 2018.21 2015.81 2015.29 2023.33

You can then hit the command to_frame() to turn it into a dataframe. I use reset_index as well to turn the major and minor axis into columns rather than have them as indices.

pd.Panel(d).to_frame().reset_index()
major   minor      RAY Index    SPX Index
PX_LAST 2014-11-03  1199.460    2017.81
PX_LAST 2014-11-04  1195.323    2012.10
PX_LAST 2014-11-05  1200.936    2023.57
PX_LAST 2014-11-06  1206.061    2031.21
PX_OPEN 2014-11-03  1200.140    2018.21
PX_OPEN 2014-11-04  1197.690    2015.81
PX_OPEN 2014-11-05  1195.320    2015.29
PX_OPEN 2014-11-06  1200.620    2023.33

Finally, if you don't like the way the frame looks you can use the transpose function of panel to change the appearance before calling to_frame() see documentation here http://pandas.pydata.org/pandas-docs/dev/generated/pandas.Panel.transpose.html

Just as an example

pd.Panel(d).transpose(2,0,1).to_frame().reset_index()
major        minor  2014-11-03  2014-11-04  2014-11-05  2014-11-06
RAY Index   PX_LAST 1199.46    1195.323     1200.936    1206.061
RAY Index   PX_OPEN 1200.14    1197.690     1195.320    1200.620
SPX Index   PX_LAST 2017.81    2012.100     2023.570    2031.210
SPX Index   PX_OPEN 2018.21    2015.810     2015.290    2023.330

Hope this helps.

answered Oct 27 '22 12:10

Mishiko

This solution should work for arbitrary depth by flattening dictionary keys to a tuple chain

def flatten_dict(nested_dict):
    res = {}
    if isinstance(nested_dict, dict):
        for k in nested_dict:
            flattened_dict = flatten_dict(nested_dict[k])
            for key, val in flattened_dict.items():
                key = list(key)
                key.insert(0, k)
                res[tuple(key)] = val
    else:
        res[()] = nested_dict
    return res


def nested_dict_to_df(values_dict):
    flat_dict = flatten_dict(values_dict)
    df = pd.DataFrame.from_dict(flat_dict, orient="index")
    df.index = pd.MultiIndex.from_tuples(df.index)
    df = df.unstack(level=-1)
    df.columns = df.columns.map("{0[1]}".format)
    return df

answered Oct 27 '22 10:10

tRosenflanz

In case someone wants to get the data frame in a "long format" (leaf values have the same type) without multiindex, you can do this:

pd.DataFrame.from_records(
    [
        (level1, level2, level3, leaf)
        for level1, level2_dict in user_dict.items()
        for level2, level3_dict in level2_dict.items()
        for level3, leaf in level3_dict.items()
    ],
    columns=['UserId', 'Category', 'Attribute', 'value']
)

    UserId    Category Attribute     value
0       12  Category 1     att_1         1
1       12  Category 1     att_2  whatever
2       12  Category 2     att_1        23
3       12  Category 2     att_2   another
4       15  Category 1     att_1        10
5       15  Category 1     att_2       foo
6       15  Category 2     att_1        30
7       15  Category 2     att_2       bar

(I know the original question probably wants (I.) to have Levels 1 and 2 as multiindex and Level 3 as columns and (II.) asks about other ways than iteration over values in the dict. But I hope this answer is still relevant and useful (I.): to people like me who have tried to find a way to get the nested dict into this shape and google only returns this question and (II.): because other answers involve some iteration as well and I find this approach flexible and easy to read; not sure about performance, though.)

answered Oct 27 '22 12:10

Melkor.cz

For other ways to represent the data, you don't need to do much. For example, if you just want the "outer" key to be an index, the "inner" key to be columns and the values to be cell values, this would do the trick:

df = pd.DataFrame.from_dict(user_dict, orient='index')

answered Oct 27 '22 12:10

Pss

Building on verified answer, for me this worked best:

ab = pd.concat({k: pd.DataFrame(v).T for k, v in data.items()}, axis=0)
ab.T

answered Oct 27 '22 10:10

El_1988

Related questions
                            
                                Python split() without removing the delimiter [duplicate]
                            
                                Python's many ways of string formatting — are the older ones (going to be) deprecated?
                            
                                How to use sklearn fit_transform with pandas and return dataframe instead of numpy array?
                            
                                Get __name__ of calling function's module in Python
                            
                                Why do I get TypeError: can't multiply sequence by non-int of type 'float'?
                            
                                Why isn't assigning to an empty list (e.g. [] = "") an error?
                            
                                How to resolve "dyld: Library not loaded: @executable_path.." error
                            
                                Why does csvwriter.writerow() put a comma after each character?
                            
                                Does Python have a toString() equivalent, and can I convert a class to String?
                            
                                Save list of DataFrames to multisheet Excel spreadsheet
                            
                                python selenium click on button
                            
                                Link to Flask static files with url_for
                            
                                Python Unicode Encode Error
                            
                                How do I get rid of the b-prefix in a string in python?
                            
                                Why do tuples take less space in memory than lists?
                            
                                Inheritance and init method in Python
                            
                                Copy multiple files in Python
                            
                                How to determine whether a substring is in a different string [duplicate]
                            
                                Call Python script from bash with argument
                            
                                What does hash do in python?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Construct pandas DataFrame from items in nested dictionary

Tags:

python

pandas

dataframe

multi-index

vladimir montealegre

People also ask

7 Answers

Wouter Overmeire

cs95

Mishiko

tRosenflanz

Melkor.cz

Pss

El_1988

Recent Activity

Donate For Us