Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas MultiIndex (more than 2 levels) DataFrame to Nested Dict/JSON

This question is similar to this one, but I want to take it a step further. Is it possible to extend the solution to work with more levels? Multilevel dataframes' .to_dict() method has some promising options, but most of them will return entries that are indexed by tuples (i.e. (A, 0, 0): 274.0) rather than nesting them in dictionaries.

For an example of what I'm looking to accomplish, consider this multiindex dataframe:

data = {0: {
        ('A', 0, 0): 274.0, 
        ('A', 0, 1): 19.0, 
        ('A', 1, 0): 67.0, 
        ('A', 1, 1): 12.0, 
        ('B', 0, 0): 83.0, 
        ('B', 0, 1): 45.0
    },
    1: {
        ('A', 0, 0): 254.0, 
        ('A', 0, 1): 11.0, 
        ('A', 1, 0): 58.0, 
        ('A', 1, 1): 11.0, 
        ('B', 0, 0): 76.0, 
        ('B', 0, 1): 56.0
    }   
}
df = pd.DataFrame(data).T
df.index = ['entry1', 'entry2']
df
# output:

         A                              B
         0              1               0
         0      1       0       1       0       1
entry1   274.0  19.0    67.0    12.0    83.0    45.0
entry2   254.0  11.0    58.0    11.0    76.0    56.0

You can imagine that we have many records here, not just two, and that the index names could be longer strings. How could you turn this into nested dictionaries (or directly to JSON) that look like this:

[
 {'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
  'B': {0: {0: 83.0, 1: 45.0}}},
 'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
  'B': {0: {0: 76.0, 1: 56.0}}}}
]

I'm thinking some amount of recursion could potentially be helpful, maybe something like this, but have so far been unsuccessful.

like image 931
tgordon18 Avatar asked Jun 19 '18 13:06

tgordon18


People also ask

Can DataFrame have multiple indexes?

A multi-level index DataFrame is a type of DataFrame that contains multiple level or hierarchical indexing. You can create a MultiIndex (multi-level index) in the following ways. The following example demonstrates steps to create MultiIndexes DataFrame for both index and columns using pandas.

What does the pandas function MultiIndex From_tuples do?

from_tuples() function is used to convert list of tuples to MultiIndex. It is one of the several ways in which we construct a MultiIndex.

Is Dict faster than pandas?

For certain small, targeted purposes, a dict may be faster. And if that is all you need, then use a dict, for sure! But if you need/want the power and luxury of a DataFrame, then a dict is no substitute. It is meaningless to compare speed if the data structure does not first satisfy your needs.


1 Answers

So, you really need to do 2 things here:

  • df.to_dict()
  • Convert this to nested dictionary.

df.to_dict(orient='index') gives you a dictionary with the index as keys; it looks like this:

>>> df.to_dict(orient='index')
{'entry1': {('A', 0, 0): 274.0,
  ('A', 0, 1): 19.0,
  ('A', 1, 0): 67.0,
  ('A', 1, 1): 12.0,
  ('B', 0, 0): 83.0,
  ('B', 0, 1): 45.0},
 'entry2': {('A', 0, 0): 254.0,
  ('A', 0, 1): 11.0,
  ('A', 1, 0): 58.0,
  ('A', 1, 1): 11.0,
  ('B', 0, 0): 76.0,
  ('B', 0, 1): 56.0}}

Now you need to nest this. Here's a trick from Martijn Pieters to do that:

def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # traverse all keys but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result

Putting this all together:

def df_to_nested_dict(df: pd.DataFrame) -> dict:
    d = df.to_dict(orient='index')
    return {k: nest(v) for k, v in d.items()}

Output:

>>> df_to_nested_dict(df)
{'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
  'B': {0: {0: 83.0, 1: 45.0}}},
 'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
  'B': {0: {0: 76.0, 1: 56.0}}}}
like image 50
Brad Solomon Avatar answered Oct 21 '22 03:10

Brad Solomon