Pandas MultiIndex (more than 2 levels) DataFrame to Nested Dict/JSON

This question is similar to this one, but I want to take it a step further. Is it possible to extend the solution to work with more levels? The .to_dict() method of a multilevel DataFrame has some promising options, but most of them return entries keyed by tuples (e.g. ('A', 0, 0): 274.0) rather than nesting them in dictionaries.

For an example of what I'm looking to accomplish, consider this multiindex dataframe:

import pandas as pd

data = {0: {
        ('A', 0, 0): 274.0, 
        ('A', 0, 1): 19.0, 
        ('A', 1, 0): 67.0, 
        ('A', 1, 1): 12.0, 
        ('B', 0, 0): 83.0, 
        ('B', 0, 1): 45.0
    },
    1: {
        ('A', 0, 0): 254.0, 
        ('A', 0, 1): 11.0, 
        ('A', 1, 0): 58.0, 
        ('A', 1, 1): 11.0, 
        ('B', 0, 0): 76.0, 
        ('B', 0, 1): 56.0
    }   
}
df = pd.DataFrame(data).T
df.index = ['entry1', 'entry2']
df
# output:

         A                              B
         0              1               0
         0      1       0       1       0       1
entry1   274.0  19.0    67.0    12.0    83.0    45.0
entry2   254.0  11.0    58.0    11.0    76.0    56.0

You can imagine that we have many records here, not just two, and that the index names could be longer strings. How could you turn this into nested dictionaries (or directly to JSON) that look like this:

[
 {'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
  'B': {0: {0: 83.0, 1: 45.0}}},
 'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
  'B': {0: {0: 76.0, 1: 56.0}}}}
]

I'm thinking some amount of recursion could potentially be helpful, maybe something like this, but have so far been unsuccessful.

asked Jun 19 '18 13:06

tgordon18

1 Answer

So, you really need to do two things here:

1. Call df.to_dict().
2. Convert the result to a nested dictionary.

df.to_dict(orient='index') gives you a dictionary with the index as keys; it looks like this:

>>> df.to_dict(orient='index')
{'entry1': {('A', 0, 0): 274.0,
  ('A', 0, 1): 19.0,
  ('A', 1, 0): 67.0,
  ('A', 1, 1): 12.0,
  ('B', 0, 0): 83.0,
  ('B', 0, 1): 45.0},
 'entry2': {('A', 0, 0): 254.0,
  ('A', 0, 1): 11.0,
  ('A', 1, 0): 58.0,
  ('A', 1, 1): 11.0,
  ('B', 0, 0): 76.0,
  ('B', 0, 1): 56.0}}

Now you need to nest this. Here's a trick from Martijn Pieters to do that:

def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # traverse all keys but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result
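
To see what nest() does on its own, here is a small sketch that needs no pandas. It assumes the input is one row of the to_dict(orient='index') output shown above, i.e. a flat dict whose keys are tuples; the variable names flat and nested are only for illustration.

```python
def nest(d: dict) -> dict:
    """Turn a flat dict keyed by tuples into nested dicts."""
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:
            # Walk down one level, creating the inner dict if it is missing.
            target = target.setdefault(k, {})
        # The last element of the tuple is the key of the leaf value.
        target[key[-1]] = value
    return result


# One row of the to_dict(orient='index') output (the 'entry1' row).
flat = {
    ('A', 0, 0): 274.0,
    ('A', 0, 1): 19.0,
    ('A', 1, 0): 67.0,
    ('A', 1, 1): 12.0,
    ('B', 0, 0): 83.0,
    ('B', 0, 1): 45.0,
}

nested = nest(flat)
print(nested)
# {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}}, 'B': {0: {0: 83.0, 1: 45.0}}}
```

Because the loop walks each tuple element by element, the same function handles any number of column levels, as long as every key is a tuple of the same depth.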

Putting this all together:

def df_to_nested_dict(df: pd.DataFrame) -> dict:
    d = df.to_dict(orient='index')
    return {k: nest(v) for k, v in d.items()}

Output:

>>> df_to_nested_dict(df)
{'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
  'B': {0: {0: 83.0, 1: 45.0}}},
 'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
  'B': {0: {0: 76.0, 1: 56.0}}}}
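
Since the question also asks about JSON, here is a minimal sketch of that last step using the standard json module. It starts from the nested dict shown above, written out as a literal so it runs without pandas, and assumes the keys are plain Python ints and strings; if the level values were NumPy integers, json.dumps may reject them as keys and they would need converting first.

```python
import json

# The nested result from above, written out as a literal.
nested = {
    'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
               'B': {0: {0: 83.0, 1: 45.0}}},
    'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
               'B': {0: {0: 76.0, 1: 56.0}}},
}

# Serialize to a JSON string; integer keys are written as strings.
as_json = json.dumps(nested, indent=2)

# Reading it back shows the effect: the key 0 is now the string "0".
round_trip = json.loads(as_json)
print(round_trip['entry1']['A']['0']['1'])  # 19.0
```

JSON object keys are always strings, so the integer level values do not survive a round trip as integers. If the outer list from the question is needed, passing [nested] to json.dumps would produce it.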

answered Oct 21 '22 03:10

Brad Solomon

Tags: python, dictionary, pandas, multi-index
