
Explode index level of DataFrame

Tags: python, pandas

I have a data frame with a multi-index where one level contains a placeholder value that stands for all other values of that level. For example (code for the sample data is given below):

         D
A B   C   
x a   y  0
  b   y  1
  all z  2

Here all is shorthand for every other value of that level, so the data frame actually represents:

       D
A B C   
x a y  0
  b y  1
  a z  2
  b z  2

This is also the form I would like to obtain: each row containing all in that index level is duplicated once for each other value in the level. If it were a column, I could replace each occurrence of all with a list of the other values and then use DataFrame.explode.

So I thought about resetting that index level, replacing every occurrence of all with a list of the other values, exploding that column, and finally setting it back as an index level:

level_values = sorted(set(df.index.unique('B')) - {'all'})
tmp = df.reset_index('B')
mask = df.index.get_level_values('B') == 'all'
col_index = list(tmp.columns).index('B')
for i in np.argwhere(mask).ravel():
    tmp.iat[i, col_index] = level_values
result = tmp.explode('B').set_index('B', append=True)

That, however, seems pretty inefficient, and the code is not very clear. The index levels are also in the wrong order now (my actual data frame has more than three index levels, so I can't use swaplevel to reorder them).
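On the level-order problem specifically, a minimal sketch, assuming the original order of level names is still available from df.index.names: DataFrame.reorder_levels accepts the full list of names, so it works for any number of levels. The variable names shuffled and restored are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[0], [1], [2]],
    index=pd.MultiIndex.from_arrays(
        [['x', 'x', 'x'], ['a', 'b', 'all'], ['y', 'y', 'z']],
        names=['A', 'B', 'C'],
    ),
    columns=['D'],
)

# Moving 'B' out of the index and appending it again puts it last: A, C, B.
shuffled = df.reset_index('B').set_index('B', append=True)

# reorder_levels takes the complete desired order, here the original names.
restored = shuffled.reorder_levels(df.index.names)

print(list(shuffled.index.names))
print(list(restored.index.names))
```

This only changes the order of the levels; the rows stay in the same order.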

So I'm wondering if there's a more concise way of exploding these all values?


Code for generating the sample data frames:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[0], [1], [2]],
    index=pd.MultiIndex.from_arrays(
        [['x', 'x', 'x'], ['a', 'b', 'all'], ['y', 'y', 'z']],
        names=['A', 'B', 'C']
    ),
    columns=['D']
)

expected = pd.DataFrame(
    data=[[0], [1], [2], [2]],
    index=pd.MultiIndex.from_arrays(
        [['x', 'x', 'x', 'x'], ['a', 'b', 'a', 'b'], ['y', 'y', 'z', 'z']],
        names=['A', 'B', 'C']
    ),
    columns=['D']
)
a_guest asked May 20 '20 18:05


1 Answer

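def fn(x):
    # l collects the values seen since the last 'all'; rv is the rewritten column
    l, rv = [], []
    for v in x:
        if v == 'all':
            # replace the placeholder with a copy of the values collected so far
            rv.append(l[:])
            l = []
        else:
            l.append(v)
            rv.append(v)
    return rv


# flatten the index into plain columns 0, 1, 2 (one per level) and carry D along
df2 = pd.DataFrame(zip(*df.index)).T.assign(D=df['D'].values)
# rewrite every column with fn, explode the list-valued column 1 (level B), then restore the index
df2 = df2.apply(fn).explode(1).rename(columns={0:'A', 1:'B', 2:'C'}).set_index(keys=['A', 'B', 'C'])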

print(df2)

Prints:

       D
A B C   
x a y  0
  b y  1
  a z  2
  b z  2
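
Note that fn expands all to the values that appear before it in the column (since the previous all), which matches the sample data but not necessarily other row orders. A possible alternative, as a rough sketch only: reset every index level, map the placeholder to the list of the other values of that level, explode, and set the index back using the original names so the level order is preserved. The helper name explode_level and its placeholder parameter are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[0], [1], [2]],
    index=pd.MultiIndex.from_arrays(
        [['x', 'x', 'x'], ['a', 'b', 'all'], ['y', 'y', 'z']],
        names=['A', 'B', 'C'],
    ),
    columns=['D'],
)


def explode_level(frame, level, placeholder='all'):
    """Hypothetical helper: duplicate placeholder rows once per other value of `level`."""
    names = list(frame.index.names)
    # Every distinct value of the level except the placeholder itself.
    others = sorted(set(frame.index.unique(level)) - {placeholder})
    flat = frame.reset_index()
    # Placeholder cells become a list of the other values; all other cells stay scalar.
    flat[level] = flat[level].map(lambda v: others if v == placeholder else v)
    # explode turns each list element into its own row and leaves scalars alone.
    return flat.explode(level).set_index(names)


result = explode_level(df, 'B')
print(result)
```

Because all levels are reset and then restored from the saved names, no swaplevel or manual reordering is needed afterwards.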
Andrej Kesely answered Oct 16 '22 17:10