
What is going on behind the Pandas scenes that is causing a level in a MultiIndex not to be dropped?

Consider the data frame df below. Note that its columns object is a single-level MultiIndex.

import pandas as pd

midx = pd.MultiIndex.from_product([list('AB')])
df = pd.DataFrame(1, range(3), midx)

   A  B
0  1  1
1  1  1
2  1  1
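To make the setup concrete, here is a small check of what the columns object looks like. This is only a sketch of the situation described above; the variable names is_multi, n_levels and keys are mine, introduced for illustration.

```python
import pandas as pd

# Same construction as above: one list in from_product gives one level.
midx = pd.MultiIndex.from_product([list('AB')])
df = pd.DataFrame(1, range(3), midx)

# The columns are a MultiIndex, but with only a single level.
is_multi = isinstance(df.columns, pd.MultiIndex)
n_levels = df.columns.nlevels

# Each column key is a 1-tuple rather than a plain string.
keys = df.columns.tolist()

print(is_multi, n_levels, keys)
```

The 1-tuple keys are consistent with the name (A,) that shows up on the squeezed series further down.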

Now when I reference column 'A'

df.A

   A
0  1
1  1
2  1

I get a single-column DataFrame and not the Series object I expected. Consequently, I can chain this column reference indefinitely.

df.A.A.A.A.A

   A
0  1
1  1
2  1

As another check, I used xs

df.xs('A', axis=1)

   A
0  1
1  1
2  1

Same problem. What about pd.IndexSlice?

df.loc[:, pd.IndexSlice['A']]

   A
0  1
1  1
2  1

How about squeeze?

df.A.squeeze()

0    1
1    1
2    1
Name: (A,), dtype: int64

This isn't at all what I expected.

  1. What is preventing this from turning into a series object with the name of 'A'?
  2. What is the most intuitive way to fix this?
  3. Is there any good reason why we should ever want a single-level MultiIndex?
piRSquared asked Nov 08 '22 08:11



1 Answer

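I wrote this to fix the problem. It replaces a single-level MultiIndex with a plain index and leaves any other MultiIndex untouched.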

def fix_single_level_multiindex(midx):
    return midx.get_level_values(0) if midx.nlevels == 1 else midx

Or

def fix_single_level_multiindex(midx):
    # MultiIndex.labels was renamed to MultiIndex.codes in pandas 0.24
    return midx.levels[0][midx.codes[0]] if midx.nlevels == 1 else midx
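As a usage sketch, the first version of the function can be applied to the columns of the df from the question. Assigning the result back to df.columns is my assumption about how the fix is meant to be used; the names col and two_level are introduced only for this example.

```python
import pandas as pd

def fix_single_level_multiindex(midx):
    # Collapse a one-level MultiIndex to a plain Index; pass others through.
    return midx.get_level_values(0) if midx.nlevels == 1 else midx

# Rebuild the data frame from the question.
midx = pd.MultiIndex.from_product([list('AB')])
df = pd.DataFrame(1, range(3), midx)

# Assumed usage: overwrite the columns with the flattened index.
df.columns = fix_single_level_multiindex(df.columns)

# Selecting 'A' now gives a Series named 'A'.
col = df['A']
print(type(col).__name__, col.name)

# A MultiIndex with two levels is returned unchanged.
two_level = pd.MultiIndex.from_product([list('AB'), list('xy')])
print(fix_single_level_multiindex(two_level) is two_level)
```

With plain string column labels, df.A and df['A'] both return a Series, so the chained df.A.A.A lookup from the question no longer applies.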
piRSquared answered Nov 15 '22 11:11
