
Pandas - How to flatten a hierarchical index in columns

I have a data frame with a hierarchical index in axis 1 (columns) (from a groupby.agg operation):

     USAF   WBAN  year  month  day  s_PC  s_CL  s_CD  s_CNT  tempf
                                     sum   sum   sum    sum   amax   amin
0  702730  26451  1993      1    1     1     0    12     13  30.92  24.98
1  702730  26451  1993      1    2     0     0    13     13  32.00  24.98
2  702730  26451  1993      1    3     1    10     2     13  23.00   6.98
3  702730  26451  1993      1    4     1     0    12     13  10.04   3.92
4  702730  26451  1993      1    5     3     0    10     13  19.94  10.94

I want to flatten it, so that it looks like this (names aren't critical - I could rename):

     USAF   WBAN  year  month  day  s_PC  s_CL  s_CD  s_CNT  tempf_amax  tempf_amin
0  702730  26451  1993      1    1     1     0    12     13       30.92       24.98
1  702730  26451  1993      1    2     0     0    13     13       32.00       24.98
2  702730  26451  1993      1    3     1    10     2     13       23.00        6.98
3  702730  26451  1993      1    4     1     0    12     13       10.04        3.92
4  702730  26451  1993      1    5     3     0    10     13       19.94       10.94

How do I do this? (I've tried a lot, to no avail.)

Per a suggestion, here is the head of the frame in dict form:

{('USAF', ''): {0: '702730', 1: '702730', 2: '702730', 3: '702730', 4: '702730'},
 ('WBAN', ''): {0: '26451', 1: '26451', 2: '26451', 3: '26451', 4: '26451'},
 ('day', ''): {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
 ('month', ''): {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 ('s_CD', 'sum'): {0: 12.0, 1: 13.0, 2: 2.0, 3: 12.0, 4: 10.0},
 ('s_CL', 'sum'): {0: 0.0, 1: 0.0, 2: 10.0, 3: 0.0, 4: 0.0},
 ('s_CNT', 'sum'): {0: 13.0, 1: 13.0, 2: 13.0, 3: 13.0, 4: 13.0},
 ('s_PC', 'sum'): {0: 1.0, 1: 0.0, 2: 1.0, 3: 1.0, 4: 3.0},
 ('tempf', 'amax'): {0: 30.920000000000002,
  1: 32.0,
  2: 23.0,
  3: 10.039999999999999,
  4: 19.939999999999998},
 ('tempf', 'amin'): {0: 24.98,
  1: 24.98,
  2: 6.9799999999999969,
  3: 3.9199999999999982,
  4: 10.940000000000001},
 ('year', ''): {0: 1993, 1: 1993, 2: 1993, 3: 1993, 4: 1993}}
Ross R asked Jan 24 '13 18:01



1 Answer

I think the easiest way to do this would be to set the columns to the top level:

df.columns = df.columns.get_level_values(0) 

Note: if the top level has a name, you can also pass that name to get_level_values instead of 0.
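As a minimal sketch of the approach above, using a small made-up frame rather than the asker's data: keeping only level 0 drops the aggregation names, so columns that share a top-level label (here the two tempf columns) end up with the same name.

```python
import pandas as pd

# Small illustrative frame (made-up values) with two-level columns.
df = pd.DataFrame(
    [[1993, 1.0, 30.92, 24.98], [1993, 0.0, 32.00, 24.98]],
    columns=pd.MultiIndex.from_tuples(
        [("year", ""), ("s_PC", "sum"), ("tempf", "amax"), ("tempf", "amin")]
    ),
)

# Keep only the top level of the column MultiIndex.
df.columns = df.columns.get_level_values(0)

print(list(df.columns))  # ['year', 's_PC', 'tempf', 'tempf']
```

If duplicate names like this are a problem, joining the levels instead keeps each column distinct.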


If you instead want to join the levels of the MultiIndex into a single Index (assuming every column label is a string), you could do:

df.columns = [' '.join(col).strip() for col in df.columns.values] 

Note: strip() is needed to remove the trailing space that join leaves when the second level is an empty string.

In [11]: [' '.join(col).strip() for col in df.columns.values]
Out[11]:
['USAF',
 'WBAN',
 'day',
 'month',
 's_CD sum',
 's_CL sum',
 's_CNT sum',
 's_PC sum',
 'tempf amax',
 'tempf amin',
 'year']
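To get underscore-separated names like tempf_amax from the question, the same idea can be sketched with '_' as the separator. The helper name flatten_columns and the sample frame are assumptions made for illustration, and the sketch assumes every column label is a string. Empty level values are filtered out before joining, which plays the role of the strip() call.

```python
import pandas as pd


def flatten_columns(df, sep="_"):
    """Return a copy of df with MultiIndex columns joined into single strings.

    Hypothetical helper, not part of pandas. Assumes all labels are strings.
    """
    out = df.copy()
    # Skip empty level values so ('year', '') becomes 'year', not 'year_'.
    out.columns = [sep.join(part for part in col if part) for col in out.columns]
    return out


# Small illustrative frame (made-up values) with two-level columns.
df = pd.DataFrame(
    [[1993, 1.0, 30.92, 24.98]],
    columns=pd.MultiIndex.from_tuples(
        [("year", ""), ("s_PC", "sum"), ("tempf", "amax"), ("tempf", "amin")]
    ),
)

flat = flatten_columns(df)
print(list(flat.columns))  # ['year', 's_PC_sum', 'tempf_amax', 'tempf_amin']
```

Returning a copy leaves the original frame's MultiIndex intact, which may be convenient if the hierarchical columns are still needed elsewhere.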
Andy Hayden answered Sep 28 '22 19:09