Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get rid of multilevel index after using pivot table pandas?

I had following data frame (the real data frame is much more larger than this one ) :

sale_user_id    sale_product_id count 1                 1              1 1                 8              1 1                 52             1 1                 312            5 1                 315            1 

Then reshaped it to move the values in sale_product_id as column headers using the following code:

reshaped_df=id_product_count.pivot(index='sale_user_id',columns='sale_product_id',values='count') 

and the resulting data frame is:

sale_product_id -1057   1   2   3   4   5   6   8   9   10  ... 98  980 981 982 983 984 985 986 987 99 sale_user_id                                                                                     1                NaN    1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3                NaN    1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4                NaN    NaN 1.0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 

as you can see we have a multililevel index , what i need is to have sale_user_is in the first column without multilevel indexing:

i take the following approach :

reshaped_df.reset_index() 

the the result would be like this i still have the sale_product_id column , but i do not need it anymore:

sale_product_id sale_user_id    -1057   1   2   3   4   5   6   8   9   ... 98  980 981 982 983 984 985 986 987 99 0                          1    NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1                          3    NaN 1.0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2                          4    NaN NaN 1.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN  

i can subset this data frame to get rid of sale_product_id but i don't think it would be efficient.I am looking for an efficient way to get rid of multilevel indexing while reshaping the original data frame

like image 262
chessosapiens Avatar asked Aug 15 '16 07:08

chessosapiens


People also ask

How do I drop a multilevel index?

To drop multiple levels from a multi-level column index, use the columns. droplevel() repeatedly.

How do I get rid of indexing in pandas?

The most straightforward way to drop a Pandas dataframe index is to use the Pandas . reset_index() method. By default, the method will only reset the index, forcing values from 0 - len(df)-1 as the index. The method will also simply insert the dataframe index into a column in the dataframe.


1 Answers

You need remove only index name, use rename_axis (new in pandas 0.18.0):

print (reshaped_df) sale_product_id  1    8    52   312  315 sale_user_id                             1                  1    1    1    5    1  print (reshaped_df.index.name) sale_user_id  print (reshaped_df.rename_axis(None)) sale_product_id  1    8    52   312  315 1                  1    1    1    5    1 

Another solution working in pandas below 0.18.0:

reshaped_df.index.name = None print (reshaped_df)  sale_product_id  1    8    52   312  315 1                  1    1    1    5    1 

If need remove columns name also:

print (reshaped_df.columns.name) sale_product_id  print (reshaped_df.rename_axis(None).rename_axis(None, axis=1))    1    8    52   312  315 1    1    1    1    5    1 

Another solution:

reshaped_df.columns.name = None reshaped_df.index.name = None print (reshaped_df)    1    8    52   312  315 1    1    1    1    5    1 

EDIT by comment:

You need reset_index with parameter drop=True:

reshaped_df = reshaped_df.reset_index(drop=True) print (reshaped_df) sale_product_id  1    8    52   312  315 0                  1    1    1    5    1  #if need reset index nad remove column name reshaped_df = reshaped_df.reset_index(drop=True).rename_axis(None, axis=1) print (reshaped_df)    1    8    52   312  315 0    1    1    1    5    1 

Of if need remove only column name:

reshaped_df = reshaped_df.rename_axis(None, axis=1) print (reshaped_df)               1    8    52   312  315 sale_user_id                          1               1    1    1    5    1 

Edit1:

So if need create new column from index and remove columns names:

reshaped_df =  reshaped_df.rename_axis(None, axis=1).reset_index()  print (reshaped_df)    sale_user_id  1  8  52  312  315 0             1  1  1   1    5    1 
like image 83
jezrael Avatar answered Sep 19 '22 06:09

jezrael