Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas dataframe with multiindex column - merge levels

I have a dataframe, grouped, with multiindex columns as below:

import pandas as pd codes = ["one","two","three"]; colours = ["black", "white"]; textures = ["soft", "hard"]; N= 100 # length of the dataframe df = pd.DataFrame({ 'id' : range(1,N+1),                     'weeks_elapsed' : [random.choice(range(1,25)) for i in range(1,N+1)],                     'code' : [random.choice(codes) for i in range(1,N+1)],                     'colour': [random.choice(colours) for i in range(1,N+1)],                     'texture': [random.choice(textures) for i in range(1,N+1)],                     'size': [random.randint(1,100) for i in range(1,N+1)],                     'scaled_size': [random.randint(100,1000) for i in range(1,N+1)]                    },  columns= ['id', 'weeks_elapsed', 'code','colour', 'texture', 'size', 'scaled_size']) grouped = df.groupby(['code', 'colour']).agg( {'size': [np.sum, np.average, np.size, pd.Series.idxmax],'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax]}).reset_index()  >> grouped     code colour     size                           scaled_size                                              sum    average  size  idxmax            sum    average  size  idxmax 0    one  black    1031  60.647059    17      81     185.153944  10.891408    17      47 1    one  white     481  37.000000    13      53     204.139249  15.703019    13      53 2  three  black     822  48.352941    17       6     123.269405   7.251141    17      31 3  three  white    1614  57.642857    28      50     285.638337  10.201369    28      37 4    two  black     523  58.111111     9      85      80.908912   8.989879     9      88 5    two  white     669  41.812500    16      78      82.098870   5.131179    16      78 [6 rows x 10 columns] 

How can I flatten/merge the column index levels as: "Level1|Level2", e.g. size|sum, scaled_size|sum. etc? If this is not possible, is there a way to groupby() as I did above without creating multi-index columns?

like image 935
Zhubarb Avatar asked Jun 18 '14 16:06

Zhubarb


1 Answers

There is potentially a better way, more pythonic way to flatten MultiIndex columns.

1. Use map and join with string column headers:

grouped.columns = grouped.columns.map('|'.join).str.strip('|')  print(grouped) 

Output:

   code  colour   size|sum  size|average  size|size  size|idxmax  \ 0    one   black       862     53.875000         16           14    1    one   white       554     46.166667         12           18    2  three   black       842     49.529412         17           90    3  three   white       740     56.923077         13           97    4    two   black      1541     61.640000         25           50        scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax   0             6980           436.250000                16                  77   1             6101           508.416667                12                  13   2             7889           464.058824                17                  64   3             6329           486.846154                13                  73   4            12809           512.360000                25                  23   

2. Use map with format for column headers that have numeric data types.

grouped.columns = grouped.columns.map('{0[0]}|{0[1]}'.format)  

Output:

   code| colour|  size|sum  size|average  size|size  size|idxmax  \ 0    one   black       734     52.428571         14           30    1    one   white      1110     65.294118         17           88    2  three   black       930     51.666667         18            3    3  three   white      1140     51.818182         22           20    4    two   black       656     38.588235         17           77    5    two   white       704     58.666667         12           17        scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax   0             8229           587.785714                14                  57   1             8781           516.529412                17                  73   2            10743           596.833333                18                  21   3            10240           465.454545                22                  26   4             9982           587.176471                17                  16   5             6537           544.750000                12                  49  

3. Use list comprehension with f-string for Python 3.6+:

grouped.columns = [f'{i}|{j}' if j != '' else f'{i}' for i,j in grouped.columns] 

Output:

    code colour  size|sum  size|average  size|size  size|idxmax  \ 0    one  black      1003     43.608696         23           76    1    one  white      1255     59.761905         21           66    2  three  black       777     45.705882         17           39    3  three  white       630     52.500000         12           23    4    two  black       823     54.866667         15           33    5    two  white       491     40.916667         12           64        scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax   0            12532           544.869565                23                  27   1            13223           629.666667                21                  13   2             8615           506.764706                17                  92   3             6101           508.416667                12                  43   4             7661           510.733333                15                  42   5             6143           511.916667                12                  49   
like image 148
Scott Boston Avatar answered Sep 29 '22 13:09

Scott Boston