
Pandas pivot table: columns order and subtotals

I'm using Pandas 0.19.

Considering the following data frame:

FID  admin0  admin1  admin2  windspeed  population
0    cntry1  state1  city1   60km/h     700
1    cntry1  state1  city1   90km/h     210
2    cntry1  state1  city2   60km/h     100
3    cntry1  state2  city3   60km/h     70
4    cntry1  state2  city4   60km/h     180
5    cntry1  state2  city4   90km/h     370
6    cntry2  state3  city5   60km/h     890
7    cntry2  state3  city6   60km/h     120
8    cntry2  state3  city6   90km/h     420
9    cntry2  state3  city6   120km/h    360
10   cntry2  state4  city7   60km/h     740

How can I create a table like this one?

                                population
                         60km/h  90km/h  120km/h
admin0  admin1  admin2  
cntry1  state1  city1    700     210      0
cntry1  state1  city2    100     0        0
cntry1  state2  city3    70      0        0
cntry1  state2  city4    180     370      0
cntry2  state3  city5    890     0        0
cntry2  state3  city6    120     420      360
cntry2  state4  city7    740     0        0

I have tried with the following pivot table:

table = pd.pivot_table(df,index=["admin0","admin1","admin2"], columns=["windspeed"], values=["population"],fill_value=0)

In general it works great, but unfortunately I am not able to sort the new columns in the right order: the 120km/h column appears before the ones for 60km/h and 90km/h. How can I specify the order of the new columns?

Moreover, as a second step I need to add subtotals both for admin0 and admin1. Ideally, the table I need should be like this:

                                population
                         60km/h  90km/h  120km/h
admin0  admin1  admin2  
cntry1  state1  city1    700     210      0
cntry1  state1  city2    100     0        0
        SUM state1       800     210      0
cntry1  state2  city3    70      0        0
cntry1  state2  city4    180     370      0
        SUM state2       250     370      0
SUM cntry1               1050    580      0
cntry2  state3  city5    890     0        0
cntry2  state3  city6    120     420      360
        SUM state3       1010    420      360
cntry2  state4  city7    740     0        0
        SUM state4       740     0        0
SUM cntry2               1750    420      360
SUM ALL                  2800    1000    360

Andreampa asked Oct 10 '16 09:10


2 Answers

You can do this with the reindex() method and a custom sort key that orders the columns by the numeric part of the wind speed:

In [26]: table
Out[26]:
                     population
windspeed               120km/h 60km/h 90km/h
admin0 admin1 admin2
cntry1 state1 city1           0    700    210
              city2           0    100      0
       state2 city3           0     70      0
              city4           0    180    370
cntry2 state3 city5           0    890      0
              city6         360    120    420
       state4 city7           0    740      0

In [27]: cols = sorted(table.columns.tolist(), key=lambda x: int(x[1].replace('km/h','')))

In [28]: cols
Out[28]: [('population', '60km/h'), ('population', '90km/h'), ('population', '120km/h')]

In [29]: table = table.reindex(columns=cols)

In [30]: table
Out[30]:
                     population
windspeed                60km/h 90km/h 120km/h
admin0 admin1 admin2
cntry1 state1 city1         700    210       0
              city2         100      0       0
       state2 city3          70      0       0
              city4         180    370       0
cntry2 state3 city5         890      0       0
              city6         120    420     360
       state4 city7         740      0       0
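
For reference, the same reindex idea can be run end to end. This is a minimal sketch: the data frame is the question's data typed in by hand, speed_key is a hypothetical helper name, and it assumes every wind-speed label has the form "<number>km/h".

```python
import pandas as pd

# The question's data, typed in by hand so the example is self-contained.
df = pd.DataFrame({
    "admin0": ["cntry1"] * 6 + ["cntry2"] * 5,
    "admin1": ["state1"] * 3 + ["state2"] * 3 + ["state3"] * 4 + ["state4"],
    "admin2": ["city1", "city1", "city2", "city3", "city4", "city4",
               "city5", "city6", "city6", "city6", "city7"],
    "windspeed": ["60km/h", "90km/h", "60km/h", "60km/h", "60km/h", "90km/h",
                  "60km/h", "60km/h", "90km/h", "120km/h", "60km/h"],
    "population": [700, 210, 100, 70, 180, 370, 890, 120, 420, 360, 740],
})

table = pd.pivot_table(df,
                       index=["admin0", "admin1", "admin2"],
                       columns=["windspeed"],
                       values=["population"],
                       fill_value=0)


def speed_key(col):
    # col is a column tuple such as ('population', '120km/h');
    # strip the unit and sort on the numeric part.
    return int(col[1].replace("km/h", ""))


# Reorder the columns by numeric wind speed instead of string order.
table = table.reindex(columns=sorted(table.columns.tolist(), key=speed_key))
print(table)
```

Sorting on the parsed number rather than hard-coding the column list keeps the code working if further wind-speed classes appear in the data.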

MaxU - stop WAR against UA answered Sep 20 '22 18:09


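Here is a solution that builds the subtotals with groupby and MultiIndex.from_arrays. Convert windspeed to integers so the columns sort numerically, pivot, compute one subtotal DataFrame per index level, concat them with the pivot table, sort the index, and finally append the grand total: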

#replace km/h and convert to int
df.windspeed = df.windspeed.str.replace('km/h','').astype(int)
print (df)
    FID  admin0  admin1 admin2  windspeed  population
0     0  cntry1  state1  city1         60         700
1     1  cntry1  state1  city1         90         210
2     2  cntry1  state1  city2         60         100
3     3  cntry1  state2  city3         60          70
4     4  cntry1  state2  city4         60         180
5     5  cntry1  state2  city4         90         370
6     6  cntry2  state3  city5         60         890
7     7  cntry2  state3  city6         60         120
8     8  cntry2  state3  city6         90         420
9     9  cntry2  state3  city6        120         360
10   10  cntry2  state4  city7         60         740

#pivoting
table = pd.pivot_table(df,
                       index=["admin0","admin1","admin2"], 
                       columns=["windspeed"], 
                       values=["population"],
                       fill_value=0)
print (table)
                    population          
windspeed                   60   90   120
admin0 admin1 admin2                     
cntry1 state1 city1         700  210    0
              city2         100    0    0
       state2 city3          70    0    0
              city4         180  370    0
cntry2 state3 city5         890    0    0
              city6         120  420  360
       state4 city7         740    0    0
#subtotals per (admin0, admin1): group by index levels 0 and 1 and sum
df1 = table.groupby(level=[0,1]).sum()
df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0), 
                                       df1.index.get_level_values(1)+ '_sum', 
                                       len(df1.index) * ['']])
print (df1)
                   population          
windspeed                 60   90   120
admin0                                 
cntry1 state1_sum         800  210    0
       state2_sum         250  370    0
cntry2 state3_sum        1010  420  360
       state4_sum         740    0    0

#subtotals per admin0: group by index level 0 and sum
df2 = table.groupby(level=0).sum()
df2.index = pd.MultiIndex.from_arrays([df2.index.values + '_sum',
                                       len(df2.index) * [''], 
                                       len(df2.index) * ['']])
print (df2)
             population          
windspeed           60   90   120
cntry1_sum         1050  580    0
cntry2_sum         1750  420  360

#concat all dataframes together, sort index
df = pd.concat([table, df1, df2]).sort_index(level=[0])
#add km/h to second level in columns
df.columns = pd.MultiIndex.from_arrays([df.columns.get_level_values(0),
                                       df.columns.get_level_values(1).astype(str) + 'km/h'])

#append the grand-total row after sorting, so it stays at the bottom
df.loc[('All_sum','','')] = table.sum().values
print (df)
                             population               
                                 60km/h 90km/h 120km/h
admin0     admin1     admin2                          
cntry1     state1     city1         700    210       0
                      city2         100      0       0
           state1_sum               800    210       0
           state2     city3          70      0       0
                      city4         180    370       0
           state2_sum               250    370       0
cntry1_sum                         1050    580       0
cntry2     state3     city5         890      0       0
                      city6         120    420     360
           state3_sum              1010    420     360
           state4     city7         740      0       0
           state4_sum               740      0       0
cntry2_sum                         1750    420     360
All_sum                            2800   1000     360
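
The steps above can also be folded into one function that works for any number of index levels. This is only a sketch under stated assumptions: add_subtotals, suffix and total_label are hypothetical names, the index labels are assumed to be strings whose sort order places "x" before "x_sum" (as in this data), and aggfunc="sum" is my choice here (the call above relies on the default mean, which gives the same numbers because each cell holds a single row).

```python
import pandas as pd


def add_subtotals(table, suffix="_sum", total_label="All_sum"):
    """Hypothetical helper: add subtotal rows for every index level but the last."""
    nlevels = table.index.nlevels
    names = table.index.names
    pieces = [table]
    for depth in range(1, nlevels):
        # Sum over the first `depth` index levels.
        sub = table.groupby(level=list(range(depth))).sum()
        labels = []
        for key in sub.index.tolist():
            key = key if isinstance(key, tuple) else (key,)
            # Mark the last grouped level with the suffix, pad the rest with ''.
            labels.append(key[:-1] + (f"{key[-1]}{suffix}",) + ("",) * (nlevels - depth))
        sub.index = pd.MultiIndex.from_tuples(labels, names=names)
        pieces.append(sub)
    out = pd.concat(pieces).sort_index()
    # Build the grand total separately and append it after sorting.
    total = pd.DataFrame(
        [table.sum().to_numpy()],
        columns=table.columns,
        index=pd.MultiIndex.from_tuples([(total_label,) + ("",) * (nlevels - 1)],
                                        names=names),
    )
    return pd.concat([out, total])


# The question's data with windspeed already converted to integers.
df = pd.DataFrame({
    "admin0": ["cntry1"] * 6 + ["cntry2"] * 5,
    "admin1": ["state1"] * 3 + ["state2"] * 3 + ["state3"] * 4 + ["state4"],
    "admin2": ["city1", "city1", "city2", "city3", "city4", "city4",
               "city5", "city6", "city6", "city6", "city7"],
    "windspeed": [60, 90, 60, 60, 60, 90, 60, 60, 90, 120, 60],
    "population": [700, 210, 100, 70, 180, 370, 890, 120, 420, 360, 740],
})
table = pd.pivot_table(df,
                       index=["admin0", "admin1", "admin2"],
                       columns=["windspeed"],
                       values=["population"],
                       aggfunc="sum",  # assumption, see the note above
                       fill_value=0)
out = add_subtotals(table)
print(out)
```

Appending the grand total as its own one-row frame avoids relying on where "All_sum" would land in the sort order.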

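Edit (in response to a comment): to add a subtotal only for groups that contain more than one row, return the sum from a helper function and drop the groups that produce no result:

def f(x):
    #return the subtotal only for groups with more than one row;
    #single-row groups return None and are removed by dropna below
    if len(x) > 1:
        return x.sum()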

df1 = table.groupby(level=[0,1]).apply(f).dropna(how='all')
df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0), 
                                       df1.index.get_level_values(1)+ '_sum', 
                                       len(df1.index) * ['']])
print (df1)
                   population              
windspeed                 60     90     120
admin0                                     
cntry1 state1_sum       800.0  210.0    0.0
       state2_sum       250.0  370.0    0.0
cntry2 state3_sum      1010.0  420.0  360.0
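
The same filter can be expressed without apply, which also keeps the integer dtype. This is a sketch of an alternative technique, not the code from the edit above: it compares the group sizes with 1 and uses the result as a boolean mask on the summed frame. The names grouped, sums and keep are my own.

```python
import pandas as pd

# The question's data with windspeed already converted to integers.
df = pd.DataFrame({
    "admin0": ["cntry1"] * 6 + ["cntry2"] * 5,
    "admin1": ["state1"] * 3 + ["state2"] * 3 + ["state3"] * 4 + ["state4"],
    "admin2": ["city1", "city1", "city2", "city3", "city4", "city4",
               "city5", "city6", "city6", "city6", "city7"],
    "windspeed": [60, 90, 60, 60, 60, 90, 60, 60, 90, 120, 60],
    "population": [700, 210, 100, 70, 180, 370, 890, 120, 420, 360, 740],
})
table = pd.pivot_table(df,
                       index=["admin0", "admin1", "admin2"],
                       columns=["windspeed"],
                       values=["population"],
                       fill_value=0)

grouped = table.groupby(level=[0, 1])
sums = grouped.sum()
# size() counts the rows of the pivot table in each (admin0, admin1) group.
keep = (grouped.size() > 1).to_numpy()
df1 = sums.loc[keep]

# Relabel the rows the same way as in the answer.
df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0),
                                       df1.index.get_level_values(1) + "_sum",
                                       len(df1.index) * [""]])
print(df1)
```

Here state4 is dropped because it contains a single city, so its subtotal would only repeat that row.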

jezrael answered Sep 22 '22 18:09