Pandas pivot table: columns order and subtotals

Tags: python, pandas, dataframe, pivot

I'm using Pandas 0.19.

Considering the following data frame:

FID  admin0  admin1  admin2  windspeed  population
0    cntry1  state1  city1   60km/h     700
1    cntry1  state1  city1   90km/h     210
2    cntry1  state1  city2   60km/h     100
3    cntry1  state2  city3   60km/h     70
4    cntry1  state2  city4   60km/h     180
5    cntry1  state2  city4   90km/h     370
6    cntry2  state3  city5   60km/h     890
7    cntry2  state3  city6   60km/h     120
8    cntry2  state3  city6   90km/h     420
9    cntry2  state3  city6   120km/h    360
10   cntry2  state4  city7   60km/h     740

How can I create a table like this one?

                                population
                         60km/h  90km/h  120km/h
admin0  admin1  admin2  
cntry1  state1  city1    700     210      0
cntry1  state1  city2    100     0        0
cntry1  state2  city3    70      0        0
cntry1  state2  city4    180     370      0
cntry2  state3  city5    890     0        0
cntry2  state3  city6    120     420      360
cntry2  state4  city7    740     0        0

I have tried with the following pivot table:

table = pd.pivot_table(df,index=["admin0","admin1","admin2"], columns=["windspeed"], values=["population"],fill_value=0)

In general it works great, but unfortunately I am not able to sort the new columns in the right order: the 120km/h column appears before the ones for 60km/h and 90km/h. How can I specify the order of the new columns?

Moreover, as a second step I need to add subtotals both for admin0 and admin1. Ideally, the table I need should be like this:

                                population
                         60km/h  90km/h  120km/h
admin0  admin1  admin2  
cntry1  state1  city1    700     210      0
cntry1  state1  city2    100     0        0
        SUM state1       800     210      0
cntry1  state2  city3    70      0        0
cntry1  state2  city4    180     370      0
        SUM state2       250     370      0
SUM cntry1               1050    580      0
cntry2  state3  city5    890     0        0
cntry2  state3  city6    120     420      360
        SUM state3       1010    420      360
cntry2  state4  city7    740     0        0
        SUM state4       740     0        0
SUM cntry2               1750    420      360
SUM ALL                  2800    1000    360

asked Oct 10 '16 09:10

Andreampa

2 Answers

You can do it with the reindex() method and a custom sort key:

In [26]: table
Out[26]:
                     population
windspeed               120km/h 60km/h 90km/h
admin0 admin1 admin2
cntry1 state1 city1           0    700    210
              city2           0    100      0
       state2 city3           0     70      0
              city4           0    180    370
cntry2 state3 city5           0    890      0
              city6         360    120    420
       state4 city7           0    740      0

In [27]: cols = sorted(table.columns.tolist(), key=lambda x: int(x[1].replace('km/h','')))

In [28]: cols
Out[28]: [('population', '60km/h'), ('population', '90km/h'), ('population', '120km/h')]

In [29]: table = table.reindex(columns=cols)

In [30]: table
Out[30]:
                     population
windspeed                60km/h 90km/h 120km/h
admin0 admin1 admin2
cntry1 state1 city1         700    210       0
              city2         100      0       0
       state2 city3          70      0       0
              city4         180    370       0
cntry2 state3 city5         890      0       0
              city6         120    420     360
       state4 city7         740      0       0
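
For anyone who wants to reproduce the session above from scratch, here is a minimal self-contained sketch. The data frame is rebuilt from the question's sample data. `speed_key` is a hypothetical helper name, and `aggfunc="sum"` is an assumption added here (each index/windspeed combination occurs only once in the sample, so it gives the same numbers as the default mean).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "admin0": ["cntry1"] * 6 + ["cntry2"] * 5,
    "admin1": ["state1"] * 3 + ["state2"] * 3 + ["state3"] * 4 + ["state4"],
    "admin2": ["city1", "city1", "city2", "city3", "city4", "city4",
               "city5", "city6", "city6", "city6", "city7"],
    "windspeed": ["60km/h", "90km/h", "60km/h", "60km/h", "60km/h", "90km/h",
                  "60km/h", "60km/h", "90km/h", "120km/h", "60km/h"],
    "population": [700, 210, 100, 70, 180, 370, 890, 120, 420, 360, 740],
})

table = pd.pivot_table(df,
                       index=["admin0", "admin1", "admin2"],
                       columns=["windspeed"],
                       values=["population"],
                       aggfunc="sum",  # assumption: sum instead of the default mean
                       fill_value=0)


def speed_key(col):
    """Hypothetical helper: numeric sort key for a column tuple.

    col looks like ("population", "60km/h"); strip the unit and compare as int.
    """
    return int(col[1].replace("km/h", ""))


# Sort the column tuples numerically and reindex the table in that order.
ordered = pd.MultiIndex.from_tuples(sorted(table.columns, key=speed_key),
                                    names=table.columns.names)
table = table.reindex(columns=ordered)
print(table)
```

The string labels sort as "120km/h" < "60km/h" < "90km/h" because they are compared character by character, which is why the numeric key is needed.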

answered Sep 20 '22 18:09

MaxU - stop WAR against UA

A solution with subtotals, using MultiIndex.from_arrays to relabel the subtotal rows. At the end, concat all the DataFrames, sort the index, and add the grand total:

#replace km/h and convert to int
df.windspeed = df.windspeed.str.replace('km/h','').astype(int)
print (df)
    FID  admin0  admin1 admin2  windspeed  population
0     0  cntry1  state1  city1         60         700
1     1  cntry1  state1  city1         90         210
2     2  cntry1  state1  city2         60         100
3     3  cntry1  state2  city3         60          70
4     4  cntry1  state2  city4         60         180
5     5  cntry1  state2  city4         90         370
6     6  cntry2  state3  city5         60         890
7     7  cntry2  state3  city6         60         120
8     8  cntry2  state3  city6         90         420
9     9  cntry2  state3  city6        120         360
10   10  cntry2  state4  city7         60         740

#pivoting
table = pd.pivot_table(df,
                       index=["admin0","admin1","admin2"], 
                       columns=["windspeed"], 
                       values=["population"],
                       fill_value=0)
print (table)
                    population          
windspeed                   60   90   120
admin0 admin1 admin2                     
cntry1 state1 city1         700  210    0
              city2         100    0    0
       state2 city3          70    0    0
              city4         180  370    0
cntry2 state3 city5         890    0    0
              city6         120  420  360
       state4 city7         740    0    0

#groupby and create sum dataframe by levels 0,1
df1 = table.groupby(level=[0,1]).sum()
df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0), 
                                       df1.index.get_level_values(1)+ '_sum', 
                                       len(df1.index) * ['']])
print (df1)
                   population          
windspeed                 60   90   120
admin0                                 
cntry1 state1_sum         800  210    0
       state2_sum         250  370    0
cntry2 state3_sum        1010  420  360
       state4_sum         740    0    0

#groupby and create sum dataframe by level 0
df2 = table.groupby(level=0).sum()
df2.index = pd.MultiIndex.from_arrays([df2.index.values + '_sum',
                                       len(df2.index) * [''], 
                                       len(df2.index) * ['']])
print (df2)
             population          
windspeed           60   90   120
cntry1_sum         1050  580    0
cntry2_sum         1750  420  360

#concat all dataframes together, sort index
df = pd.concat([table, df1, df2]).sort_index(level=[0])

#add km/h to second level in columns
df.columns = pd.MultiIndex.from_arrays([df.columns.get_level_values(0),
                                       df.columns.get_level_values(1).astype(str) + 'km/h'])

#add all sum
df.loc[('All_sum','','')] = table.sum().values
print (df)
                             population               
                                 60km/h 90km/h 120km/h
admin0     admin1     admin2                          
cntry1     state1     city1         700    210       0
                      city2         100      0       0
           state1_sum               800    210       0
           state2     city3          70      0       0
                      city4         180    370       0
           state2_sum               250    370       0
cntry1_sum                         1050    580       0
cntry2     state3     city5         890      0       0
                      city6         120    420     360
           state3_sum              1010    420     360
           state4     city7         740      0       0
           state4_sum               740      0       0
cntry2_sum                         1750    420     360
All_sum                            2800   1000     360
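
The same steps can be condensed into one self-contained sketch. It is not the exact code above: it assumes windspeed is already numeric, passes `values` and `columns` as plain strings so the columns stay single-level, uses `aggfunc="sum"`, and appends the grand total with a final `concat` instead of `.loc` assignment. The names `state`, `country`, `total` and `result` are illustrative only.

```python
import pandas as pd

# Sample data from the question, with windspeed already converted to int.
df = pd.DataFrame({
    "admin0": ["cntry1"] * 6 + ["cntry2"] * 5,
    "admin1": ["state1"] * 3 + ["state2"] * 3 + ["state3"] * 4 + ["state4"],
    "admin2": ["city1", "city1", "city2", "city3", "city4", "city4",
               "city5", "city6", "city6", "city6", "city7"],
    "windspeed": [60, 90, 60, 60, 60, 90, 60, 60, 90, 120, 60],
    "population": [700, 210, 100, 70, 180, 370, 890, 120, 420, 360, 740],
})

table = pd.pivot_table(df,
                       index=["admin0", "admin1", "admin2"],
                       columns="windspeed",
                       values="population",
                       aggfunc="sum",  # assumption: sum instead of the default mean
                       fill_value=0)
names = table.index.names

# Subtotal per (admin0, admin1): label in level 1, level 2 left blank.
state = table.groupby(level=["admin0", "admin1"]).sum()
state.index = pd.MultiIndex.from_tuples(
    [(a0, a1 + "_sum", "") for a0, a1 in state.index], names=names)

# Subtotal per admin0: label in level 0, the other levels left blank.
country = table.groupby(level="admin0").sum()
country.index = pd.MultiIndex.from_tuples(
    [(a0 + "_sum", "", "") for a0 in country.index], names=names)

# Grand total as a one-row frame with the same three index levels.
total = table.sum().to_frame().T
total.index = pd.MultiIndex.from_tuples([("All_sum", "", "")], names=names)

# Sorting interleaves the subtotals ("state1" < "state1_sum" < "state2");
# the grand total is appended afterwards so it stays on the last row.
result = pd.concat([pd.concat([table, state, country]).sort_index(), total])
print(result)
```

The row order relies on plain string sorting of the labels, so a different suffix or naming scheme (for example a "SUM " prefix, as in the question's mock-up) would need a different ordering step.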

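EDIT by comment: to add subtotals only for groups with more than one row (state4 has a single city, so it gets no subtotal):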

def f(x):
    #sum only groups with more than one row, other groups return None
    if (len(x) > 1):
        return x.sum()

df1 = table.groupby(level=[0,1]).apply(f).dropna(how='all')
df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0), 
                                       df1.index.get_level_values(1)+ '_sum', 
                                       len(df1.index) * ['']])
print (df1)
                   population              
windspeed                 60     90     120
admin0                                     
cntry1 state1_sum       800.0  210.0    0.0
       state2_sum       250.0  370.0    0.0
cntry2 state3_sum      1010.0  420.0  360.0
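
A possible alternative to the `apply` version, sketched under the same assumptions as before (numeric windspeed, `aggfunc="sum"`, single-level columns): compute all group sums and keep only the groups whose size is greater than one. No missing rows are produced along the way, so the values should not be pushed to float the way they appear in the output above. The names `sums`, `sizes` and `multi` are illustrative only.

```python
import pandas as pd

# Sample data from the question, with windspeed already converted to int.
df = pd.DataFrame({
    "admin0": ["cntry1"] * 6 + ["cntry2"] * 5,
    "admin1": ["state1"] * 3 + ["state2"] * 3 + ["state3"] * 4 + ["state4"],
    "admin2": ["city1", "city1", "city2", "city3", "city4", "city4",
               "city5", "city6", "city6", "city6", "city7"],
    "windspeed": [60, 90, 60, 60, 60, 90, 60, 60, 90, 120, 60],
    "population": [700, 210, 100, 70, 180, 370, 890, 120, 420, 360, 740],
})

table = pd.pivot_table(df,
                       index=["admin0", "admin1", "admin2"],
                       columns="windspeed",
                       values="population",
                       aggfunc="sum",  # assumption: sum instead of the default mean
                       fill_value=0)

grouped = table.groupby(level=["admin0", "admin1"])
sums = grouped.sum()    # one row per (admin0, admin1)
sizes = grouped.size()  # number of admin2 rows in each group

# Keep subtotals only for groups that contain more than one row.
multi = sums.loc[sizes > 1]
print(multi)
```

The index of `multi` can then be relabelled with the `_sum` suffix in the same way as `df1` above.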

answered Sep 22 '22 18:09

jezrael
