Suppose there is a dataframe df:
Country Continent PopulationEst
0 Germany Europe 8.036970e+07
1 Canada North America 3.523986e+07
...
I want to create a dataframe that shows, for each continent, the size (the number of countries) and the sum, mean, and standard deviation of the countries' estimated populations.
I did the following:
df2 = df.groupby('Continent').agg(['size', 'sum','mean','std'])
But the resulting df2 has multi-level columns, as shown below:
df2.columns
MultiIndex(levels=[['PopulationEst'], ['size', 'sum', 'mean', 'std']],
labels=[[0, 0, 0, 0], [0, 1, 2, 3]])
How can I remove PopulationEst from the columns, so that the dataframe has only the ['size', 'sum', 'mean', 'std'] columns?
I think you need to add ['PopulationEst'] after the groupby, so that agg is applied to that single column only:
df2 = df.groupby('Continent')['PopulationEst'].agg(['size', 'sum','mean','std'])
Sample:
import pandas as pd

df = pd.DataFrame({
    'Country': ['Germany', 'Germany', 'Canada', 'Canada'],
    'PopulationEst': [8, 4, 35, 50],
    'Continent': ['Europe', 'Europe', 'North America', 'North America']},
    columns=['Country','PopulationEst','Continent'])
print (df)
Country PopulationEst Continent
0 Germany 8 Europe
1 Germany 4 Europe
2 Canada 35 North America
3 Canada 50 North America
df2 = df.groupby('Continent')['PopulationEst'].agg(['size', 'sum','mean','std'])
print (df2)
size sum mean std
Continent
Europe 2 12 6.0 2.828427
North America 2 85 42.5 10.606602
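For comparison, aggregating without selecting the column keeps PopulationEst as the top column level (on recent pandas versions this call may raise a TypeError because of the non-numeric Country column):
df2 = df.groupby('Continent').agg(['size', 'sum','mean','std'])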
print (df2)
PopulationEst
size sum mean std
Continent
Europe 2 12 6.0 2.828427
North America 2 85 42.5 10.606602
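The difference between the two results comes down to what is being aggregated. The following is a minimal sketch, reusing the sample data above, that compares the column index in each case. The variable names funcs, flat and nested are only illustrative, and the double-bracket selection [['PopulationEst']] is used here as a stand-in for the unselected call so that the non-numeric Country column is left out.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Germany', 'Germany', 'Canada', 'Canada'],
    'PopulationEst': [8, 4, 35, 50],
    'Continent': ['Europe', 'Europe', 'North America', 'North America']})

funcs = ['size', 'sum', 'mean', 'std']

# Single label: a SeriesGroupBy is aggregated, so the result has
# one flat level of columns named after the functions.
flat = df.groupby('Continent')['PopulationEst'].agg(funcs)

# List of labels: a DataFrameGroupBy is aggregated, so the result has
# two column levels (source column, function).
nested = df.groupby('Continent')[['PopulationEst']].agg(funcs)

print(isinstance(flat.columns, pd.MultiIndex))
print(isinstance(nested.columns, pd.MultiIndex))
print(list(nested.columns))
```

The first check should report a plain index and the second a MultiIndex, which is why selecting the single column is enough to get flat column names.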
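Another solution is MultiIndex.droplevel, which removes the top column level after aggregating (the double brackets keep the two-level columns and leave out the non-numeric Country column):
df2 = df.groupby('Continent')[['PopulationEst']].agg(['size', 'sum','mean','std'])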
df2.columns = df2.columns.droplevel(0)
print (df2)
size sum mean std
Continent
Europe 2 12 6.0 2.828427
North America 2 85 42.5 10.606602
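A third option, not part of the answer above, is named aggregation, which is available in pandas 0.25 and later. The sketch below assumes the same sample data. Each keyword becomes an output column name, and each value is a (source column, function) pair, so the result has flat columns without any droplevel step.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Germany', 'Germany', 'Canada', 'Canada'],
    'PopulationEst': [8, 4, 35, 50],
    'Continent': ['Europe', 'Europe', 'North America', 'North America']})

# keyword = output column name, value = (source column, aggregation function)
df2 = df.groupby('Continent').agg(
    size=('PopulationEst', 'size'),
    sum=('PopulationEst', 'sum'),
    mean=('PopulationEst', 'mean'),
    std=('PopulationEst', 'std'),
)
print(df2)
```

This form also lets you choose different output names (for example n_countries instead of size) and only touches the columns you name, so the Country column is never aggregated.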