I have a question about creating a pandas DataFrame from the totals of another column.
For example, I have this DataFrame:
Country | Accident
England | Car
England | Car
England | Car
USA | Car
USA | Bike
USA | Plane
Germany | Car
Thailand | Plane
I want to create another DataFrame that holds the total number of accidents for each country. The type of accident is ignored; every row simply counts as one accident for its country.
My desired DataFrame would look like this:
Country | Sum of Accidents
England | 3
USA | 3
Germany | 1
Thailand | 1
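To make the question reproducible, the input DataFrame can be built like this. This is a minimal sketch; the column names and values are taken directly from the table above.

```python
import pandas as pd

# Sample data from the question: one row per accident
df = pd.DataFrame({
    "Country": ["England", "England", "England", "USA", "USA", "USA",
                "Germany", "Thailand"],
    "Accident": ["Car", "Car", "Car", "Car", "Bike", "Plane",
                 "Car", "Plane"],
})
print(df)
```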
Option 1
Use value_counts, which already sorts the counts in descending order:
df['Country'].value_counts().rename_axis('Country').reset_index(name='Sum of Accidents')
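A self-contained sketch of Option 1, rebuilding the sample data from the question. The rename_axis call is an addition of mine: it fixes the name of the country column to "Country", since older pandas versions would otherwise label it "index" after reset_index.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["England", "England", "England", "USA", "USA", "USA",
                "Germany", "Thailand"],
    "Accident": ["Car", "Car", "Car", "Car", "Bike", "Plane",
                 "Car", "Plane"],
})

# value_counts() counts the rows per country, sorted by count (descending).
# rename_axis() names the index so reset_index() produces a "Country" column.
result = (
    df["Country"]
    .value_counts()
    .rename_axis("Country")
    .reset_index(name="Sum of Accidents")
)
print(result)
```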

Option 2
Use groupby then size
df.groupby('Country').size().sort_values(ascending=False) \
.reset_index(name='Sum of Accidents')
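The same sketch for Option 2. Here groupby already names the resulting index "Country", so no extra renaming is needed; size() counts every row in each group.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["England", "England", "England", "USA", "USA", "USA",
                "Germany", "Thailand"],
    "Accident": ["Car", "Car", "Car", "Car", "Bike", "Plane",
                 "Car", "Plane"],
})

# size() returns the number of rows per country; sort so the largest totals come first
result = (
    df.groupby("Country")
    .size()
    .sort_values(ascending=False)
    .reset_index(name="Sum of Accidents")
)
print(result)
```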

You can also use the groupby method.
Example:
In [36]: df.groupby(["Country"]).count().sort_values(["Accident"], ascending=False).rename(columns={"Accident": "Sum of Accidents"}).reset_index()
Out[36]:
    Country  Sum of Accidents
0   England                 3
1       USA                 3
2   Germany                 1
3  Thailand                 1
Explanation:
(df.groupby(["Country"])                             # group the rows by country
   .count()                                          # per country, count the non-null values in each remaining column (here only Accident)
   .sort_values(["Accident"], ascending=False)       # sort by that count, highest first
   .rename(columns={"Accident": "Sum of Accidents"}) # rename the counted column
   .reset_index())                                   # turn Country back into a regular column; without this it stays as the index
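One detail worth knowing about the groupby/count approach: count() skips missing values, whereas size() counts every row. A minimal sketch with hypothetical data in which one accident type is missing (NaN):

```python
import numpy as np
import pandas as pd

# Hypothetical data: England's second accident has no recorded type
df = pd.DataFrame({
    "Country": ["England", "England", "USA"],
    "Accident": ["Car", np.nan, "Bike"],
})

by_count = df.groupby("Country")["Accident"].count()  # skips NaN: England -> 1, USA -> 1
by_size = df.groupby("Country").size()                # counts all rows: England -> 2, USA -> 1

print(by_count)
print(by_size)
```

So if the Accident column can contain missing values, size() is the safer way to count accidents per country.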