I have a dataframe that looks like this:
Australia Austria United Kingdom Vietnam
date
2020-01-30 9 0 1 2
2020-01-31 9 9 4 2
I would like to create a new dataframe that includes only the countries whose column sum is > 4, and I do it like this:
df1 = df[[i for i in df.columns if int(df[i].sum()) > 4]]
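A vectorized alternative to the list comprehension, assuming every column is numeric, is to pass a boolean mask built from the column sums to `df.loc`. The sketch below rebuilds the sample frame from the table above (the `date` index name is taken from the printout) so it runs on its own:

```python
import pandas as pd

# Rebuild the sample frame from the question; values copied from the table above.
df = pd.DataFrame(
    {
        "Australia": [9, 9],
        "Austria": [0, 9],
        "United Kingdom": [1, 4],
        "Vietnam": [2, 2],
    },
    index=pd.to_datetime(["2020-01-30", "2020-01-31"]).rename("date"),
)

# df.sum() returns one total per column; "> 4" turns it into a boolean Series
# aligned with df.columns, which .loc uses to keep only the matching columns.
df1 = df.loc[:, df.sum() > 4]
print(df1)
```

This avoids the Python-level loop and the `int(...)` cast, and keeps the columns in their original order.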
This gives me:
Australia Austria United Kingdom
date
2020-01-30 9 0 1
2020-01-31 9 9 4
I would now like to sort the countries by the sum of their column and then take the top 2:
Australia Austria
date
2020-01-30 9 0
2020-01-31 9 9
I know I have to use sort_values and tail; I just can't work out how.
IIUC, you can do:
s = df.sum()
df[s.sort_values(ascending=False).index[:2]]
Output:
Australia Austria
date
2020-01-30 9 0
2020-01-31 9 9
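Since the question mentions `sort_values` and `tail`, here is a sketch of that variant, reusing the sample data from the question. Sorting the sums in the default ascending order puts the largest totals last, so `tail(2)` picks them; reversing the resulting index keeps the largest column first:

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "Australia": [9, 9],
        "Austria": [0, 9],
        "United Kingdom": [1, 4],
        "Vietnam": [2, 2],
    },
    index=pd.to_datetime(["2020-01-30", "2020-01-31"]).rename("date"),
)

s = df.sum()
# Ascending sort puts the biggest totals at the end; tail(2) takes those two.
# [::-1] reverses the index so the largest column comes first.
top2 = s.sort_values().tail(2).index[::-1]
result = df[top2]
print(result)
```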
First filter for sums greater than 4, then use Series.nlargest to get the top 2 sums, and select the columns by their index values:
s = df.sum()
df = df[s[s > 4].nlargest(2).index]
print (df)
Australia Austria
date
2020-01-30 9 0
2020-01-31 9 9
Details:
print (s)
Australia 18
Austria 9
United Kingdom 5
Vietnam 4
dtype: int64
print (s[s > 4])
Australia 18
Austria 9
United Kingdom 5
dtype: int64
print (s[s > 4].nlargest(2))
Australia 18
Austria 9
dtype: int64
print (s[s > 4].nlargest(2).index)
Index(['Australia', 'Austria'], dtype='object')
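If you need this repeatedly, the two steps above can be wrapped in a small function. `top_columns_by_sum` is a hypothetical helper name, not a pandas API; it is a sketch that assumes all columns are numeric:

```python
import pandas as pd


def top_columns_by_sum(frame: pd.DataFrame, n: int, min_sum: float) -> pd.DataFrame:
    """Hypothetical helper: keep columns whose total exceeds min_sum, then the n largest."""
    totals = frame.sum()
    # Filter by threshold first, then take the n largest totals (descending order).
    keep = totals[totals > min_sum].nlargest(n).index
    return frame[keep]


# Sample frame from the question.
df = pd.DataFrame(
    {
        "Australia": [9, 9],
        "Austria": [0, 9],
        "United Kingdom": [1, 4],
        "Vietnam": [2, 2],
    },
    index=pd.to_datetime(["2020-01-30", "2020-01-31"]).rename("date"),
)

print(top_columns_by_sum(df, n=2, min_sum=4))
```

If fewer than `n` columns pass the threshold, `nlargest` simply returns all of them, so the result can have fewer than `n` columns.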