The dataset contains 4 columns: name is the name of the child, yearofbirth is the year in which the child was born, number is the number of babies given that name in that year, and sex is the sex of the child.
For example, the first entry says that in 1880, 7065 girls were named Mary.
Using pandas, I'm trying to find the most used name for each year. My code:
df.groupby(['yearofbirth']).agg({'number':'max'}).reset_index()
The above code only partially answers the question at hand.
I want the name along with the maximum number.
Based on answers from this question I came up with this solution:
idx = df.groupby(['yearofbirth'])['number'].transform('max') == df['number']
df = df[idx]
print(df)
   name  number sex  yearofbirth
0  Mary    7065   F         1880
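To see why this mask works, here is a minimal sketch on a small frame. Only the Mary/1880 row comes from the question; the other rows and the names `sample`, `year_max`, `mask` and `top_names` are made up for illustration. `transform('max')` broadcasts each year's maximum back onto every row of that year, so comparing it with the `number` column marks the rows that hold the maximum.

```python
import pandas as pd

# Illustrative data: only the first row is taken from the question,
# the remaining rows are hypothetical.
sample = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Mary', 'Emma'],
    'number': [7065, 2604, 6919, 2003],
    'sex': ['F'] * 4,
    'yearofbirth': [1880, 1880, 1881, 1881],
})

# Every row receives the maximum of its own year (same length as sample)
year_max = sample.groupby('yearofbirth')['number'].transform('max')

# True only where the row's number equals its yearly maximum
mask = year_max == sample['number']

# Boolean indexing keeps whole rows, so the name comes along
top_names = sample[mask]
print(top_names)
```

Because the result has the same index as the original frame, the mask can be used directly for boolean indexing, which is what keeps the name column in the output.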
If each year has only one maximum value, I think you need sort_values with drop_duplicates:
import pandas as pd

df = pd.DataFrame({'name': list('abcaac'),
                   'yearofbirth': [1800, 1800, 1801, 1801, 1802, 1802],
                   'number': [7, 8, 9, 4, 2, 3],
                   'sex': ['F'] * 6,
                   })
print(df)
  name  yearofbirth  number sex
0    a         1800       7   F
1    b         1800       8   F
2    c         1801       9   F
3    a         1801       4   F
4    a         1802       2   F
5    c         1802       3   F
df1 = (df.sort_values(['yearofbirth', 'number'], ascending=[True, False])
         .drop_duplicates('yearofbirth'))
print(df1)
  name  yearofbirth  number sex
1    b         1800       8   F
2    c         1801       9   F
5    c         1802       3   F
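As a possible alternative that is not part of the answer above, the same one-row-per-year result can be sketched with `idxmax`, which returns the index label of the first maximum in each group. This assumes the frame has a unique index; the names `top_idx` and `df2` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': list('abcaac'),
                   'yearofbirth': [1800, 1800, 1801, 1801, 1802, 1802],
                   'number': [7, 8, 9, 4, 2, 3],
                   'sex': ['F'] * 6})

# For each year, the index label of the first row with the largest number
top_idx = df.groupby('yearofbirth')['number'].idxmax()

# Select those rows by label; all columns, including name, are kept
df2 = df.loc[top_idx]
print(df2)
```

This avoids sorting the whole frame, but like drop_duplicates it keeps only one row per year.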
If multiple maximum values per year are possible, use @Teoretic's solution.
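The difference only shows up when a year has a tie. The sketch below uses a hypothetical frame (`df_ties`, with made-up values) in which two names share the maximum for 1800, and compares sort_values with drop_duplicates against the transform-based mask from the question.

```python
import pandas as pd

# Hypothetical data: names 'a' and 'b' tie for the maximum in 1800
df_ties = pd.DataFrame({'name': list('abcd'),
                        'yearofbirth': [1800, 1800, 1801, 1801],
                        'number': [8, 8, 9, 4]})

# Keeps exactly one row per year, so one of the tied names is dropped
one_per_year = (df_ties.sort_values(['yearofbirth', 'number'],
                                    ascending=[True, False])
                       .drop_duplicates('yearofbirth'))

# Keeps every row whose number equals the yearly maximum, ties included
is_max = df_ties.groupby('yearofbirth')['number'].transform('max') == df_ties['number']
all_max = df_ties[is_max]

print(one_per_year)
print(all_max)
```

Here one_per_year has two rows, while all_max has three because both tied names for 1800 are retained.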