The dataset contains 4 columns: name is the name of the child, yearofbirth is the year in which the child was born, number is the number of babies given that particular name, and sex is the child's sex.
For example, the first entry reads: in 1880, 7065 girls were named Mary.

Using pandas, I'm trying to find out which name was the most used in each year. My code:
   df.groupby(['yearofbirth']).agg({'number':'max'}).reset_index()
The above code only partially answers the question at hand: it returns the maximum number for each year, but not the corresponding name.

I want the name along with the maximum number.
Based on answers to this question, I came up with this solution:
# broadcast each year's maximum to every row, then keep rows equal to it
idx = df.groupby(['yearofbirth'])['number'].transform('max') == df['number']
df = df[idx]
print(df)
    name    number  sex yearofbirth
0   Mary    7065    F   1880
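The transform-based mask keeps every row that ties for its year's maximum, which matters when a year can have several equally popular names. A minimal sketch on hypothetical data (not from the real dataset) where 1801 has a tie:

```python
import pandas as pd

# Hypothetical sample: in 1801, names 'c' and 'a' tie at the maximum (9)
df = pd.DataFrame({'name': ['a', 'b', 'c', 'a'],
                   'yearofbirth': [1800, 1800, 1801, 1801],
                   'number': [7, 8, 9, 9],
                   'sex': ['F'] * 4})

# Broadcast each year's maximum back to every row of that year
year_max = df.groupby('yearofbirth')['number'].transform('max')

# Keep every row equal to its year's maximum; tied rows are all kept
top = df[df['number'] == year_max]
print(top)
```

Here both tied names for 1801 survive, alongside the single top name for 1800.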
If each year has only one maximum value, I think you need sort_values with drop_duplicates:
import pandas as pd

df = pd.DataFrame({'name':list('abcaac'),
                   'yearofbirth':[1800,1800,1801,1801,1802,1802],
                   'number':[7,8,9,4,2,3],
                   'sex':['F'] * 6,
})
print (df)
  name  yearofbirth  number sex
0    a         1800       7   F
1    b         1800       8   F
2    c         1801       9   F
3    a         1801       4   F
4    a         1802       2   F
5    c         1802       3   F
df1 = (df.sort_values(['yearofbirth', 'number'], ascending=[True, False])
         .drop_duplicates('yearofbirth'))
print (df1)
  name  yearofbirth  number sex
1    b         1800       8   F
2    c         1801       9   F
5    c         1802       3   F
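Another way to get exactly one row per year, not part of the answer above, is groupby with idxmax, which returns the index label of the first row holding each year's maximum. A sketch reusing the sample frame from the answer; like drop_duplicates, it keeps only one row per year when there are ties.

```python
import pandas as pd

df = pd.DataFrame({'name': list('abcaac'),
                   'yearofbirth': [1800, 1800, 1801, 1801, 1802, 1802],
                   'number': [7, 8, 9, 4, 2, 3],
                   'sex': ['F'] * 6})

# Per year, the index label of the (first) row with the largest number
idx = df.groupby('yearofbirth')['number'].idxmax()

# Select those rows by label: one row per year
df1 = df.loc[idx]
print(df1)
```

This selects the same rows as the sort_values/drop_duplicates version (labels 1, 2 and 5) without sorting the whole frame.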
If there can be multiple max values per year, use @Teoretic's solution.