 

Pandas + groupby

The dataset contains 4 columns: name is the name of the child, sex is the child's sex, yearofbirth is the year in which the child was born, and number is the number of babies who were given that particular name.

   For example, the first entry reads: in the year 1880, 7065 girls were named Mary.

[Image: head of the dataset]

Using pandas, I'm trying to find out which name was the most used one in each year. My code:

   df.groupby(['yearofbirth']).agg({'number':'max'}).reset_index()

The above code only partially answers the question at hand.

[Image: result of the query]

I want the name along with the maximum number.
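To see why the query falls short, here is a minimal sketch on a small hypothetical frame (apart from the Mary/1880 row taken from the question, the names and counts are made up). Aggregating only `number` keeps just the grouping key and the aggregated column, so the `name` that belongs to each maximum is lost.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data
df = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Emma', 'Ida'],
    'sex': ['F', 'F', 'F', 'F'],
    'yearofbirth': [1880, 1880, 1881, 1881],
    'number': [7065, 2604, 2003, 1472],
})

# The question's query: only 'yearofbirth' and the max of 'number' survive
res = df.groupby(['yearofbirth']).agg({'number': 'max'}).reset_index()
print(res)
print(list(res.columns))  # 'name' is not among the columns
```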

asked Sep 20 '18 at 09:09 by eager_learner



2 Answers

Based on the answers to this question, I came up with this solution:

# Mark every row whose 'number' equals the maximum for its year
idx = df.groupby(['yearofbirth'])['number'].transform('max') == df['number']
df = df[idx]

print(df)

    name    number  sex yearofbirth
0   Mary    7065    F   1880
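The mask works because `transform('max')` returns a Series aligned with the original rows, so each row carries its own year's maximum and can be compared element-wise with `number`. A minimal sketch on hypothetical data (made-up values, with a deliberate tie in 1881) shows that all tied rows are kept:

```python
import pandas as pd

# Hypothetical data with a tie in 1881 (Emma and Ida)
df = pd.DataFrame({
    'name': ['Mary', 'Anna', 'Emma', 'Ida'],
    'sex': ['F'] * 4,
    'yearofbirth': [1880, 1880, 1881, 1881],
    'number': [7065, 2604, 2003, 2003],
})

# One value per original row: the maximum 'number' of that row's year
year_max = df.groupby('yearofbirth')['number'].transform('max')

# Keep every row equal to its year's maximum, so ties are all retained
top = df[df['number'] == year_max]
print(top)
```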
answered Sep 28 '22 at 21:09 by Teoretic



If each year has only one maximum value, I think you need sort_values with drop_duplicates:

import pandas as pd

df = pd.DataFrame({'name':list('abcaac'),
                   'yearofbirth':[1800,1800,1801,1801,1802,1802],
                   'number':[7,8,9,4,2,3],
                   'sex':['F'] * 6,
})

print (df)
  name  yearofbirth  number sex
0    a         1800       7   F
1    b         1800       8   F
2    c         1801       9   F
3    a         1801       4   F
4    a         1802       2   F
5    c         1802       3   F

df1 = (df.sort_values(['yearofbirth', 'number'], ascending=[True, False])
         .drop_duplicates('yearofbirth'))
print (df1)
  name  yearofbirth  number sex
1    b         1800       8   F
2    c         1801       9   F
5    c         1802       3   F
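Another way to get exactly one row per year, not taken from either answer but built from standard pandas calls, is `idxmax`: it returns, per group, the index label of the first row holding the maximum, and `df.loc` then selects those rows. A minimal sketch on the same example frame:

```python
import pandas as pd

df = pd.DataFrame({'name': list('abcaac'),
                   'yearofbirth': [1800, 1800, 1801, 1801, 1802, 1802],
                   'number': [7, 8, 9, 4, 2, 3],
                   'sex': ['F'] * 6,
})

# Index label of the (first) maximum 'number' within each year
idx = df.groupby('yearofbirth')['number'].idxmax()

# Select those rows; on this data it matches the sort_values/drop_duplicates result
df2 = df.loc[idx]
print(df2)
```

Like drop_duplicates, this keeps a single row per year even when several names share the maximum.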

If there can be multiple max values per year, use @Teoretic's solution.
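The difference only shows up with ties. A minimal sketch on hypothetical data (made-up values, with 'a' and 'b' tied in 1800) compares the two approaches: drop_duplicates returns one row per year, while the transform mask returns every tied row. Which of the tied rows drop_duplicates keeps depends on the sort order, so the sketch does not rely on it.

```python
import pandas as pd

# Hypothetical data: 'a' and 'b' tie for the maximum in 1800
df = pd.DataFrame({'name': list('abcd'),
                   'yearofbirth': [1800, 1800, 1801, 1801],
                   'number': [8, 8, 9, 4],
                   'sex': ['F'] * 4})

# sort_values + drop_duplicates: exactly one row per year
one_per_year = (df.sort_values(['yearofbirth', 'number'], ascending=[True, False])
                  .drop_duplicates('yearofbirth'))

# transform mask: every row equal to its year's maximum
mask = df.groupby('yearofbirth')['number'].transform('max') == df['number']
all_ties = df[mask]

print(one_per_year)
print(all_ties)
```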

answered Sep 28 '22 at 21:09 by jezrael
