Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas - How to aggregate two columns and keeping all other columns

Tags:

python

pandas

I have the below synopsis of a df:

  movie id       movie title release date                                           IMDb URL                      genre  user id  rating  
0         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller        5       3  
1         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      268       2  
2         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      276       4  
3         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      217       3  
4         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller       87       4  

What i'm looking for is count 'user id' and average 'rating' and keep all other columns intact. So the result will be something like this:

  movie id       movie title release date                                           IMDb URL                      genre  user id     rating  
0         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      50       3.75  
1         3  Four Rooms (1995)    1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      35       2.34  

any idea how to do that?

Thanks

like image 949
mobaset Avatar asked Feb 11 '23 22:02

mobaset


1 Answers

If all the values are in the columns you are aggregating over are the same for each group then you can avoid the join by putting them into the group.

Then pass a dictionary of functions to agg. If you set as_index to False to keep the grouped by columns as columns:

df.groupby(['movie id','movie title','release date','IMDb URL','genre'], as_index=False).agg({'user id':len,'rating':'mean'})

Note len is used to count

like image 172
JAB Avatar answered May 22 '23 07:05

JAB