I have the below synopsis of a df:
movie id movie title release date IMDb URL genre user id rating
0 2 GoldenEye (1995) 1-Jan-95 http://us.imdb.com/M/title-exact?GoldenEye%20(... Action|Adventure|Thriller 5 3
1 2 GoldenEye (1995) 1-Jan-95 http://us.imdb.com/M/title-exact?GoldenEye%20(... Action|Adventure|Thriller 268 2
2 2 GoldenEye (1995) 1-Jan-95 http://us.imdb.com/M/title-exact?GoldenEye%20(... Action|Adventure|Thriller 276 4
3 2 GoldenEye (1995) 1-Jan-95 http://us.imdb.com/M/title-exact?GoldenEye%20(... Action|Adventure|Thriller 217 3
4 2 GoldenEye (1995) 1-Jan-95 http://us.imdb.com/M/title-exact?GoldenEye%20(... Action|Adventure|Thriller 87 4
What i'm looking for is count 'user id' and average 'rating' and keep all other columns intact. So the result will be something like this:
movie id movie title release date IMDb URL genre user id rating
0 2 GoldenEye (1995) 1-Jan-95 http://us.imdb.com/M/title-exact?GoldenEye%20(... Action|Adventure|Thriller 50 3.75
1 3 Four Rooms (1995) 1-Jan-95 http://us.imdb.com/M/title-exact?GoldenEye%20(... Action|Adventure|Thriller 35 2.34
any idea how to do that?
Thanks
If all the values are in the columns you are aggregating over are the same for each group then you can avoid the join by putting them into the group.
Then pass a dictionary of functions to agg
. If you set as_index
to False
to keep the grouped by columns as columns:
df.groupby(['movie id','movie title','release date','IMDb URL','genre'], as_index=False).agg({'user id':len,'rating':'mean'})
Note len
is used to count
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With