 

groupby, count and average in numpy, pandas in python

I have a dataframe that looks like this:

       userId  movieId  rating
0           1       31     2.5
1           1     1029     3.0
2           1     3671     3.0
3           2       10     4.0
4           2       17     5.0
5           3       60     3.0
6           3      110     4.0
7           3      247     3.5
8           4       10     4.0
9           4      112     5.0
10          5        3     4.0
11          5       39     4.0
12          5      104     4.0

I need a dataframe with one row per unique userId, giving the number of ratings by that user and the user's average rating, as shown below:

       userId    count    mean
0           1        3    2.83
1           2        2     4.5
2           3        3     3.5
3           4        2     4.5
4           5        3     4.0

Can someone help?

asked Apr 17 '17 at 17:04 by Anand T



People also ask

How do you use Groupby and average in pandas?

Call groupby() on a DataFrame, then call mean() on the grouped result. Common patterns are: group by one column and take the mean of the remaining columns in each group; group by two columns and take the mean of the remaining column; or group by one column and take the mean of only one particular column.
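A minimal sketch of those three patterns, using a small hypothetical DataFrame (the team, year, points and assists columns are made up for illustration and are not from the question):

```python
import pandas as pd

# Hypothetical example data, invented for illustration only
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "year": [2020, 2021, 2020, 2020, 2021],
    "points": [10, 20, 30, 50, 40],
    "assists": [1, 3, 5, 7, 9],
})

# Group by one column: mean of every remaining (numeric) column per team
by_team = df.groupby("team").mean()

# Group by two columns: mean of the remaining columns per (team, year) pair
by_team_year = df.groupby(["team", "year"]).mean()

# Group by one column: mean of one selected column only
points_by_team = df.groupby("team")["points"].mean()
print(points_by_team)
```

Selecting the column before calling mean() (the last pattern) avoids computing means for columns you do not need.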

How do you count rows in Groupby pandas?

The simplest way to count rows per group is the built-in groupby method size(). It returns a Series holding the number of rows in each group. Like len(), size() counts every row, including rows that contain NaN values; count() instead returns only the number of non-null values in each column.
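To see the difference between size() and count(), here is a small sketch; the NaN rating is a hypothetical missing value inserted for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: user 1 has one missing (NaN) rating
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "rating": [2.5, np.nan, 3.0, 4.0, 5.0],
})

g = df.groupby("userId")
sizes = g.size()              # rows per group, NaN rows included
counts = g["rating"].count()  # non-NaN ratings per group
print(sizes)
print(counts)
```

User 1 has three rows but only two actual ratings, so size() and count() disagree for that group.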

How do you find your average in pandas?

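To get the average (mean) of a column in a pandas DataFrame, use either the mean() method or the describe() method. DataFrame.mean() returns the mean of the values along the requested axis (per column by default), while describe() reports the mean alongside other summary statistics such as count, standard deviation and quartiles.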
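A short sketch of both approaches on a tiny DataFrame (the values are hypothetical, shaped like the question's data):

```python
import pandas as pd

# Hypothetical sample shaped like the question's data
df = pd.DataFrame({"userId": [1, 1, 2], "rating": [2.5, 3.0, 4.0]})

col_mean = df["rating"].mean()     # mean of a single column
all_means = df.mean()              # mean of every numeric column (axis=0)
summary = df["rating"].describe()  # count, mean, std, min, quartiles, max
print(col_mean)
print(summary)
```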

What does groupby do in pandas?

What is GroupBy? pandas' groupby is a powerful and versatile method. It splits your data into groups based on one or more keys, applies a computation to each group, and combines the results into a single object; this is often called "split-apply-combine".
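To make split-apply-combine concrete, the sketch below computes each user's mean rating twice: once by hand with boolean masks, and once with a single groupby call. The data is typed in from the question's sample (movieId omitted):

```python
import pandas as pd

# Sample data typed in from the question (movieId column omitted)
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
    "rating": [2.5, 3.0, 3.0, 4.0, 5.0, 3.0, 4.0, 3.5, 4.0, 5.0, 4.0, 4.0, 4.0],
})

# Split-apply-combine written out by hand
parts = {}
for user in df["userId"].unique():
    subset = df[df["userId"] == user]       # split: rows for one user
    parts[user] = subset["rating"].mean()   # apply: reduce the subset
manual = pd.Series(parts).sort_index()      # combine: one Series again

# The same computation as a single groupby call
builtin = df.groupby("userId")["rating"].mean()
print(builtin)
```

The groupby version is shorter and avoids scanning the whole frame once per user.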


1 Answer

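import pandas as pd
# Group by user, then count the ratings and average them; reset_index() turns userId back into a column
df1 = df.groupby('userId')['rating'].agg(['count', 'mean']).reset_index()
print(df1)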


   userId  count      mean
0       1      3  2.833333
1       2      2  4.500000
2       3      3  3.500000
3       4      2  4.500000
4       5      3  4.000000
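If you also want the output column names spelled out and the mean rounded to two decimals as in the question's expected output, named aggregation is one option. This is a sketch rather than part of the original answer, and it rebuilds the question's data so that it runs on its own:

```python
import pandas as pd

# Data typed in from the question
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
    "movieId": [31, 1029, 3671, 10, 17, 60, 110, 247, 10, 112, 3, 39, 104],
    "rating": [2.5, 3.0, 3.0, 4.0, 5.0, 3.0, 4.0, 3.5, 4.0, 5.0, 4.0, 4.0, 4.0],
})

# Named aggregation: new_column=(input_column, aggregation)
df1 = (
    df.groupby("userId")
      .agg(count=("rating", "count"), mean=("rating", "mean"))
      .reset_index()
      .round({"mean": 2})
)
print(df1)
```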
answered Sep 22 '22 at 06:09 by Scott Boston

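Since the title also mentions NumPy, here is a sketch of the same count and mean without pandas. It assumes the userId and rating columns are available as plain NumPy arrays (typed in here from the question's sample data):

```python
import numpy as np

# Columns typed in from the question's sample data
user_ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5])
ratings = np.array([2.5, 3.0, 3.0, 4.0, 5.0, 3.0, 4.0, 3.5, 4.0, 5.0, 4.0, 4.0, 4.0])

# Sorted unique ids, each row's group index, and the number of rows per group
uniq, inverse, counts = np.unique(user_ids, return_inverse=True, return_counts=True)

# Sum the ratings per group, then divide by the group sizes
sums = np.bincount(inverse, weights=ratings)
means = sums / counts

for uid, n, m in zip(uniq, counts, means):
    print(uid, n, round(m, 2))
```

np.bincount with weights adds up each row's rating into its group's slot, which avoids an explicit Python loop over users.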