Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group by two columns and count the occurrences of each combination in Pandas

I have the following data frame:

data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})  product_id  user_id     p1       a1     p1       a1     p2       a1     p1       a2     p1       a2     p1       a2     p2       a3     p2       a3     p3       a3 

in real case there might be some other columns as well, but what i need to do is to group by data frame by product_id and user_id columns and count number of each combination and add it as a new column in a new dat frame

output should be something like this:

user_id product_id  count a1       p1            2 a1       p2            1 a2       p1            3 a3       p2            2 a3       p3            1 

I have tried the following code:

grouped=data.groupby(['user_id','product_id']).count() 

but the result is:

user_id product_id  a1       p1           p2  a2       p1  a3       p2           p3 

actually the most important thing for me is to have a column names count that has the number of occurrences , i need to use the column later.

like image 760
chessosapiens Avatar asked Aug 13 '16 13:08

chessosapiens


People also ask

How do you count the number of occurrences in a group by?

Using the size() or count() method with pandas. DataFrame. groupby() will generate the count of a number of occurrences of data present in a particular column of the dataframe.

How do you group by a column and count in pandas?

Use count() by Column NameUse pandas DataFrame. groupby() to group the rows by column and use count() method to get the count for each group by ignoring None and Nan values.

How do you count the number of occurrences in pandas?

How do you Count the Number of Occurrences in a data frame? To count the number of occurrences in e.g. a column in a dataframe you can use Pandas value_counts() method. For example, if you type df['condition']. value_counts() you will get the frequency of each unique value in the column “condition”.

How do I get two column combinations in pandas?

To get all combinations of columns we will be using itertools. product module. This function computes the cartesian product of input iterables. To compute the product of an iterable with itself, we use the optional repeat keyword argument to specify the number of repetitions.


2 Answers

Maybe this is what you want?

>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']}) >>> count_series = data.groupby(['user_id', 'product_id']).size() >>> count_series user_id  product_id a1       p1            2          p2            1 a2       p1            3 a3       p2            2          p3            1 dtype: int64 >>> new_df = count_series.to_frame(name = 'size').reset_index() >>> new_df   user_id product_id  size 0      a1         p1     2 1      a1         p2     1 2      a2         p1     3 3      a3         p2     2 4      a3         p3     1 >>> new_df['size'] 0    2 1    1 2    3 3    2 4    1 Name: size, dtype: int64 
like image 118
Nehal J Wani Avatar answered Sep 27 '22 19:09

Nehal J Wani


In Pandas 1.1.0 you can use the method value_counts with DataFrames:

df.value_counts() 

Output:

product_id  user_id p1          a2         3 p2          a3         2 p1          a1         2 p3          a3         1 p2          a1         1 

If you need a DataFrame:

df.value_counts().to_frame('counts').reset_index() 

Output:

  product_id user_id  counts 0         p1      a2       3 1         p2      a3       2 2         p1      a1       2 3         p3      a3       1 4         p2      a1       1 
like image 45
Mykola Zotko Avatar answered Sep 27 '22 19:09

Mykola Zotko