Group by two columns and count the occurrences of each combination in Pandas

Tags: python, pandas, dataframe, data-analysis

I have the following data frame:

    import pandas as pd

    data = pd.DataFrame({'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
                         'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3']})

    product_id  user_id
        p1        a1
        p1        a1
        p2        a1
        p1        a2
        p1        a2
        p1        a2
        p2        a3
        p2        a3
        p3        a3

In the real case there might be other columns as well, but what I need to do is group the data frame by the product_id and user_id columns, count the occurrences of each combination, and add that count as a new column in a new data frame.

The output should look something like this:

    user_id  product_id  count
    a1       p1          2
    a1       p2          1
    a2       p1          3
    a3       p2          2
    a3       p3          1

I have tried the following code:

    grouped = data.groupby(['user_id', 'product_id']).count()

but the result is:

    user_id  product_id
    a1       p1
             p2
    a2       p1
    a3       p2
             p3

The most important thing for me is to have a column named count that holds the number of occurrences, because I need to use that column later.

760

asked Aug 13 '16 13:08

chessosapiens

2 Answers

Maybe this is what you want?

    >>> import pandas as pd
    >>> data = pd.DataFrame({'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    ...                      'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3']})
    >>> count_series = data.groupby(['user_id', 'product_id']).size()
    >>> count_series
    user_id  product_id
    a1       p1            2
             p2            1
    a2       p1            3
    a3       p2            2
             p3            1
    dtype: int64
    >>> new_df = count_series.to_frame(name='size').reset_index()
    >>> new_df
      user_id product_id  size
    0      a1         p1     2
    1      a1         p2     1
    2      a2         p1     3
    3      a3         p2     2
    4      a3         p3     1
    >>> new_df['size']
    0    2
    1    1
    2    3
    3    2
    4    1
    Name: size, dtype: int64
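As a side note on why the attempt in the question came back with no values: `count()` reports the non-null entries of each column that is not a grouping key, and here there are no such columns left, whereas `size()` counts the rows in each group. A minimal sketch of the same idea, assuming you want the column to be called `count` as in the question (the variable names `empty` and `counts` are just for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# count() works column by column on the non-key columns; there are none here,
# so the result has the group index but zero columns.
empty = data.groupby(['user_id', 'product_id']).count()

# size() counts rows per group; reset_index(name=...) turns the resulting
# Series into a flat DataFrame and names the value column in one step.
counts = (
    data.groupby(['user_id', 'product_id'])
        .size()
        .reset_index(name='count')
)
print(counts)
```

The `count` column can then be used like any other, for example `counts[counts['count'] > 1]`.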

118

answered Sep 27 '22 19:09

Nehal J Wani

Since pandas 1.1.0, DataFrames also have a value_counts method, which counts the occurrences of each unique row (df below is the data frame from the question):

    df.value_counts()

Output:

    product_id  user_id
    p1          a2         3
    p2          a3         2
    p1          a1         2
    p3          a3         1
    p2          a1         1

If you need a DataFrame:

    df.value_counts().to_frame('counts').reset_index()

Output:

      product_id user_id  counts
    0         p1      a2       3
    1         p2      a3       2
    2         p1      a1       2
    3         p3      a3       1
    4         p2      a1       1
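The question mentions that the real data may have other columns. Called without arguments, `value_counts` counts unique rows across all columns, so in that case you would probably want to restrict it with the `subset` parameter. A rough sketch, where the `price` column is a made-up stand-in for the extra columns and `pair_counts` is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
    # hypothetical extra column, not part of the original question
    'price': [10, 12, 20, 10, 11, 10, 20, 21, 30],
})

pair_counts = (
    # Count only the (user_id, product_id) pairs and ignore the other columns.
    df.value_counts(subset=['user_id', 'product_id'])
      # Name the value column explicitly instead of relying on the default.
      .reset_index(name='count')
      # value_counts orders by frequency; sort by the keys for a stable layout.
      .sort_values(['user_id', 'product_id'], ignore_index=True)
)
print(pair_counts)
```

Passing `name='count'` explicitly avoids depending on the default name of the counts column, which is not the same across pandas versions.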

answered Sep 27 '22 19:09

Mykola Zotko
