I have the following data frame:
import pandas as pd

data = pd.DataFrame({'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
                     'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3']})

product_id  user_id
p1          a1
p1          a1
p2          a1
p1          a2
p1          a2
p1          a2
p2          a3
p2          a3
p3          a3
In the real case there may be other columns as well, but what I need to do is group the data frame by the product_id and user_id columns, count the number of rows for each combination, and add that count as a new column in a new data frame.
The output should look something like this:
user_id  product_id  count
a1       p1          2
a1       p2          1
a2       p1          3
a3       p2          2
a3       p3          1
I have tried the following code:
grouped=data.groupby(['user_id','product_id']).count()
but the result is:
user_id  product_id
a1       p1
         p2
a2       p1
a3       p2
         p3
Actually, the most important thing for me is to have a column named count that holds the number of occurrences, because I need to use that column later.
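A likely explanation for the empty result above: count() tallies non-null values in each column that is not a grouping key, and because user_id and product_id both move into the index, no columns are left to count. size() counts rows per group instead, so it returns a Series of counts. A minimal sketch on the question's sample data:

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# count() works per non-grouping column; here there are none, so no columns remain
grouped = data.groupby(['user_id', 'product_id']).count()
print(grouped.columns.tolist())  # []

# size() counts rows per group, independent of the other columns
sizes = data.groupby(['user_id', 'product_id']).size()
print(sizes.tolist())  # [2, 1, 3, 2, 1]
```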
Calling size() or count() on the result of pandas.DataFrame.groupby() gives the number of occurrences of each group in the data frame.
Use count() by column name: group the rows with pandas.DataFrame.groupby(), then call count() to get the count for each group; count() ignores None and NaN values.
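The difference between the two methods only shows up when a non-key column has missing values. The sketch below uses a hypothetical rating column (not part of the original data) with one NaN: size() still counts that row, while count() skips it.

```python
import numpy as np
import pandas as pd

# 'rating' is a hypothetical extra column with one missing value
df = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a2'],
    'product_id': ['p1', 'p1', 'p1'],
    'rating': [5.0, np.nan, 4.0],
})

g = df.groupby(['user_id', 'product_id'])

# size(): number of rows in each group, NaN or not
print(g.size().tolist())  # [2, 1]

# count(): number of non-null 'rating' values in each group
print(g['rating'].count().tolist())  # [1, 1]
```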
How do you count the number of occurrences in a data frame? To count how often each value appears in a column, use the pandas value_counts() method. For example, df['condition'].value_counts() returns the frequency of each unique value in the column “condition”.
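Applied to a single column of the question's data, value_counts() gives per-value frequencies, sorted from most to least frequent. A minimal sketch:

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# Frequency of each product across all rows, most frequent first
freq = data['product_id'].value_counts()
print(freq.to_dict())  # {'p1': 5, 'p2': 3, 'p3': 1}
```

Note that this counts one column at a time; counting (user_id, product_id) pairs needs groupby() or DataFrame.value_counts().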
To get all combinations of column values, you can use the itertools.product function. It computes the Cartesian product of the input iterables; to compute the product of an iterable with itself, pass the optional repeat keyword argument to specify the number of repetitions.
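A sketch of both forms of itertools.product, plus one possible use in this question's context (an assumption, not something the original snippet describes): build every (user, product) pair and reindex the group sizes so that combinations that never occur get a count of 0.

```python
import itertools

import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# Product of one iterable with itself, using repeat=2
pairs_with_self = list(itertools.product(['p1', 'p2'], repeat=2))
print(pairs_with_self)  # [('p1', 'p1'), ('p1', 'p2'), ('p2', 'p1'), ('p2', 'p2')]

# Product of two iterables: every (user, product) combination, observed or not
users = sorted(data['user_id'].unique())
products = sorted(data['product_id'].unique())
all_pairs = list(itertools.product(users, products))
full_index = pd.MultiIndex.from_tuples(all_pairs, names=['user_id', 'product_id'])

# Observed counts, then fill the unobserved combinations with 0
counts = data.groupby(['user_id', 'product_id']).size()
full_counts = counts.reindex(full_index, fill_value=0).rename('count').reset_index()
print(full_counts)
```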
Maybe this is what you want?
>>> import pandas as pd
>>> data = pd.DataFrame({'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
...                      'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name='size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64
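Since the question asks for a column named count specifically, the same result can be built in one step: Series.reset_index accepts a name argument for the column that holds the values. A minimal sketch on the question's data:

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# size() gives a Series; reset_index(name=...) turns it into a DataFrame
new_df = data.groupby(['user_id', 'product_id']).size().reset_index(name='count')
print(new_df)

# Use bracket access: new_df.count is the DataFrame.count() method, not the column
print(new_df['count'].tolist())  # [2, 1, 3, 2, 1]
```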
Since pandas 1.1.0, you can call value_counts() directly on a DataFrame:
df.value_counts()
Output:
product_id  user_id
p1          a2         3
p2          a3         2
p1          a1         2
p3          a3         1
p2          a1         1
If you need a DataFrame:
df.value_counts().to_frame('counts').reset_index()
Output:
  product_id user_id  counts
0         p1      a2       3
1         p2      a3       2
2         p1      a1       2
3         p3      a3       1
4         p2      a1       1
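With no arguments, df.value_counts() counts unique combinations of all columns, so in the real case with extra columns the pairs would be split further. Passing a subset of column names restricts the count to user_id and product_id. A sketch assuming pandas 1.1.0 or later and a hypothetical extra column price:

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
    'price': [10, 10, 12, 10, 11, 10, 8, 8, 9],  # hypothetical extra column
})

# Count only (user_id, product_id) pairs; 'price' is ignored
counts = df.value_counts(['user_id', 'product_id']).reset_index(name='count')
print(counts)
```

Without the subset, the differing price values for user a2 would split its p1 rows into separate groups.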