
How to increment a row count in groupby in DataFrame

I need to number the activity months for each product in a pandas DataFrame, i.e. keep a running count of rows within each product. Here is my data and code so far:

from pandas import DataFrame
from datetime import datetime
data = [
('product_a','08/31/2013')
,('product_b','08/31/2013')
,('product_c','08/31/2013')
,('product_a','09/30/2013')
,('product_b','09/30/2013')
,('product_c','09/30/2013')
,('product_a','10/31/2013')
,('product_b','10/31/2013')
,('product_c','10/31/2013')
]

product_df = DataFrame( data, columns=['prod_desc','activity_month'])

for index, row in product_df.iterrows():
  row['activity_month']= datetime.strptime(row['activity_month'],'%m/%d/%Y')
  product_df.loc[index, 'activity_month'] = datetime.strftime(row['activity_month'],'%Y-%m-%d')

product_df = product_df.sort_values(['prod_desc','activity_month'])

product_df['month_num'] = product_df.groupby(['prod_desc']).size()

However, this returns NaNs for month_num.

Here is what I want to get:

prod_desc    activity_month   month_num
product_a       2013-08-31         1
product_a       2013-09-30         2
product_a       2013-10-31         3
product_b       2013-08-31         1
product_b       2013-09-30         2
product_b       2013-10-31         3
product_c       2013-08-31         1
product_c       2013-09-30         2
product_c       2013-10-31         3
analyticsPierce asked May 21 '14 19:05



1 Answer

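The groupby is the right idea, but the method you want is cumcount, which numbers the rows within each group starting at 0. size() returns one value per group, indexed by the group key, so assigning it to a column cannot align with the row index and you get NaN. The frame below names its key column product_desc; with the frame in the question, group by 'prod_desc' instead: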

>>> product_df['month_num'] = product_df.groupby('product_desc').cumcount()
>>> product_df

  product_desc activity_month  prod_count    pct_ch  month_num
0    product_a     2014-01-01          53       NaN          0
3    product_a     2014-02-01          52 -0.018868          1
6    product_a     2014-03-01          50 -0.038462          2
1    product_b     2014-01-01          44       NaN          0
4    product_b     2014-02-01          43 -0.022727          1
7    product_b     2014-03-01          41 -0.046512          2
2    product_c     2014-01-01          36       NaN          0
5    product_c     2014-02-01          35 -0.027778          1
8    product_c     2014-03-01          34 -0.028571          2
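To see why the size() assignment in the question produced NaN, it may help to compare the index of each result. The following is a minimal sketch on a small stand-in frame (the four rows are illustrative, not the asker's full data, and the column name follows the question's prod_desc):

```python
import pandas as pd

# Illustrative stand-in frame; column names follow the question, not the answer's output.
product_df = pd.DataFrame({
    "prod_desc": ["product_a", "product_b", "product_a", "product_b"],
    "activity_month": ["2013-08-31", "2013-08-31", "2013-09-30", "2013-09-30"],
})

# size() returns one value per group, indexed by the group key.
sizes = product_df.groupby("prod_desc").size()
print(sizes.index.tolist())

# Column assignment aligns on index labels. The row labels 0..3 never match
# the group labels, so every aligned value is missing.
misaligned = sizes.reindex(product_df.index)
print(misaligned.isna().all())

# cumcount() returns one value per original row, carried on the original index,
# so it can be assigned straight back as a column.
counts = product_df.groupby("prod_desc").cumcount()
print(counts.tolist())
```

The reindex call here only mimics the label alignment that happens during column assignment; it is not something you would need in the real code.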

If you really want it to start with 1, then just do this instead:

>>> product_df['month_num'] = product_df.groupby('product_desc').cumcount() + 1
>>> product_df

  product_desc activity_month  prod_count    pct_ch  month_num
0    product_a     2014-01-01          53       NaN          1
3    product_a     2014-02-01          52 -0.018868          2
6    product_a     2014-03-01          50 -0.038462          3
1    product_b     2014-01-01          44       NaN          1
4    product_b     2014-02-01          43 -0.022727          2
7    product_b     2014-03-01          41 -0.046512          3
2    product_c     2014-01-01          36       NaN          1
5    product_c     2014-02-01          35 -0.027778          2
8    product_c     2014-03-01          34 -0.028571          3
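Applied to the data from the question, the whole thing might look like the sketch below. It assumes a recent pandas release, parses the dates with pd.to_datetime rather than the iterrows loop, and uses sort_values in place of the older sort method; those substitutions are mine, not part of the original answer.

```python
import pandas as pd

data = [
    ("product_a", "08/31/2013"),
    ("product_b", "08/31/2013"),
    ("product_c", "08/31/2013"),
    ("product_a", "09/30/2013"),
    ("product_b", "09/30/2013"),
    ("product_c", "09/30/2013"),
    ("product_a", "10/31/2013"),
    ("product_b", "10/31/2013"),
    ("product_c", "10/31/2013"),
]

product_df = pd.DataFrame(data, columns=["prod_desc", "activity_month"])

# Parse the whole column in one step instead of looping over rows.
product_df["activity_month"] = pd.to_datetime(
    product_df["activity_month"], format="%m/%d/%Y"
)

# cumcount() numbers rows in their current order within each group,
# so sort by product and month before counting.
product_df = product_df.sort_values(["prod_desc", "activity_month"])

# Running row number per product, shifted to start at 1.
product_df["month_num"] = product_df.groupby("prod_desc").cumcount() + 1

print(product_df.to_string(index=False))
```

Because cumcount() follows row order, skipping the sort would number the months in whatever order the rows happen to arrive.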
Karl D. answered Oct 17 '22 01:10