I need to calculate the number of activity_months for each product in a pandas DataFrame. Here is my data and code so far:
from pandas import DataFrame
from datetime import datetime
data = [
('product_a','08/31/2013')
,('product_b','08/31/2013')
,('product_c','08/31/2013')
,('product_a','09/30/2013')
,('product_b','09/30/2013')
,('product_c','09/30/2013')
,('product_a','10/31/2013')
,('product_b','10/31/2013')
,('product_c','10/31/2013')
]
product_df = DataFrame( data, columns=['prod_desc','activity_month'])
for index, row in product_df.iterrows():
    row['activity_month'] = datetime.strptime(row['activity_month'], '%m/%d/%Y')
    product_df.loc[index, 'activity_month'] = datetime.strftime(row['activity_month'], '%Y-%m-%d')
product_df = product_df.sort_values(['prod_desc','activity_month'])
product_df['month_num'] = product_df.groupby(['prod_desc']).size()
However, this returns NaNs for month_num.
Here is what I want to get:
prod_desc  activity_month  month_num
product_a  2013-08-31      1
product_a  2013-09-30      2
product_a  2013-10-31      3
product_b  2013-08-31      1
product_b  2013-09-30      2
product_b  2013-10-31      3
product_c  2013-08-31      1
product_c  2013-09-30      2
product_c  2013-10-31      3
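You can use DataFrame.groupby() followed by count() or size() to group by one or more columns and compute a count for each group combination. size() gives the number of rows in each group, while count() gives the number of non-null values per column in each group.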
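As a rough illustration of the difference between the two, here is a minimal sketch. The frame below is hypothetical (the values are made up for the example, with one missing month on purpose):

```python
import pandas as pd

# Hypothetical data for illustration; None becomes a missing value.
df = pd.DataFrame({
    "prod_desc": ["product_a", "product_a", "product_b"],
    "activity_month": ["2013-08-31", None, "2013-08-31"],
})

# size() counts rows per group, including rows with missing values.
rows_per_group = df.groupby("prod_desc").size()
# product_a -> 2, product_b -> 1

# count() counts only the non-null values of a column within each group.
non_null_per_group = df.groupby("prod_desc")["activity_month"].count()
# product_a -> 1, product_b -> 1

print(rows_per_group)
print(non_null_per_group)
```

Both results are indexed by prod_desc, with one entry per group rather than one per row.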
To get the number of rows in a DataFrame, use len(df.index). For a DataFrame with the default index, df.index is a RangeIndex such as RangeIndex(start=0, stop=8, step=1), and calling len() on it gives the row count (8 in that case).
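A small sketch of that, using a hypothetical eight-row frame so the index matches the RangeIndex mentioned above:

```python
import pandas as pd

# Hypothetical eight-row frame with the default integer index.
df = pd.DataFrame({"value": range(8)})

# With no explicit index, pandas uses a RangeIndex running from 0 to 8.
print(df.index)

# The length of the index is the number of rows.
n_rows = len(df.index)
print(n_rows)  # 8
```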
agg is an alias for aggregate; use the alias. A user-defined function passed to agg receives a Series for evaluation, and the aggregation is applied to each column.
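For example, something along these lines should work. The function name value_range and the sample values are assumptions made up for this sketch:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "prod_desc": ["product_a", "product_a", "product_b"],
    "prod_count": [53, 52, 44],
})

def value_range(series):
    # Receives the selected column's values for one group as a pandas Series.
    return series.max() - series.min()

# agg calls value_range once per group and returns one value per group.
result = df.groupby("prod_desc")["prod_count"].agg(value_range)
# product_a -> 1, product_b -> 0
print(result)
```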
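groupby preserves the order of rows within each group. Its group_keys option controls whether, when calling apply, the group keys are added to the index to identify pieces. Older pandas versions also had a squeeze option that reduced the dimensionality of the return type if possible and otherwise returned a consistent type.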
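The groupby is the right idea, but the right method is cumcount. size() returns one value per group, indexed by the group key, so assigning it to a column aligns on the index, finds no matching labels, and gives NaN. cumcount numbers the rows within each group instead: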
>>> product_df['month_num'] = product_df.groupby('product_desc').cumcount()
>>> product_df
   product_desc  activity_month  prod_count     pct_ch  month_num
0     product_a      2014-01-01          53        NaN          0
3     product_a      2014-02-01          52  -0.018868          1
6     product_a      2014-03-01          50  -0.038462          2
1     product_b      2014-01-01          44        NaN          0
4     product_b      2014-02-01          43  -0.022727          1
7     product_b      2014-03-01          41  -0.046512          2
2     product_c      2014-01-01          36        NaN          0
5     product_c      2014-02-01          35  -0.027778          1
8     product_c      2014-03-01          34  -0.028571          2
If you really want it to start with 1, then just do this instead:
>>> product_df['month_num'] = product_df.groupby('product_desc').cumcount() + 1
>>> product_df
   product_desc  activity_month  prod_count     pct_ch  month_num
0     product_a      2014-01-01          53        NaN          1
3     product_a      2014-02-01          52  -0.018868          2
6     product_a      2014-03-01          50  -0.038462          3
1     product_b      2014-01-01          44        NaN          1
4     product_b      2014-02-01          43  -0.022727          2
7     product_b      2014-03-01          41  -0.046512          3
2     product_c      2014-01-01          36        NaN          1
5     product_c      2014-02-01          35  -0.027778          2
8     product_c      2014-03-01          34  -0.028571          3
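Putting it together with the data from the question, a sketch could look like the following. It uses the question's column name prod_desc rather than the product_desc shown in the output above. It also swaps the row-by-row loop for pd.to_datetime, which is my substitution and not part of the original code, so activity_month ends up as a datetime column rather than a string:

```python
import pandas as pd

data = [
    ("product_a", "08/31/2013"), ("product_b", "08/31/2013"), ("product_c", "08/31/2013"),
    ("product_a", "09/30/2013"), ("product_b", "09/30/2013"), ("product_c", "09/30/2013"),
    ("product_a", "10/31/2013"), ("product_b", "10/31/2013"), ("product_c", "10/31/2013"),
]
product_df = pd.DataFrame(data, columns=["prod_desc", "activity_month"])

# Parse the month strings in one vectorized call instead of looping over rows.
product_df["activity_month"] = pd.to_datetime(
    product_df["activity_month"], format="%m/%d/%Y"
)

# Sort so the running count follows calendar order within each product.
product_df = product_df.sort_values(["prod_desc", "activity_month"])

# cumcount() numbers rows within each group from 0; add 1 for a 1-based month number.
product_df["month_num"] = product_df.groupby("prod_desc").cumcount() + 1

print(product_df)
```

Because cumcount returns a Series with the same index as product_df, the assignment lines up row by row, and each product gets month_num values 1, 2, 3.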