
Output of pandas groupby with cumprod not showing groupby columns

Tags: python, pandas

I am trying to understand the difference between an aggregating function such as mean or sum and the cumprod function.
When I run a groupby followed by mean, I get the id column and the mean of the values, as expected.

When I run it with cumprod, though, the groupby column is missing. How can I make sure I get the columns I am grouping by?

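import pandas as pd

x = [.25, .23, .55, .89, -.90, -.04]
id = ['a', 'a', 'a', 'b', 'b', 'b']
df = pd.DataFrame({'id': id, 'x': x})  # build the frame the calls below assume
df.groupby('id').mean()     # one row per id, with id as the index
df.groupby('id').cumprod()  # one row per original row, no id column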
jazz_learn asked Oct 16 '25 19:10



1 Answer

df.groupby('id').mean() is shorthand for df.groupby('id').agg('mean').

df.groupby('id').cumprod() is shorthand for df.groupby('id').transform('cumprod').
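As a quick check of the two shorthands above, here is a minimal sketch. It assumes df is built from the x and id lists in the question; the variable names (grouped, mean_short, and so on) are only illustrative.

```python
import pandas as pd

# Assumed construction of df from the lists in the question.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b", "b"],
    "x": [0.25, 0.23, 0.55, 0.89, -0.90, -0.04],
})

grouped = df.groupby("id")

# Aggregation: the method call and the agg('mean') spelling.
mean_short = grouped.mean()
mean_long = grouped.agg("mean")

# Transformation: the method call and the transform('cumprod') spelling.
cumprod_short = grouped.cumprod()
cumprod_long = grouped.transform("cumprod")

# Compare each pair; equals() checks values, index, and columns.
print(mean_short.equals(mean_long))
print(cumprod_short.equals(cumprod_long))
```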

The key difference here is that the former is a groupby/agg operation, while the latter is a groupby/transform operation.

groupby/agg aggregates each group into a single value. Therefore, the groupby/agg operation returns a result (a Series or a DataFrame) with one row per group, whose index contains the groupby keys (in this case, the id values).

groupby/transform operations return a result with the same number of rows as the original DataFrame, df. (cumprod is short for cumulative product. Since it returns a running product within each group, there is one value for each row.) Since there is a value for each row of the original DataFrame, the index cannot be the groupby keys. It has to remain the index of the original DataFrame.
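Because the transform result keeps df's index, one way to see the grouping column next to the cumulative product is to attach the result back onto df. This is only a sketch of that idea, not the only option; df is assumed to be built from the question's lists, and the column name x_cumprod is a hypothetical choice.

```python
import pandas as pd

# Assumed construction of df from the lists in the question.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b", "b"],
    "x": [0.25, 0.23, 0.55, 0.89, -0.90, -0.04],
})

# Aggregation: one value per group, indexed by the group keys.
agg_result = df.groupby("id")["x"].mean()

# Transformation: one value per row, indexed like df itself.
transform_result = df.groupby("id")["x"].cumprod()

print(agg_result.index.tolist())        # group keys
print(transform_result.index.tolist())  # original row labels

# The shared index lets the result align row by row with df,
# so the id column stays visible beside the running product.
out = df.assign(x_cumprod=transform_result)
print(out)
```

The running product restarts at the first row of each id, since cumprod is computed separately within each group.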

unutbu answered Oct 18 '25 13:10
