I am trying to understand the difference between an aggregating function such as mean or sum and the cumprod function.
When I run a groupby and then the mean, I get the id column and a mean of the values, as expected.
When I run it with cumprod, though, the groupby column is missing. How do I ensure that I get the columns I am grouping by?

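import pandas as pd

x = [.25, .23, .55, .89, -.90, -.04]
id = ['a', 'a', 'a', 'b', 'b', 'b']
df = pd.DataFrame({'id': id, 'x': x})  # df was not defined in the original snippet
df.groupby('id').mean()
df.groupby('id').cumprod()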
df.groupby('id').mean() is shorthand for df.groupby('id').agg('mean').
df.groupby('id').cumprod() is shorthand for df.groupby('id').transform('cumprod').
The key difference here is that the former is a groupby/agg operation, while the latter is a groupby/transform operation.
groupby/agg reduces each group to a single value per column. The result therefore has one row per group, so its index can hold the groupby keys (in this case, the id values).
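As a minimal sketch of the aggregation case, the snippet below rebuilds the question's data (the column name x and the variable names means and agg_means are assumptions for illustration) and checks that the result is indexed by the group keys:

```python
import pandas as pd

# Rebuild the example data; the column name "x" is assumed here.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b", "b"],
    "x": [.25, .23, .55, .89, -.90, -.04],
})

means = df.groupby("id").mean()           # shorthand form
agg_means = df.groupby("id").agg("mean")  # explicit groupby/agg form

# One row per group: the index holds the group keys.
print(means.index.tolist())
print(means)
```

If the keys are wanted as an ordinary column rather than as the index, calling means.reset_index() is one common way to get that.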
groupby/transform operations return a result with the same number of rows as the original DataFrame, df. (cumprod is short for cumulative product. Since it returns the running product of the values within each group, there is one value for each row.) Since there is a value for each row of the original DataFrame, the index cannot be the groupby keys. It has to remain the index of the original DataFrame.
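Because the transform result shares the original index, one way to keep the grouping column is to attach the result back onto df. This is only a sketch under assumptions: the column name x and the new column name x_cumprod are made up for illustration.

```python
import pandas as pd

# Rebuild the example data; the column name "x" is assumed here.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b", "b"],
    "x": [.25, .23, .55, .89, -.90, -.04],
})

# Running product within each group; one value per original row.
running = df.groupby("id")["x"].cumprod()

# The result is aligned on df's original index, not on the group keys.
print(running.index.equals(df.index))

# Attach it as a new column so that id stays alongside the result.
# "x_cumprod" is a hypothetical column name.
result = df.assign(x_cumprod=running)
print(result)
```

Since assign aligns on the index, each running product lands on the row it was computed from, and the id column is preserved next to it.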