I need to use group by in levels with ddply or aggregate if that's easier. I am not really sure how to do this as I need to use cumsum as my aggregate function. This is what my data looks like:
level1 level2 hour product
A tea 0 7
A tea 1 2
A tea 2 9
A coffee 17 7
A coffee 18 2
A coffee 20 4
B coffee 0 2
B coffee 1 3
B coffee 2 4
B tea 21 3
B tea 22 1
expected output:
A tea 0 7
A tea 1 9
A tea 2 18
A coffee 17 7
A coffee 18 9
A coffee 20 13
B coffee 0 2
B coffee 1 5
B coffee 2 9
B tea 21 3
B tea 22 4
I tried using
ddply(dd,c("level1","level2","hour"),summarise,cumsum(product))
but that doesn't sum up which I think is because the hour column is being used for group by and its being split by that..I think.. I am not sure I completely understand how aggregate works here. Is there any way I could get the required output using aggregate or ddply?
Here's a solution in base R using ave
and within
:
within(mydf, {
cumsumProduct <- ave(product, level1, level2, FUN = cumsum)
})
# level1 level2 hour product cumsumProduct
# 1 A tea 0 7 7
# 2 A tea 1 2 9
# 3 A tea 2 9 18
# 4 A coffee 17 7 7
# 5 A coffee 18 2 9
# 6 A coffee 20 4 13
# 7 B coffee 0 2 2
# 8 B coffee 1 3 5
# 9 B coffee 2 4 9
# 10 B tea 21 3 3
# 11 B tea 22 1 4
Of course, if you wanted to drop the existing product column, you can change the command to the following to overwrite the current "product" column:
within(mydf, {
product <- ave(product, level1, level2, FUN = cumsum)
})
Your current approach doesn't work in part because you've included "hour" as one of your grouping variables. In other words, it is seeing the combination of "A + tea + 0" as different from "A + tea + 1", but from your desired output, you seem to simply want the combination of "A + tea" to be the group.
aggregate
won't work as you expect, because it will condense everything into a data.frame
with the same number of rows as the number of unique combinations of "level1" and "level2", in this case, 4 rows. The aggregated column would be a list
. The values would be correct, but it would be less useful.
Here's aggregate
and its output:
> aggregate(product ~ level1 + level2, mydf, cumsum)
level1 level2 product
1 A coffee 7, 9, 13
2 B coffee 2, 5, 9
3 A tea 7, 9, 18
4 B tea 3, 4
you should use transform
instead of summarise
:
# you should probably order your `level2` first
dd$level2 <- factor(dd$level2, levels=c("tea", "coffee"))
# and transform using level1 and level2 alone, not hour
# if you use hour, the groups will be for each row
ddply(dd, .(level1, level2), transform, product=cumsum(product))
# level1 level2 hour product
# 1 A tea 0 7
# 2 A tea 1 9
# 3 A tea 2 18
# 4 A coffee 17 7
# 5 A coffee 18 9
# 6 A coffee 20 13
# 7 B tea 21 3
# 8 B tea 22 4
# 9 B coffee 0 2
# 10 B coffee 1 5
# 11 B coffee 2 9
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With