I need to perform a basic group_by
/ mutate
operation using an auxiliary grouping variable. For instance:
df <- data.frame(
u = c(0, 0, 1, 0, 1),
v = c(8, 4, 2, 3, 5)
)
df %>%
group_by(tmp = cumsum(u)) %>%
mutate(w = cumprod(v)) %>%
ungroup %>%
select(-tmp)
My problem is that if df
happens to already contain a column named tmp
I will lose it.
Of course I could choose a very exotic name instead of tmp
to reduce the likeliness of a collision (or I could even choose something like strrep("z", max(nchar(names(df))) + 1)
to be sure) but I'd prefer to have a cleaner solution.
In other words, I'm looking for the dplyr
equivalent of this data.table
line:
setDT(df)[, w := cumprod(v), by = cumsum(u)]
We could create a function to take care of this. Assuming that the temporary grouping variable to be created is 'tmp', by concatenating with the column names of the dataset and calling make.unique
, if there is already a 'tmp' column in the dataset, the duplicate one will be renamed as 'tmp.1'. Using the !!
, naming the column with 'tmp.1' (from nm1
) will not affect the 'tmp' already present in the dataset. In case, if there is no 'tmp', column, the grouping column will be named as 'tmp' and later removed with select
f1 <- function(dat, grpCol, Col) {
grpCol <- enquo(grpCol)
Col <- enquo(Col)
changeCol <- "tmp"
nm1 <- tail(make.unique(c(names(dat), changeCol)), 1)
dat %>%
group_by(!! (nm1) := cumsum(!! grpCol)) %>%
mutate(w = cumprod(!!Col)) %>%
ungroup %>%
select(-one_of(nm1))
}
-run the function
f1(df, u, v)
# A tibble: 5 x 3
# u v w
# <dbl> <dbl> <dbl>
#1 0 8.00 8.00
#2 0 4.00 32.0
#3 1.00 2.00 2.00
#4 0 3.00 6.00
#5 1.00 5.00 5.00
f1(df %>% mutate(tmp = 1), u, v) #create a 'tmp' column in dataset
# A tibble: 5 x 4
# u v tmp w
# <dbl> <dbl> <dbl> <dbl>
#1 0 8.00 1.00 8.00
#2 0 4.00 1.00 32.0
#3 1.00 2.00 1.00 2.00
#4 0 3.00 1.00 6.00
#5 1.00 5.00 1.00 5.00
As a followup (comments from @Frank) about passing expressions
expr <- quos(tmp = cumsum(u), w = cumprod(v))
#additional checks outside the function
names(expr)[1] <- if(names(expr)[1] %in% names(df))
strrep(names(expr)[1], 2) else names(expr)[1]
f2 <- function(dat, exprs ){
dat %>%
group_by(!!! exprs[1]) %>%
mutate(!!! exprs[2])
}
f2(df, expr)
# A tibble: 5 x 4
# Groups: tmp [3]
# u v tmp w
# <dbl> <dbl> <dbl> <dbl>
#1 0 8.00 0 8.00
#2 0 4.00 0 32.0
#3 1.00 2.00 1.00 2.00
#4 0 3.00 1.00 6.00
#5 1.00 5.00 2.00 5.00
You could use ave
instead:
df %>% mutate(w = ave(v, cumsum(u), FUN = cumprod))
by
would also work:
df %>%
by(cumsum(.$u), mutate, w = cumprod(v)) %>%
unclass %>%
bind_rows
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With