I am trying to use pipe mutate statement using a custom function. I looked a this somewhat similar SO post but in vain.
Say I have a data frame like this (where blob
is some variable not related to the specific task but is part of the entire data) :
df <-
data.frame(exclude=c('B','B','D'),
B=c(1,0,0),
C=c(3,4,9),
D=c(1,1,0),
blob=c('fd', 'fs', 'sa'),
stringsAsFactors = F)
I have a function that uses the variable names so select some based on the value in the exclude
column and e.g. calculates a sum on the variables not specified in exclude
(which is always a single character).
FUN <- function(df){
sum(df[c('B', 'C', 'D')] [!names(df[c('B', 'C', 'D')]) %in% df['exclude']] )
}
When I gives a single row (row 1) to FUN
I get the the expected sum of C
and D
(those not mentioned by exclude
), namely 4:
FUN(df[1,])
How do I do similarly in a pipe with mutate (adding the result to a variable s
). These two tries do not work:
df %>% mutate(s=FUN(.))
df %>% group_by(1:n()) %>% mutate(s=FUN(.))
UPDATE This also do not work as intended:
df %>% rowwise(.) %>% mutate(s=FUN(.))
This works of cause but is not within dplyr's mutate (and pipes):
df$s <- sapply(1:nrow(df), function(x) FUN(df[x,]))
To use mutate in R, all you need to do is call the function, specify the dataframe, and specify the name-value pair for the new variable you want to create.
All of the dplyr functions take a data frame (or tibble) as the first argument.
mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name.
Row wise operation in R can be performed using rowwise() function in dplyr package. Dplyr package is provided with rowwise() function with which we will be doing row wise maximum or row wise minimum operations. We will be using iris data to depict the example of rowwise() function.
If you want to use dplyr
you can do so using rowwise
and your function FUN
.
df %>%
rowwise %>%
do({
result = as_data_frame(.)
result$s = FUN(result)
result
})
The same can be achieved using group_by
instead of rowwise
(like you already tried) but with do
instead of mutate
df %>%
group_by(1:n()) %>%
do({
result = as_data_frame(.)
result$s = FUN(result)
result
})
The reason mutate
doesn't work in this case, is that you are passing the whole tibble to it, so it's like calling FUN(df)
.
A much more efficient way of doing the same thing though is to just make a matrix of columns to be included and then use rowSums
.
cols <- c('B', 'C', 'D')
include_mat <- outer(function(x, y) x != y, X = df$exclude, Y = cols)
# or outer(`!=`, X = df$exclude, Y = cols) if it's more readable to you
df$s <- rowSums(df[cols] * include_mat)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With