R: row-wise dplyr::mutate using function that takes a data frame row and returns an integer

Tags: r, dplyr, row

I am trying to use a custom function inside a piped mutate statement. I looked at this somewhat similar SO post, but in vain. Say I have a data frame like this (where blob is a variable unrelated to the specific task but part of the data):

df <- 
  data.frame(exclude=c('B','B','D'), 
             B=c(1,0,0), 
             C=c(3,4,9), 
             D=c(1,1,0), 
             blob=c('fd', 'fs', 'sa'), 
             stringsAsFactors = FALSE)

I have a function that uses the variable names to select columns based on the value in the exclude column and, for example, calculates the sum of the variables not named in exclude (which is always a single character).

FUN <- function(df){
  sum(df[c('B', 'C', 'D')] [!names(df[c('B', 'C', 'D')]) %in% df['exclude']] )
}

When I give a single row (row 1) to FUN, I get the expected sum of C and D (those not mentioned by exclude), namely 4:

FUN(df[1,])

How do I do the same in a pipe with mutate (adding the result to a variable s)? These two attempts do not work:

df %>% mutate(s=FUN(.))
df %>% group_by(1:n()) %>% mutate(s=FUN(.))

UPDATE: This also does not work as intended:

df %>% rowwise(.) %>% mutate(s=FUN(.))

This works, of course, but is not within dplyr's mutate (and pipes):

df$s <- sapply(1:nrow(df), function(x) FUN(df[x,]))


asked May 30 '17 14:05

user3375672

1 Answer

If you want to use dplyr, you can do so using rowwise and your function FUN.

df %>% 
    rowwise() %>% 
    do({
        # as_tibble replaces the deprecated as_data_frame
        result <- as_tibble(.)
        result$s <- FUN(result)
        result
    })

The same can be achieved using group_by instead of rowwise (as you already tried), but with do instead of mutate:

df %>% 
    group_by(1:n()) %>% 
    do({
        # as_tibble replaces the deprecated as_data_frame
        result <- as_tibble(.)
        result$s <- FUN(result)
        result
    })
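
For newer dplyr releases, a hedged alternative to the do() pipelines: do() has been marked as superseded since dplyr 1.0.0, and the sketch below assumes dplyr 1.1.0 or later, where pick() is available. Inside a rowwise mutate, pick(everything()) should evaluate to a one-row tibble holding the current row, which is the kind of input FUN expects. The data and FUN are copied from the question so the block runs on its own; res is a name introduced here for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(exclude = c('B', 'B', 'D'),
                 B = c(1, 0, 0),
                 C = c(3, 4, 9),
                 D = c(1, 1, 0),
                 blob = c('fd', 'fs', 'sa'),
                 stringsAsFactors = FALSE)

# FUN as defined in the question: sums B, C and D,
# skipping the column named in exclude (expects a single row)
FUN <- function(df) {
  sum(df[c('B', 'C', 'D')][!names(df[c('B', 'C', 'D')]) %in% df['exclude']])
}

res <- df %>%
  rowwise() %>%                            # one group per row
  mutate(s = FUN(pick(everything()))) %>%  # pick() yields the current row only
  ungroup()                                # drop the rowwise grouping again
```

On the question's data, res$s should come out as 4, 5, 9, matching the sapply solution from the question.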

The reason mutate doesn't work in this case is that the dot (.) in the pipe refers to the whole tibble being piped in, not to the current row or group, so it's like calling FUN(df).
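
To make that concrete, here is a small base R sketch (no dplyr needed) comparing what FUN returns for the whole data frame with what it returns row by row. The data and FUN are copied from the question; whole and per_row are names introduced here for illustration.

```r
# Example data from the question
df <- data.frame(exclude = c('B', 'B', 'D'),
                 B = c(1, 0, 0),
                 C = c(3, 4, 9),
                 D = c(1, 1, 0),
                 blob = c('fd', 'fs', 'sa'),
                 stringsAsFactors = FALSE)

# FUN as defined in the question (written for a single row)
FUN <- function(df) {
  sum(df[c('B', 'C', 'D')][!names(df[c('B', 'C', 'D')]) %in% df['exclude']])
}

# Roughly what mutate(s = FUN(.)) computes: one number for the whole frame
whole <- FUN(df)

# What was intended: one number per row
per_row <- sapply(seq_len(nrow(df)), function(i) FUN(df[i, ]))
```

Here whole is a single value (19, the sum of every cell in B, C and D, because the exclusion is never applied per row), which mutate would recycle into every row, while per_row is 4, 5, 9.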

A much more efficient way of doing the same thing, though, is to build a matrix marking which columns to include for each row and then use rowSums.

cols <- c('B', 'C', 'D')
include_mat <- outer(function(x, y) x != y, X = df$exclude, Y = cols)
# or outer(`!=`, X = df$exclude, Y = cols) if it's more readable to you
df$s <- rowSums(df[cols] * include_mat)
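
As a worked check of the rowSums approach on the question's data, the sketch below builds the intermediate matrix so it can be inspected. It passes the arguments to outer in their usual positional order and converts the columns with as.matrix before multiplying; both are small variations chosen here for explicitness, not part of the original answer.

```r
# Example data from the question
df <- data.frame(exclude = c('B', 'B', 'D'),
                 B = c(1, 0, 0),
                 C = c(3, 4, 9),
                 D = c(1, 1, 0),
                 blob = c('fd', 'fs', 'sa'),
                 stringsAsFactors = FALSE)

cols <- c('B', 'C', 'D')

# Logical matrix: entry [i, j] is TRUE when column cols[j] is kept for row i
include_mat <- outer(df$exclude, cols, `!=`)
#       B     C     D
# 1 FALSE  TRUE  TRUE
# 2 FALSE  TRUE  TRUE
# 3  TRUE  TRUE FALSE   (row/column labels added here for readability)

# Multiplying by the logical matrix zeroes out the excluded cells
df$s <- unname(rowSums(as.matrix(df[cols]) * include_mat))
```

Because everything is done with whole-matrix operations, FUN is never called once per row, which is where the speed-up over the row-by-row approaches comes from.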


answered Oct 21 '22 14:10

konvas
