Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: row-wise dplyr::mutate using function that takes a data frame row and returns an integer

Tags:

r

dplyr

row

I am trying to use pipe mutate statement using a custom function. I looked a this somewhat similar SO post but in vain. Say I have a data frame like this (where blob is some variable not related to the specific task but is part of the entire data) :

df <- 
  data.frame(exclude=c('B','B','D'), 
             B=c(1,0,0), 
             C=c(3,4,9), 
             D=c(1,1,0), 
             blob=c('fd', 'fs', 'sa'), 
             stringsAsFactors = F)

I have a function that uses the variable names so select some based on the value in the exclude column and e.g. calculates a sum on the variables not specified in exclude (which is always a single character).

FUN <- function(df){
  sum(df[c('B', 'C', 'D')] [!names(df[c('B', 'C', 'D')]) %in% df['exclude']] )
}

When I gives a single row (row 1) to FUN I get the the expected sum of C and D (those not mentioned by exclude), namely 4:

FUN(df[1,])

How do I do similarly in a pipe with mutate (adding the result to a variable s). These two tries do not work:

df %>% mutate(s=FUN(.))
df %>% group_by(1:n()) %>% mutate(s=FUN(.))

UPDATE This also do not work as intended:

df %>% rowwise(.) %>% mutate(s=FUN(.))

This works of cause but is not within dplyr's mutate (and pipes):

df$s <- sapply(1:nrow(df), function(x) FUN(df[x,]))
like image 629
user3375672 Avatar asked May 30 '17 14:05

user3375672


People also ask

How do I mutate a data frame in R?

To use mutate in R, all you need to do is call the function, specify the dataframe, and specify the name-value pair for the new variable you want to create.

Does dplyr work with data frame?

All of the dplyr functions take a data frame (or tibble) as the first argument.

What does mutate in dplyr do?

mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name.

How do you do row wise operations in R?

Row wise operation in R can be performed using rowwise() function in dplyr package. Dplyr package is provided with rowwise() function with which we will be doing row wise maximum or row wise minimum operations. We will be using iris data to depict the example of rowwise() function.


1 Answers

If you want to use dplyr you can do so using rowwise and your function FUN.

df %>% 
    rowwise %>% 
    do({
        result = as_data_frame(.)
        result$s = FUN(result)
        result
    })

The same can be achieved using group_by instead of rowwise (like you already tried) but with do instead of mutate

df %>% 
    group_by(1:n()) %>% 
    do({
        result = as_data_frame(.)
        result$s = FUN(result)
        result
    })

The reason mutate doesn't work in this case, is that you are passing the whole tibble to it, so it's like calling FUN(df).

A much more efficient way of doing the same thing though is to just make a matrix of columns to be included and then use rowSums.

cols <- c('B', 'C', 'D')
include_mat <- outer(function(x, y) x != y, X = df$exclude, Y = cols)
# or outer(`!=`, X = df$exclude, Y = cols) if it's more readable to you
df$s <- rowSums(df[cols] * include_mat)
like image 146
konvas Avatar answered Oct 21 '22 14:10

konvas