dplyr::mutate (assign na.rm =TRUE)

I have a data.frame that has 100 variables. I want to get the sum of three variables only using mutate (not summarise).

If there is NA in any of the 3 variables, I still want to get the sum. In order to do this using mutate, I replaced all NA values with 0 using ifelse then I got the sum.

library(dplyr)
df %>% mutate(mod_var1 = ifelse(is.na(var1), 0, var1),
              mod_var2 = ifelse(is.na(var2), 0, var2),
              mod_var3 = ifelse(is.na(var3), 0, var3),
              sum = (mod_var1+mod_var2+mod_var3))

Is there any better (shorter) way to do this?

DATA

df <- read.table(text = c("
var1    var2    var3
4   5   NA
2   NA  3
1   2   4
NA  3   5
3   NA  2
1   1   5"), header =T)

What does mutate in dplyr do?

mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name.

How do you mutate a new variable in R?

To use mutate in R, all you need to do is call the function, specify the dataframe, and specify the name-value pair for the new variable you want to create.
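As a minimal sketch of that call pattern (the data frame `d` and the column name `total` are made up for illustration), the example below also shows why the question arises: `+` propagates `NA`, so a row with a missing value gets a missing total.

```r
library(dplyr)

# Hypothetical two-row data frame; the second row has a missing x.
d <- data.frame(x = c(1, NA), y = c(2, 3))

# mutate(data, new_name = expression): the existing columns are kept
# and `total` is appended as a new column.
out <- mutate(d, total = x + y)

print(out)
```

The first row of `total` is 3 and the second is `NA`, which is the behaviour the answers below work around.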

How do you make a mutate function in R?

In R, mutate() creates new variables in a data frame. It comes from the dplyr package, an add-on package that has to be installed once and then loaded with library(dplyr). dplyr also provides functions for selecting, filtering, grouping, and arranging data.

rowwise() is my go-to function. It's like group_by() but it treats each row as an individual group.

df %>% rowwise() %>% mutate(Sum = sum(c(var1, var2, var3), na.rm = TRUE))
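A variant of the same idea, sketched under the assumption that dplyr 1.0.0 or later is installed: `c_across()` lets the columns be given as a range instead of being listed one by one, and `ungroup()` drops the row-wise grouping afterwards so that later steps are not computed row by row. The data frame is the example data from the question, rebuilt with `data.frame()`.

```r
library(dplyr)

# Example data from the question.
df <- data.frame(
  var1 = c(4, 2, 1, NA, 3, 1),
  var2 = c(5, NA, 2, 3, NA, 1),
  var3 = c(NA, 3, 4, 5, 2, 5)
)

res <- df %>%
  rowwise() %>%                                            # one group per row
  mutate(Sum = sum(c_across(var1:var3), na.rm = TRUE)) %>% # NA-tolerant row sum
  ungroup()                                                # back to an ordinary tibble

print(res)
```

As the benchmarks further down suggest, row-wise evaluation can be slow on large data, so this is mainly a readability option.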

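We can use Reduce with + after replacing the NA values with 0. With current dplyr the replacement step is written with across(); the older mutate_each() and funs() are deprecated. Note that Reduce(`+`, .) on its own adds up every column of the data frame, so select() is used here to restrict the sum to the three variables.

df %>% 
     mutate(across(var1:var3, ~ replace(.x, is.na(.x), 0))) %>% 
     mutate(Sum = Reduce(`+`, select(., var1:var3)))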
#   var1 var2 var3 Sum
#1    4    5    0   9
#2    2    0    3   5
#3    1    2    4   7
#4    0    3    5   8
#5    3    0    2   5
#6    1    1    5   7

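Or with rowSums, which takes na.rm directly, so the NA values do not have to be replaced first. Here .[names(.)[1:3]] picks the first three columns, which are var1 to var3 in the example data.

df %>% 
   mutate(Sum = rowSums(.[names(.)[1:3]], na.rm = TRUE))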
#   var1 var2 var3 Sum
#1    4    5   NA   9
#2    2   NA    3   5
#3    1    2    4   7
#4   NA    3    5   8
#5    3   NA    2   5
#6    1    1    5   7
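When the data frame has many other columns, as in the question, the three variables can be selected by name rather than by position. The sketch below assumes the magrittr pipe `%>%` (so that `.` refers to the piped data frame); the columns `id` and `other` are hypothetical additions that show they are left out of the sum.

```r
library(dplyr)

# Example data from the question plus two hypothetical extra columns.
df_wide <- data.frame(
  id    = 1:6,
  var1  = c(4, 2, 1, NA, 3, 1),
  var2  = c(5, NA, 2, 3, NA, 1),
  var3  = c(NA, 3, 4, 5, 2, 5),
  other = c(10, 20, 30, 40, 50, 60)
)

res <- df_wide %>%
  # select() narrows the piped data frame to var1..var3 before summing,
  # so `id` and `other` do not contribute to Sum.
  mutate(Sum = rowSums(select(., var1:var3), na.rm = TRUE))

print(res)
```

With `na.rm = TRUE`, a row in which all three variables are `NA` gets a sum of 0 rather than `NA`.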

Benchmarks

Timings on a data frame with one million rows:

library(dplyr)
library(tidyr)   # needed for gather() and spread()

set.seed(24)
df1 <- as.data.frame(matrix(sample(c(NA, 1:5), 1e6 * 3, replace = TRUE),
                dimnames = list(NULL, paste0("var", 1:3)), ncol = 3))

# rowwise() + sum()
system.time({
df1 %>% rowwise() %>% mutate(Sum = sum(c(var1, var2, var3), na.rm = TRUE))
})
# user  system elapsed 
#  21.50    0.03   21.66 

# tidyr: gather(), grouped sum, spread()
system.time({
df1 %>%
    mutate(rn = row_number()) %>%
    gather(var, varNum, var1:var3) %>%
    group_by(rn) %>%
    mutate(sum = sum(varNum, na.rm = TRUE)) %>% 
    spread(var, varNum)
})
# user  system elapsed 
#  5.96    0.39    6.37 

# replace NA with 0 in the whole data frame, then add the columns
system.time({
replace(df1, is.na(df1), 0) %>% mutate(sum = var1 + var2 + var3)
})
# user  system elapsed 
#   0.17    0.01    0.19 

# mutate_each() + Reduce(), as timed (mutate_each() and funs() are deprecated in current dplyr)
system.time({
df1 %>% 
     mutate_each(funs(replace(., is.na(.), 0)), var1:var3) %>% 
     mutate(Sum = Reduce(`+`, .))      
})
# user  system elapsed 
#   0.10    0.02    0.11 

# rowSums()
system.time({
df1 %>% 
   mutate(Sum = rowSums(.[names(.)[1:3]], na.rm = TRUE))
})
# user  system elapsed 
#   0.04    0.00    0.03

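If "better" can mean tidyr: reshape the three variables into long format, sum within each original row, then reshape back to wide format.

library(tidyr)

df %>%
    mutate(rn = row_number()) %>%                 # remember the original row
    gather(var, varNum, var1:var3) %>%            # wide -> long
    group_by(rn) %>%
    mutate(sum = sum(varNum, na.rm = TRUE)) %>%   # sum within each original row
    spread(var, varNum)                           # long -> wide

In case your dataset is poised to grow...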
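gather() and spread() still work but are superseded in tidyr 1.0.0 and later by pivot_longer() and pivot_wider(). A rough equivalent of the pipeline above, assuming a recent tidyr, might look like this; the column names `rn`, `var` and `varNum` are kept from the answer, and the data frame is the example data from the question.

```r
library(dplyr)
library(tidyr)

# Example data from the question.
df <- data.frame(
  var1 = c(4, 2, 1, NA, 3, 1),
  var2 = c(5, NA, 2, 3, NA, 1),
  var3 = c(NA, 3, 4, 5, 2, 5)
)

res <- df %>%
  mutate(rn = row_number()) %>%                    # remember the original row
  pivot_longer(var1:var3,
               names_to = "var",
               values_to = "varNum") %>%           # wide -> long
  group_by(rn) %>%
  mutate(sum = sum(varNum, na.rm = TRUE)) %>%      # sum within each original row
  ungroup() %>%
  pivot_wider(names_from = var,
              values_from = varNum)                # long -> wide

print(res)
```

The result keeps the helper column `rn` next to `sum`; it can be dropped with `select(-rn)` if it is not needed.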


Tags: r, na, dplyr, sum, shiny


3 Answers

Phil

Benchmarks

akrun

leerssej
