
Mutating column in `dplyr` using `rowSums`

Tags: r, dplyr

Recently I stumbled upon a strange behaviour of dplyr and I would be happy if somebody could provide some insight.

Suppose I have a data frame in which some columns contain numerical values. In a simple scenario I would like to compute rowSums. Although there are many ways to do it, here are two examples:

df <- data.frame(matrix(rnorm(20), 10, 2),
                 ids = paste("i", 1:20, sep = ""),
                 stringsAsFactors = FALSE)

# works
dplyr::select(df, - ids) %>% {rowSums(.)}

# does not work
# Error: invalid argument to unary operator
df %>%
  dplyr::mutate(blubb = dplyr::select(df, - ids) %>% {rowSums(.)})

# does not work
# Error: invalid argument to unary operator
df %>%
  dplyr::mutate(blubb = dplyr::select(., - ids) %>% {rowSums(.)})

# workaround:
tmp <- dplyr::select(df, - ids) %>% {rowSums(.)}
df %>%
  dplyr::mutate(blubb = tmp)

# works
rowSums(dplyr::select(df, - ids))

# does not work
# Error: invalid argument to unary operator
df %>%
  dplyr::mutate(blubb = rowSums(dplyr::select(df, - ids)))

# workaround
tmp <- rowSums(dplyr::select(df, - ids))
df %>%
  dplyr::mutate(blubb = tmp)

First, I don't really understand what is causing the error, and second, I would like to know how to compute over a subset of (suitable) columns in a tidy way.

edit

The question "mutate and rowSums exclude columns", although related, focuses on using rowSums for computation. Here I'm eager to understand why the examples above do not work. It is not so much about how to solve the problem (see the workarounds) as about understanding what happens when the naive approach is applied.

228

asked Jan 27 '17 13:01

Drey

2 Answers

The examples do not work because you are nesting select in mutate and using bare variable names. In this case, select is trying to do something like

> -df$ids
Error in -df$ids : invalid argument to unary operator

which fails because you can't negate a character string (i.e. -"i1" or -"i2" makes no sense). Either of the formulations below works:

df %>% mutate(blubb = rowSums(select_(., "X1", "X2")))
df %>% mutate(blubb = rowSums(select(., -3)))

or

df %>% mutate(blubb = rowSums(select_(., "-ids")))

as suggested by @Haboryme.

145

answered Oct 11 '22 10:10

Weihuang Wong
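As a minimal base R sketch of the point made in this answer: unary minus is not defined for character vectors, but it is fine on a column position, which is why the `-3` form works. The toy data frame below is an assumption for illustration (three rows, same column names as in the question), not the asker's data.

```r
# Toy data with the question's column names (values are illustrative).
df <- data.frame(X1 = c(1, 2, 3), X2 = c(10, 20, 30),
                 ids = c("i1", "i2", "i3"), stringsAsFactors = FALSE)

# Negating a character vector raises an error; capture its message
# instead of stopping the script.
msg <- tryCatch(-df$ids, error = function(e) conditionMessage(e))
print(msg)

# Negating a column *position* is valid, so the column can be dropped
# by index before summing the remaining numeric columns.
pos <- match("ids", names(df))
row_totals <- rowSums(df[, -pos])
print(row_totals)
```

Looking up the position with `match()` avoids hard-coding `3`, which would silently select the wrong columns if the column order changed.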

select_ is deprecated. You can use:

library(dplyr)
df <- data.frame(matrix(rnorm(20), 10, 2),
                 ids = paste("i", 1:20, sep = ""),
                 stringsAsFactors = FALSE)
df %>% 
  mutate(blubb = rowSums(select(., .dots = c("X1", "X2"))))

# Or more generally:
desired_columns <- c("X1", "X2")
df %>% 
  mutate(blubb = rowSums(select(., .dots = all_of(desired_columns))))

answered Oct 11 '22 10:10

HBat
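For readers on a recent dplyr, the same idea can be sketched without `.dots`, assuming dplyr 1.0.0 or later is installed. The toy data frame and the `res_*` names are assumptions for illustration only. `all_of()`, `across()` and `where()` are existing dplyr/tidyselect helpers.

```r
library(dplyr)

# Toy data with the question's column names (values are illustrative).
df <- data.frame(X1 = c(1, 2, 3), X2 = c(10, 20, 30),
                 ids = c("i1", "i2", "i3"), stringsAsFactors = FALSE)

desired_columns <- c("X1", "X2")

# all_of() turns a character vector of names into a column selection.
res_select <- df %>%
  mutate(blubb = rowSums(select(., all_of(desired_columns))))

# across() selects from the data frame being mutated,
# so the explicit `.` placeholder is not needed.
res_across <- df %>%
  mutate(blubb = rowSums(across(all_of(desired_columns))))

# Selecting by type avoids naming the columns at all.
res_numeric <- df %>%
  mutate(blubb = rowSums(across(where(is.numeric))))

print(res_across)
```

The `across()` form is probably the closest to a "tidy" answer to the original question, since the column selection lives entirely inside `mutate()`.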
