How to select_if in dplyr, where the logical condition is negated

I want to select all numeric columns from a data frame, and then select all the non-numeric columns. An obvious way to do the first is:

mtcars %>%
    select_if(is.numeric) %>%
    head()

This works exactly as I expect.

mtcars %>%
    select_if(!is.numeric) %>%
    head()

This doesn't; it produces the error message Error in !is.numeric : invalid argument type

Another way to do the same thing:

mtcars %>%
    select_if(sapply(., is.numeric)) %>%
    head()

works perfectly, but

mtcars %>%
    select_if(sapply(., !is.numeric)) %>%
    head()

fails with the same error message. (purrr::keep behaves exactly the same way).

In both cases, using - to drop the undesired columns fails too. The is.numeric version gives the same error as above, and the sapply version gives this error message: Error: Can't convert an integer vector to function.

The help page for is.numeric says

is.numeric is an internal generic primitive function: you can write methods to handle specific classes of objects, see InternalMethods. ... Methods for is.numeric should only return true if the base type of the class is double or integer and values can reasonably be regarded as numeric (e.g., arithmetic on them makes sense, and comparison should be done via the base type).

The help page for ! says

Value

For !, a logical or raw vector (for raw x) of the same length as x: names, dims and dimnames are copied from x, and all other attributes (including class) if no coercion is done.

Looking at the useful question "Negation ! in a dplyr pipeline %>%", I can see some of the reasons why this doesn't work, but neither of the solutions suggested there works.

mtcars %>%
    select_if(not(is.numeric())) %>%
    head()

gives the reasonable error Error in is.numeric() : 0 arguments passed to 'is.numeric' which requires 1.

mtcars %>%
    select_if(not(is.numeric(.))) %>%
    head()

This fails with the error: Error in tbl_if_vars(.tbl, .predicate, caller_env(), .include_group_vars = TRUE) : length(.p) == length(tibble_vars) is not TRUE.

This behaviour definitely violates the principle of least surprise. It's not of great consequence to me now, but it suggests I am failing to understand some more fundamental point.

Any thoughts?

astaines asked Jul 16 '18 09:07


3 Answers

A predicate function can be negated with the dedicated Negate() or purrr::negate() functions, rather than with the ! operator, which negates a vector and not a function:

library(dplyr)

mtcars %>% 
  mutate(foo = "bar") %>% 
  select_if(Negate(is.numeric)) %>% 
  head()

#   foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar

Or with purrr::negate() (lower-case), which has slightly different behavior; see the respective help pages:

library(purrr)
library(dplyr)

mtcars %>% 
  mutate(foo = "bar") %>% 
  select_if(negate(is.numeric)) %>% 
  head()

#   foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar
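
To make the distinction concrete, here is a minimal base R sketch of why !is.numeric fails while Negate(is.numeric) works. The names err, is_not_numeric and df are illustrative only, and the extra foo column mirrors the example above.

```r
# `!` operates on vectors, so applying it to a function object
# fails before select_if() ever sees the argument.
err <- tryCatch(!is.numeric, error = function(e) conditionMessage(e))
print(err)

# Negate() wraps the predicate and returns a new function instead.
is_not_numeric <- Negate(is.numeric)
print(is.function(is_not_numeric))
print(is_not_numeric("a"))   # a character value is not numeric
print(is_not_numeric(1.5))   # a double is numeric

# Illustrative data: mtcars plus one character column.
df <- mtcars
df$foo <- "bar"

# Base R analogue of select_if(): keep columns where the predicate holds.
non_numeric_cols <- names(Filter(is_not_numeric, df))
print(non_numeric_cols)
```

The point is that select_if() wants something it can call on each column, and Negate() produces exactly that.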
Aurèle answered Oct 03 '22 12:10


You could define your own "is not numeric" function and then use that instead:

is_not_num <- function(x) !is.numeric(x)

mtcars %>%
  select_if(is_not_num) %>%
  head()
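
The same idea probably explains the other failures in the question. The sketch below, in base R only, suggests that negating the result of sapply() works where negating the function inside it does not, and that is.numeric() applied to the whole data frame gives a single value rather than one per column. The names df, keep_a and keep_b are illustrative.

```r
# Illustrative data: mtcars plus one character column.
df <- mtcars
df$foo <- "bar"

is_not_num <- function(x) !is.numeric(x)

# One logical per column, computed two equivalent ways.
keep_a <- sapply(df, is_not_num)       # negate inside a wrapper function
keep_b <- !sapply(df, is.numeric)      # negate the resulting logical vector
print(identical(keep_a, keep_b))

# The logical vector can index columns directly.
print(names(df)[keep_b])
print(head(df[keep_b]))

# Applied to the whole data frame, is.numeric() returns one value,
# which cannot line up with the number of columns.
print(length(!is.numeric(df)))
print(ncol(df))

# Assumption, not run here: with dplyr loaded, the equivalent pipeline
# would be  mtcars %>% select_if(!sapply(., is.numeric))
```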
Daniel answered Oct 03 '22 12:10


mtcars %>%
  select_if(funs(!is.numeric(.))) %>%
  head()

does the same, although funs() has been deprecated since dplyr 0.8.0.
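
On more recent dplyr versions the same negation can be written without funs(). This is a sketch assuming dplyr 1.0.0 or later, where select() accepts the where() helper; the object names df, via_formula, via_where and via_negate are illustrative.

```r
library(dplyr)

# Illustrative data: mtcars plus one character column.
df <- mtcars %>% mutate(foo = "bar")

# Formula shorthand: `.` stands for each column in turn.
via_formula <- df %>% select_if(~ !is.numeric(.))

# where() turns a predicate into a selection, and selections can be negated with `!`.
via_where <- df %>% select(!where(is.numeric))

# Negate() also composes with where().
via_negate <- df %>% select(where(Negate(is.numeric)))

print(names(via_formula))
print(names(via_where))
print(names(via_negate))
```

In the where() form the ! is applied to a column selection rather than to the function itself, which is why it is accepted there.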

Nicolas2 answered Oct 03 '22 13:10