
dplyr::select() with some variables that may not exist in the data frame?

I have a helper function (say foo()) that will be run on various data frames that may or may not contain specified variables. Suppose I have

library(dplyr)
# tibble() replaces the deprecated data_frame()
d1 <- tibble(taxon = 1, model = 2, z = 3)
d2 <- tibble(taxon = 2, pss = 4, z = 3)

The variables I want to select are

vars <- intersect(names(data),c("taxon","model","z"))

that is, I'd like foo(d1) to return the taxon, model, and z columns, while foo(d2) returns just taxon and z.

If foo contains select(data,c(taxon,model,z)) then foo(d2) fails (because d2 doesn't contain model). If I use select(data,-pss) then foo(d1) fails similarly.

I know how to do this if I retreat from the tidyverse (just return data[vars]), but I'm wondering if there's a handy way to do this either (1) with a select() helper of some sort (tidyselect::select_helpers) or (2) with tidyeval (which I still haven't found time to get my head around!)

Ben Bolker asked Jul 26 '18 00:07



3 Answers

Another option is select_if:

d2 %>% select_if(names(.) %in% c('taxon', 'model', 'z'))

# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3

select_if is superseded. Use any_of instead:

d2 %>% select(any_of(c('taxon', 'model', 'z')))
# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3

Type ?dplyr::select in R and you will find this:

These helpers select variables from a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

any_of(): Same as all_of(), except that no error is thrown for names that don't exist.
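
Applied to the question, a minimal sketch of the helper might look like the following. The name foo comes from the question; the argument name wanted is only illustrative. Negating any_of() also covers the "drop pss if it is there" case that fails with select(data, -pss).

```r
library(dplyr)

# Hypothetical helper from the question; `wanted` is an illustrative argument name.
foo <- function(data, wanted = c("taxon", "model", "z")) {
  # any_of() silently skips names that are not columns of `data`
  select(data, any_of(wanted))
}

d1 <- tibble(taxon = 1, model = 2, z = 3)
d2 <- tibble(taxon = 2, pss = 4, z = 3)

names(foo(d1))  # "taxon" "model" "z"
names(foo(d2))  # "taxon" "z"

# Dropping a column that may be absent: no error for d1, which has no `pss`
names(select(d1, -any_of("pss")))  # "taxon" "model" "z"
```

Note that any_of() returns columns in the order given in the character vector, not in the order they appear in the data frame.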

mt1022 answered Oct 29 '22 04:10


You can use one_of(), which gives a warning when the column is absent but otherwise selects the correct columns:

d1 %>%
    select(one_of(c("taxon", "model", "z")))
d2 %>%
    select(one_of(c("taxon", "model", "z")))
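
For reference, one_of() has since been superseded in tidyselect by any_of() and all_of(). The sketch below shows one way to confirm that the warning is raised while still getting the result. The variable names warned and res are illustrative only.

```r
library(dplyr)

d2 <- tibble(taxon = 2, pss = 4, z = 3)
wanted <- c("taxon", "model", "z")

# Record whether one_of() warned about the missing `model` column,
# then muffle the warning so the selection still completes quietly.
warned <- FALSE
res <- withCallingHandlers(
  select(d2, one_of(wanted)),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

warned      # TRUE: a warning was signalled for the unknown column
names(res)  # "taxon" "z"
```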
Marius answered Oct 29 '22 04:10


Using the built-in anscombe data frame as the example (note that z is not a column of anscombe):

anscombe %>% select(intersect(names(.), c("x1", "y1", "z")))

giving:

   x1    y1
1  10  8.04
2   8  6.95
3  13  7.58
4   9  8.81
5  11  8.33
6  14  9.96
7   6  7.24
8   4  4.26
9  12 10.84
10  7  4.82
11  5  5.68
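
The . placeholder above relies on the magrittr pipe. Inside a helper function the same idea can be written without it, as in this sketch. The function name keep_known is hypothetical, and wrapping the result in all_of() is a choice made here to mark it explicitly as a character vector of column names.

```r
library(dplyr)

# `keep_known` is a hypothetical name used only for this illustration.
keep_known <- function(data, wanted) {
  # Keep only the requested names that actually exist in `data`
  present <- intersect(names(data), wanted)
  # all_of() is safe here because every name in `present` is a real column
  select(data, all_of(present))
}

names(keep_known(anscombe, c("x1", "y1", "z")))  # "x1" "y1"
```

Because intersect() keeps the order of its first argument, the columns come back in the data frame's own order rather than in the order of wanted.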
G. Grothendieck answered Oct 29 '22 03:10