I have a helper function (say foo()) that will be run on various data frames that may or may not contain specified variables. Suppose I have
library(dplyr)
# tibble() replaces the deprecated data_frame()
d1 <- tibble(taxon = 1, model = 2, z = 3)
d2 <- tibble(taxon = 2, pss = 4, z = 3)
The variables I want to select are
vars <- intersect(names(data),c("taxon","model","z"))
that is, I'd like foo(d1) to return the taxon, model, and z columns, while foo(d2) returns just taxon and z.
If foo contains select(data,c(taxon,model,z)), then foo(d2) fails (because d2 doesn't contain model). If I use select(data,-pss), then foo(d1) fails similarly.
I know how to do this if I retreat from the tidyverse (just return data[vars]), but I'm wondering if there's a handy way to do this either (1) with a select() helper of some sort (tidyselect::select_helpers) or (2) with tidyeval (which I still haven't found time to get my head around!).
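For reference, the base R approach mentioned in the question might look like the sketch below. The name foo() comes from the question, but its body and the extra wanted argument are assumptions added for illustration; the data frames are built with base data.frame() so no packages are needed.

```r
# Hypothetical foo(): keep only the wanted columns that exist in `data`.
# The `wanted` argument is an illustrative assumption, not from the question.
foo <- function(data, wanted = c("taxon", "model", "z")) {
  vars <- intersect(names(data), wanted)  # drops names that data lacks
  data[vars]                              # base R column subset by name
}

d1 <- data.frame(taxon = 1, model = 2, z = 3)
d2 <- data.frame(taxon = 2, pss = 4, z = 3)

names(foo(d1))  # "taxon" "model" "z"
names(foo(d2))  # "taxon" "z"
```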
The select() function from the dplyr package selects variables (columns) of a data frame, either by name, by index or position, or with selection helpers.
All dplyr verbs take a data frame (or tibble) as the first argument. Rather than forcing the user to either save intermediate objects or nest function calls, dplyr provides the %>% pipe operator from magrittr.
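To illustrate the pipe, here is a small sketch showing that a nested call and a piped call give the same result; the object names nested and piped are only for illustration.

```r
library(dplyr)

d1 <- tibble(taxon = 1, model = 2, z = 3)

# Nested call: the data frame is passed explicitly as select()'s first argument
nested <- select(d1, taxon, z)

# Piped call: %>% feeds d1 into select() as its first argument
piped <- d1 %>% select(taxon, z)

identical(nested, piped)  # TRUE
```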
Another option is select_if():
d2 %>% select_if(names(.) %in% c('taxon', 'model', 'z'))
# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3
select_if() is superseded in current dplyr. Use any_of() instead:
d2 %>% select(any_of(c('taxon', 'model', 'z')))
# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3
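Inside a helper like the question's foo(), the same idea might be wrapped as below. This is a sketch: the wanted argument is a hypothetical addition so the column list can be passed in rather than hard-coded.

```r
library(dplyr)

# Hypothetical foo(): select whichever of the wanted columns are present.
# any_of() silently skips names that the data frame does not contain.
foo <- function(data, wanted = c("taxon", "model", "z")) {
  data %>% select(any_of(wanted))
}

d1 <- tibble(taxon = 1, model = 2, z = 3)
d2 <- tibble(taxon = 2, pss = 4, z = 3)

names(foo(d1))  # "taxon" "model" "z"
names(foo(d2))  # "taxon" "z"
```

One difference from the intersect() approach: any_of() returns columns in the order of the supplied vector rather than the order in which they appear in the data.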
Type ?dplyr::select in R and you will find this:
These helpers select variables from a character vector:
all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.
any_of(): Same as all_of(), except that no error is thrown for names that don't exist.
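A quick way to see the difference between the two helpers quoted above, using the question's d2 (which lacks model); the result label "error" is just an illustrative placeholder.

```r
library(dplyr)

d2 <- tibble(taxon = 2, pss = 4, z = 3)
wanted <- c("taxon", "model", "z")

# any_of() skips "model", which d2 does not have
ok <- d2 %>% select(any_of(wanted))

# all_of() requires every name to exist, so this raises an error
res <- tryCatch(
  d2 %>% select(all_of(wanted)),
  error = function(e) "error"
)

names(ok)  # "taxon" "z"
res        # "error"
```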
You can also use one_of(), which gives a warning when a column is absent but otherwise selects the correct columns (note that one_of() is itself superseded in favour of any_of() and all_of()):
d1 %>%
select(one_of(c("taxon", "model", "z")))
d2 %>%
select(one_of(c("taxon", "model", "z")))
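If the one_of() warning is unwanted noise, one option is to muffle it with a calling handler, as in this sketch; the warned flag is a hypothetical addition that just records whether a warning was raised. Switching to any_of() avoids the warning altogether.

```r
library(dplyr)

d2 <- tibble(taxon = 2, pss = 4, z = 3)

warned <- FALSE
out <- withCallingHandlers(
  d2 %>% select(one_of(c("taxon", "model", "z"))),
  warning = function(w) {
    warned <<- TRUE                   # remember that one_of() complained
    invokeRestart("muffleWarning")    # then silence the warning
  }
)

names(out)  # "taxon" "z"
warned      # TRUE
```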
Using the built-in anscombe data frame as an example, noting that z is not a column in anscombe:
anscombe %>% select(intersect(names(.), c("x1", "y1", "z")))
giving:
   x1    y1
1  10  8.04
2   8  6.95
3  13  7.58
4   9  8.81
5  11  8.33
6  14  9.96
7   6  7.24
8   4  4.26
9  12 10.84
10  7  4.82
11  5  5.68
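The intersect() approach keeps columns in the order they appear in the data, whereas any_of() follows the order of the supplied vector. A small sketch on anscombe makes the difference visible (the deliberately reversed wanted vector is an assumption for illustration):

```r
library(dplyr)

wanted <- c("y1", "x1", "z")  # "z" is absent from anscombe

# intersect(names(.), ...) keeps anscombe's own column order
a <- anscombe %>% select(intersect(names(.), wanted))

# any_of() keeps the order given in `wanted`
b <- anscombe %>% select(any_of(wanted))

names(a)  # "x1" "y1"
names(b)  # "y1" "x1"
```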