dplyr::select() with some variables that may not exist in the data frame?

Tags: select, r, dplyr, nse, tidyselect

I have a helper function (say foo()) that will be run on various data frames that may or may not contain specified variables. Suppose I have

library(dplyr)
d1 <- tibble(taxon = 1, model = 2, z = 3)  # tibble() replaces the deprecated data_frame()
d2 <- tibble(taxon = 2, pss = 4, z = 3)

The variables I want to select are

vars <- intersect(names(data), c("taxon", "model", "z"))

that is, I'd like foo(d1) to return the taxon, model, and z columns, while foo(d2) returns just taxon and z.

If foo contains select(data,c(taxon,model,z)) then foo(d2) fails (because d2 doesn't contain model). If I use select(data,-pss) then foo(d1) fails similarly.

I know how to do this if I retreat from the tidyverse (just return data[vars]), but I'm wondering if there's a handy way to do this either (1) with a select() helper of some sort (tidyselect::select_helpers) or (2) with tidyeval (which I still haven't found time to get my head around!)


asked Jul 26 '18 00:07

Ben Bolker


3 Answers

Another option is select_if:

d2 %>% select_if(names(.) %in% c('taxon', 'model', 'z'))

# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3

Update: select_if() is superseded. Use any_of() inside select() instead:

d2 %>% select(any_of(c('taxon', 'model', 'z')))
# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3

Type ?dplyr::select in R and you will find this:

These helpers select variables from a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

any_of(): Same as all_of(), except that no error is thrown for names that don't exist.
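
Putting this together, the helper from the question might be sketched roughly as below. The name foo comes from the question; the wanted argument and its default value are assumptions added for illustration, and any_of() needs a reasonably recent dplyr/tidyselect (1.0.0 or later).

```r
library(dplyr)

# Example data from the question (tibble() replaces the deprecated data_frame()).
d1 <- tibble(taxon = 1, model = 2, z = 3)
d2 <- tibble(taxon = 2, pss = 4, z = 3)

# Hypothetical helper: `wanted` is an illustrative argument,
# not something defined in the original question.
foo <- function(data, wanted = c("taxon", "model", "z")) {
  # any_of() keeps the wanted columns that exist and silently skips the rest.
  select(data, any_of(wanted))
}

names(foo(d1))  # "taxon" "model" "z"
names(foo(d2))  # "taxon" "z"
```

Swapping any_of() for all_of() would make foo(d2) fail with an error, which may be the better choice when a missing column signals a real problem upstream.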


answered Oct 29 '22 04:10

mt1022

You can use one_of(), which gives a warning when the column is absent but otherwise selects the correct columns:

d1 %>%
    select(one_of(c("taxon", "model", "z")))
d2 %>%
    select(one_of(c("taxon", "model", "z")))
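
If the warning is unwanted, one possible workaround is to wrap the call in suppressWarnings(), as in this small sketch. Note that this hides every warning raised inside the call, not only the one about missing columns. As far as I know, one_of() is itself superseded in current tidyselect in favour of any_of() and all_of(), so treat this as an option for older code.

```r
library(dplyr)

d2 <- tibble(taxon = 2, pss = 4, z = 3)

# one_of() warns about "model" because d2 has no such column;
# suppressWarnings() hides that warning, and the existing columns are still returned.
res <- suppressWarnings(
  d2 %>% select(one_of(c("taxon", "model", "z")))
)

names(res)  # "taxon" "z"
```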

answered Oct 29 '22 04:10

Marius

Using the built-in anscombe data frame as an example, and noting that z is not a column in anscombe:

anscombe %>% select(intersect(names(.), c("x1", "y1", "z")))

giving:

   x1    y1
1  10  8.04
2   8  6.95
3  13  7.58
4   9  8.81
5  11  8.33
6  14  9.96
7   6  7.24
8   4  4.26
9  12 10.84
10  7  4.82
11  5  5.68
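
The same idea can be wrapped in a small function. This is only a sketch: the name select_present and the wanted argument are hypothetical, and wrapping the pre-computed vector in all_of() is an addition here to make it explicit that the selection comes from a character vector rather than from column names.

```r
library(dplyr)

# Hypothetical helper name; `wanted` is an illustrative argument.
select_present <- function(data, wanted) {
  # Compute the overlap first, then select it strictly with all_of().
  present <- intersect(names(data), wanted)
  select(data, all_of(present))
}

names(select_present(anscombe, c("x1", "y1", "z")))  # "x1" "y1"
```

Because intersect() returns elements in the order of its first argument, the result keeps the data frame's column order rather than the order of wanted.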

answered Oct 29 '22 03:10

G. Grothendieck
