How do I select columns that may or may not exist?

Tags: select, r, dplyr

I have a data frame that may or may not have some particular columns present. I want to select columns using dplyr if they do exist and, if not, just ignore that I tried to select them. Here's an example:

    # Load libraries
    library(dplyr)

    # Create data frame
    df <- data.frame(year = 2000:2010, foo = 0:10, bar = 10:20)

    # Pull out some columns
    df %>% select(year, contains("bar"))

    # Result
    #    year bar
    # 1  2000  10
    # 2  2001  11
    # 3  2002  12
    # 4  2003  13
    # 5  2004  14
    # 6  2005  15
    # 7  2006  16
    # 8  2007  17
    # 9  2008  18
    # 10 2009  19
    # 11 2010  20

    # Try again for non-existent column
    df %>% select(year, contains("boo"))

    # Result
    # data frame with 0 columns and 11 rows

In the latter case, I just want a data frame containing only the column year, since the column boo doesn't exist. Why do I get an empty data frame here, and what is a good way to avoid this and get the desired result?

EDIT: Session info

    R version 3.3.3 (2017-03-06)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] dplyr_0.5.0

    loaded via a namespace (and not attached):
    [1] lazyeval_0.2.0   magrittr_1.5     R6_2.2.0         assertthat_0.2.0 DBI_0.6-1        tools_3.3.3
    [7] tibble_1.3.0     Rcpp_0.12.10
Lyngbakr asked May 04 '17 15:05


1 Answer

In the development version of dplyr, the original call gives the expected output:

    df %>%
      select(year, contains("boo"))
    #    year
    # 1  2000
    # 2  2001
    # 3  2002
    # 4  2003
    # 5  2004
    # 6  2005
    # 7  2006
    # 8  2007
    # 9  2008
    # 10 2009
    # 11 2010

Otherwise, one option is to use one_of:

    df %>%
      select(one_of("year", "boo"))

It returns the columns that exist and gives a warning message for any column that is not available.
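If a newer dplyr is available, any_of() may be a quieter alternative to one_of(): it takes a character vector of names and skips any that are missing, without a warning. A minimal sketch, assuming dplyr 1.0.0 or later (newer than the version in the question); the `wanted` and `res` names are only illustrative:

```r
library(dplyr)

df <- data.frame(year = 2000:2010, foo = 0:10, bar = 10:20)

# Names we would like to keep; "boo" does not exist in df
wanted <- c("year", "boo")

# any_of() selects the names that are present and silently ignores the rest
# (assumption: dplyr >= 1.0.0, where any_of() is available)
res <- df %>% select(any_of(wanted))

names(res)
```

Here only year should survive, with all 11 rows intact.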

Another option is matches, which takes a regular expression:

    df %>%
      select(matches("year|boo"))
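One thing to keep in mind: because matches() works on a regular expression, "year|boo" also picks up any column whose name merely contains either string. A minimal sketch of anchoring the pattern so that only exact names match; df2 and its yearly_total column are hypothetical, added only to show the difference:

```r
library(dplyr)

# Hypothetical data frame with a column whose name contains "year"
df2 <- data.frame(year = 2000:2010, yearly_total = 0:10, bar = 10:20)

# Unanchored pattern: selects both year and yearly_total
loose <- df2 %>% select(matches("year|boo"))

# Anchored pattern: selects only columns named exactly year or boo
exact <- df2 %>% select(matches("^(year|boo)$"))

names(loose)
names(exact)
```

For the original df, where no other column name contains "year" or "boo", both patterns select the same single column.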
akrun answered Sep 22 '22 22:09