I have a data frame that may or may not contain certain columns. I want to select those columns with dplyr if they exist and, if they don't, simply ignore that I tried to select them. Here's an example:
    # Load libraries
    library(dplyr)

    # Create data frame
    df <- data.frame(year = 2000:2010, foo = 0:10, bar = 10:20)

    # Pull out some columns
    df %>% select(year, contains("bar"))

    # Result
    #    year bar
    # 1  2000  10
    # 2  2001  11
    # 3  2002  12
    # 4  2003  13
    # 5  2004  14
    # 6  2005  15
    # 7  2006  16
    # 8  2007  17
    # 9  2008  18
    # 10 2009  19
    # 11 2010  20

    # Try again for non-existent column
    df %>% select(year, contains("boo"))

    # Result
    # data frame with 0 columns and 11 rows
In the latter case, I just want to get back a data frame with the column year, since the column boo doesn't exist. Why do I get an empty data frame in the latter case, and what is a good way of avoiding this and achieving the desired result?
EDIT: Session info
    R version 3.3.3 (2017-03-06)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] dplyr_0.5.0

    loaded via a namespace (and not attached):
    [1] lazyeval_0.2.0   magrittr_1.5     R6_2.2.0         assertthat_0.2.0 DBI_0.6-1        tools_3.3.3
    [7] tibble_1.3.0     Rcpp_0.12.10
In the devel version of dplyr,

    df %>% select(year, contains("boo"))
    #    year
    # 1  2000
    # 2  2001
    # 3  2002
    # 4  2003
    # 5  2004
    # 6  2005
    # 7  2006
    # 8  2007
    # 9  2008
    # 10 2009
    # 11 2010

gives the expected output.
Otherwise, one option is to use one_of():

    df %>% select(one_of("year", "boo"))

It returns a warning message if a column is not available, but still selects the columns that do exist.
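Another option is matches(), which selects columns by regular expression:

    df %>% select(matches("year|boo"))

Note that this pattern is unanchored, so it also picks up any column whose name merely contains year or boo; a pattern such as "^(year|boo)$" restricts the match to exact names.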
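If you would rather not depend on a particular dplyr version, the same idea can be sketched in base R by intersecting the requested names with the names that are actually present. The helper name `select_existing` below is hypothetical and is not part of dplyr; this is a minimal sketch that assumes the wanted columns are given as a character vector.

```r
# Hypothetical helper (not part of dplyr): keep only those requested
# columns that are actually present in the data frame.
select_existing <- function(data, cols) {
  # intersect() keeps the order of `cols` and drops names not in `data`
  present <- intersect(cols, names(data))
  # drop = FALSE keeps the result a data frame even for a single column
  data[, present, drop = FALSE]
}

df <- data.frame(year = 2000:2010, foo = 0:10, bar = 10:20)

# "boo" does not exist, so only "year" is returned, without a warning
res <- select_existing(df, c("year", "boo"))
names(res)
```

In more recent dplyr releases (1.0.0 and later), `select(df, any_of(c("year", "boo")))` should give the same result without a warning, so the helper is mainly useful on older installations like the one in the question.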