I am trying to figure out how to efficiently select columns using dplyr::select_if. The starwars data set in dplyr 0.7.0 is a good dataset to use for this:
> starwars
# A tibble: 87 x 13
name height mass hair_color skin_color eye_color birth_year gender homeworld species films vehicles starships
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <list> <list> <list>
1 Luke Skywalker 172 77 blond fair blue 19.0 male Tatooine Human <chr [5]> <chr [2]> <chr [2]>
2 C-3PO 167 75 <NA> gold yellow 112.0 <NA> Tatooine Droid <chr [6]> <chr [0]> <chr [0]>
3 R2-D2 96 32 <NA> white, blue red 33.0 <NA> Naboo Droid <chr [7]> <chr [0]> <chr [0]>
4 Darth Vader 202 136 none white yellow 41.9 male Tatooine Human <chr [4]> <chr [0]> <chr [1]>
5 Leia Organa 150 49 brown light brown 19.0 female Alderaan Human <chr [5]> <chr [1]> <chr [0]>
6 Owen Lars 178 120 brown, grey light blue 52.0 male Tatooine Human <chr [3]> <chr [0]> <chr [0]>
7 Beru Whitesun lars 165 75 brown light blue 47.0 female Tatooine Human <chr [3]> <chr [0]> <chr [0]>
8 R5-D4 97 32 <NA> white, red red NA <NA> Tatooine Droid <chr [1]> <chr [0]> <chr [0]>
9 Biggs Darklighter 183 84 black light brown 24.0 male Tatooine Human <chr [1]> <chr [0]> <chr [1]>
10 Obi-Wan Kenobi 182 77 auburn, white fair blue-gray 57.0 male Stewjon Human <chr [6]> <chr [1]> <chr [5]>
Now say that I would like to select only the numeric columns. This works well:
library(dplyr)
starwars %>%
select_if(is.numeric)
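Note that is.numeric() matches both integer and double columns, so the call above returns more than just integers. A minimal sketch contrasting it with is.integer(), using the same select_if() call (the expected names assume the starwars data shipped with dplyr):

```r
library(dplyr)

# is.numeric() is TRUE for integer *and* double columns
num_cols <- select_if(starwars, is.numeric)
names(num_cols)  # "height" "mass" "birth_year"

# is.integer() is TRUE only for integer columns
int_cols <- select_if(starwars, is.integer)
names(int_cols)  # "height"
```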
But what should I do if I want to select based on multiple criteria? For example, maybe I want both numeric and character columns:
starwars %>%
select_if(c(is.numeric, is.character))
Or maybe I want all numeric columns AND the name column:
starwars %>%
select_if(name, is.numeric)
Neither of the two examples above works, so I am wondering how I might accomplish what I've outlined here.
For the first example:
starwars %>%
select_if(function(col) {is.numeric(col) | is.character(col)})
This is taken directly from the RDocumentation page.
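The c(is.numeric, is.character) attempt in the question fails because select_if() expects a single predicate function (or a logical vector), not a vector of functions. If you prefer the "list of predicates" style, one option is to combine them into one function first. The helper any_pred() below is hypothetical, not part of dplyr; this is a minimal sketch:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): returns a single predicate that is
# TRUE for a column when at least one of the supplied predicates is TRUE.
any_pred <- function(...) {
  preds <- list(...)
  function(col) any(vapply(preds, function(p) isTRUE(p(col)), logical(1)))
}

# Keeps every numeric or character column; list columns are dropped
res <- starwars %>% select_if(any_pred(is.numeric, is.character))
```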
For the second:
toKeep <- sapply(starwars, is.numeric)  # logical vector, one entry per column
starwars %>%
select(name, names(toKeep)[toKeep])
I can't come up with anything prettier at the moment, but I'm sure there is a better way :)
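A slightly tidier variant of the same idea, as a sketch: build the logical vector with vapply() (which guarantees a logical result) and hand column positions from which() to select(), since tidyselect also accepts positions. unname() drops the names that which() attaches, so select() does not treat them as new column names.

```r
library(dplyr)

# TRUE for every numeric column; vapply() guarantees a logical vector
is_num <- vapply(starwars, is.numeric, logical(1))

# Column positions are valid tidyselect input; unname() avoids renaming
res <- select(starwars, name, unname(which(is_num)))
names(res)  # "name" "height" "mass" "birth_year"
```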
As of version 1.0.0, as mentioned in the NEWS:
select() and rename() use the latest version of the tidyselect interface. Practically, this means that you can now combine selections using Boolean logic (i.e. !, & and |), and use predicate functions (e.g. is.character) to select variables by type (#4680).
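Because & takes the intersection of two selections, a type predicate can also be combined with name-based helpers such as ends_with(). A short sketch, assuming dplyr >= 1.0.0 (with the predicate wrapped in where()):

```r
library(dplyr)

# Character columns whose names end in "color": the intersection of both selections
colour_cols <- starwars %>% select(where(is.character) & ends_with("color"))
names(colour_cols)  # "hair_color" "skin_color" "eye_color"
```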
### Requires dplyr >= 1.0.0, which is now available on CRAN
# install.packages("dplyr")
# or the development version: devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)
starwars %>%
as_tibble() %>%
glimpse()
#> Rows: 87
#> Columns: 14
#> $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ sex <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
#> $ films <list> [<"The Empire Strikes Back", "Revenge of the Sith", "Re...
#> $ vehicles <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, ...
#> $ starships <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced ...
To select either numeric or character columns:
starwars %>%
select(where(is.numeric) | where(is.character)) %>%
glimpse()
#> Rows: 87
#> Columns: 11
#> $ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ sex <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
Or select non-list columns:
starwars %>%
select(!where(is.list)) %>%
glimpse()
#> Rows: 87
#> Columns: 11
#> $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ sex <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
To select name together with all character columns (| takes the union of the two selections; & would take their intersection):
starwars %>%
select(name | where(is.character)) %>%
glimpse()
#> Rows: 87
#> Columns: 8
#> $ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ sex <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
Created on 2020-02-17 by the reprex package (v0.3.0)
The concise tidyverse syntax, where ~ stands for an anonymous function, may be helpful when using select_if():
library(tidyverse)
# numeric and character columns
starwars %>% select_if(~ is.numeric(.) | is.character(.))
# all numeric AND the name column
starwars %>% select(name, where(is.numeric))
Inside select(), predicate functions such as is.numeric should be wrapped in where(), as recommended by the tidyverse developers; this avoids ambiguity between a function and a column of the same name.
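Since dplyr 1.0.0, select_if() is superseded by select() with where(), and where() also accepts the ~ lambda shorthand. A sketch showing that the first example can be written either way with the same result:

```r
library(dplyr)

# Superseded _if variant with a ~ lambda
old <- starwars %>% select_if(~ is.numeric(.) | is.character(.))

# Current idiom: the same test as a lambda wrapped in where()
new <- starwars %>% select(where(~ is.numeric(.x) || is.character(.x)))

identical(names(old), names(new))  # TRUE
```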