
Select columns based on multiple attribute conditions

Tags: r, dplyr

I am trying to figure out how to efficiently select columns using dplyr::select_if. The starwars data set in dplyr 0.7.0 is a good dataset to use for this:

> starwars
# A tibble: 87 x 13
                 name height  mass    hair_color  skin_color eye_color birth_year gender homeworld species     films  vehicles starships
                <chr>  <int> <dbl>         <chr>       <chr>     <chr>      <dbl>  <chr>     <chr>   <chr>    <list>    <list>    <list>
 1     Luke Skywalker    172    77         blond        fair      blue       19.0   male  Tatooine   Human <chr [5]> <chr [2]> <chr [2]>
 2              C-3PO    167    75          <NA>        gold    yellow      112.0   <NA>  Tatooine   Droid <chr [6]> <chr [0]> <chr [0]>
 3              R2-D2     96    32          <NA> white, blue       red       33.0   <NA>     Naboo   Droid <chr [7]> <chr [0]> <chr [0]>
 4        Darth Vader    202   136          none       white    yellow       41.9   male  Tatooine   Human <chr [4]> <chr [0]> <chr [1]>
 5        Leia Organa    150    49         brown       light     brown       19.0 female  Alderaan   Human <chr [5]> <chr [1]> <chr [0]>
 6          Owen Lars    178   120   brown, grey       light      blue       52.0   male  Tatooine   Human <chr [3]> <chr [0]> <chr [0]>
 7 Beru Whitesun lars    165    75         brown       light      blue       47.0 female  Tatooine   Human <chr [3]> <chr [0]> <chr [0]>
 8              R5-D4     97    32          <NA>  white, red       red         NA   <NA>  Tatooine   Droid <chr [1]> <chr [0]> <chr [0]>
 9  Biggs Darklighter    183    84         black       light     brown       24.0   male  Tatooine   Human <chr [1]> <chr [0]> <chr [1]>
10     Obi-Wan Kenobi    182    77 auburn, white        fair blue-gray       57.0   male   Stewjon   Human <chr [6]> <chr [1]> <chr [5]>

Now say that I would like to select only the numeric columns (integer or double). This works well:

library(dplyr)

starwars %>%
  select_if(is.numeric)

But what should I do if I want to select based on multiple criteria? For example, maybe I want both numeric and character columns:

starwars %>%
  select_if(c(is.numeric, is.character))

Or maybe I want all numeric AND the name column:

starwars %>%
  select_if(name, is.character)

Neither of the two examples above works, so I am wondering how I might accomplish what I've outlined here.

boshek asked Jun 15 '17 19:06


3 Answers

For the first example:

starwars %>%
  select_if(function(col) {is.numeric(col) | is.character(col)})

This is taken directly from the RDocumentation page.
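If more than two type checks need to be combined, the same idea can be wrapped in a small helper. The following is only a sketch: `either()` is a hypothetical name introduced here for illustration, not a dplyr function, and it assumes each predicate returns a single TRUE or FALSE per column.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): returns one predicate that is TRUE
# for a column when at least one of the supplied predicates is TRUE.
either <- function(...) {
  preds <- list(...)
  function(col) any(vapply(preds, function(p) isTRUE(p(col)), logical(1)))
}

# Keep every column that is numeric or character; list columns are dropped.
num_or_chr <- starwars %>%
  select_if(either(is.numeric, is.character))
```

Because `either()` returns an ordinary function, it can be passed anywhere select_if accepts a predicate.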

For the second:

# Named logical vector: TRUE for every numeric column
toKeep <- sapply(starwars, is.numeric)
starwars %>%
  select("name", names(toKeep)[toKeep])

I can't come up with anything prettier at the moment, but I'm sure there is a better way :)
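For comparison, the second case can also be written with plain column indexing, which does not depend on any particular dplyr version. This is a minimal sketch; `num_cols` and `name_and_num` are names chosen here for illustration.

```r
library(dplyr)  # only needed for the starwars data set

# Names of all numeric columns (integer or double)
num_cols <- names(starwars)[vapply(starwars, is.numeric, logical(1))]

# The name column first, followed by every numeric column
name_and_num <- starwars[c("name", num_cols)]
```

Using vapply with logical(1) instead of sapply guarantees a logical vector even for a data frame with zero columns.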

psychOle answered Nov 04 '22 00:11


As of dplyr 1.0.0, as mentioned in the release news:

select() and rename() use the latest version of the tidyselect interface. Practically, this means that you can now combine selections using Boolean logic (i.e. !, & and |), and use predicate functions (e.g. is.character) to select variables by type (#4680).

### Install development version on GitHub first until CRAN version is available
# install.packages("devtools")
# devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)

starwars %>% 
  as_tibble() %>% 
  glimpse()
#> Rows: 87
#> Columns: 14
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...
#> $ films      <list> [<"The Empire Strikes Back", "Revenge of the Sith", "Re...
#> $ vehicles   <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, ...
#> $ starships  <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced ...

To select either numeric or character columns:

starwars %>%
  select(is.numeric | is.character) %>% 
  glimpse()
#> Rows: 87
#> Columns: 11
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...

Or to select all non-list columns:

starwars %>%
  select(!is.list) %>% 
  glimpse()
#> Rows: 87
#> Columns: 11
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, ...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57....
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...

To select the name column together with all character columns:

starwars %>%
  select(name | is.character) %>% 
  glimpse()
#> Rows: 87
#> Columns: 8
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown"...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light"...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blu...
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "femal...
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "fem...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan",...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "H...

Created on 2020-02-17 by the reprex package (v0.3.0)
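Later tidyselect releases recommend wrapping predicates in where() rather than passing them bare. The sketch below restates the selections above in that form and adds the "name plus all numeric columns" case from the question; it assumes dplyr 1.0.0 or later, and the variable names are chosen here for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Numeric or character columns
num_or_chr <- starwars %>%
  select(where(is.numeric) | where(is.character))

# Everything except list columns
non_list <- starwars %>%
  select(!where(is.list))

# The name column plus all numeric columns
name_and_num <- starwars %>%
  select(name, where(is.numeric))
```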

Tung answered Nov 04 '22 01:11


The tidyverse formula shorthand, where ~ introduces an anonymous function and . stands for its argument, can be helpful with select_if:

library(dplyr)

# numeric and character columns
starwars %>% select_if(~ is.numeric(.) | is.character(.)) 

# all numeric AND the name column
starwars %>% select(name, where(is.numeric))

Inside select(), the tidyverse authors recommend wrapping predicate functions such as is.numeric in where().
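where() also accepts the same ~ shorthand, so several conditions on one column can be combined in a single predicate. As a small sketch, assuming dplyr 1.0.0 or later, this keeps only the character columns that contain no missing values (`complete_chr` is a name chosen here for illustration):

```r
library(dplyr)

# Character columns with no NA values; .x is the column being tested
complete_chr <- starwars %>%
  select(where(~ is.character(.x) && !anyNA(.x)))
```

The scalar && is appropriate here because both is.character() and anyNA() return a single TRUE or FALSE for the whole column.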

George Shimanovsky answered Nov 04 '22 01:11