`dplyr::select` without reordering columns

Tags: r, dplyr

I am looking for an easy, concise way to use dplyr::select without rearranging columns.

Consider this dataset:

library(tidyverse)
head(msleep)
#> # A tibble: 6 × 11
#>   name    genus vore  order conservation sleep_total sleep_rem sleep_cycle awake
#>   <chr>   <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>
#> 1 Cheetah Acin… carni Carn… lc                  12.1      NA        NA      11.9
#> 2 Owl mo… Aotus omni  Prim… <NA>                17         1.8      NA       7  
#> 3 Mounta… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6
#> 4 Greate… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1
#> 5 Cow     Bos   herbi Arti… domesticated         4         0.7       0.667  20  
#> 6 Three-… Brad… herbi Pilo… <NA>                14.4       2.2       0.767   9.6
#> # … with 2 more variables: brainwt <dbl>, bodywt <dbl>

If I select vore, genus and name, the resulting dataframe is arranged in the order in which the columns were provided.

msleep %>% select(vore, genus, name)
#> # A tibble: 83 × 3
#>    vore  genus       name                      
#>    <chr> <chr>       <chr>                     
#>  1 carni Acinonyx    Cheetah                   
#>  2 omni  Aotus       Owl monkey                
#>  3 herbi Aplodontia  Mountain beaver           
#>  4 omni  Blarina     Greater short-tailed shrew
#>  5 herbi Bos         Cow                       
#>  6 herbi Bradypus    Three-toed sloth          
#>  7 carni Callorhinus Northern fur seal         
#>  8 <NA>  Calomys     Vesper mouse              
#>  9 carni Canis       Dog                       
#> 10 herbi Capreolus   Roe deer                  
#> # … with 73 more rows

I would instead like to leave them in their default order: name, genus, then vore.

I have a solution (see below), but I do not like it because it is quite wordy, and not completely “tidyverse-esque”. (I am teaching an intro to tidyverse course, and would like something that would not intimidate beginners.)

msleep %>% 
  select(all_of(names(msleep)[names(msleep) %in% c("vore", "genus", "name")]))
#> # A tibble: 83 × 3
#>    name                       genus       vore 
#>    <chr>                      <chr>       <chr>
#>  1 Cheetah                    Acinonyx    carni
#>  2 Owl monkey                 Aotus       omni 
#>  3 Mountain beaver            Aplodontia  herbi
#>  4 Greater short-tailed shrew Blarina     omni 
#>  5 Cow                        Bos         herbi
#>  6 Three-toed sloth           Bradypus    herbi
#>  7 Northern fur seal          Callorhinus carni
#>  8 Vesper mouse               Calomys     <NA> 
#>  9 Dog                        Canis       carni
#> 10 Roe deer                   Capreolus   herbi
#> # … with 73 more rows

Is there such a thing? Thank you!

For context: In reality, we have a data frame with about 400 columns, from which we are selecting ~10-20 at a time to work with. The order of the columns in the original data frame is meaningful, but we don't want to have to labor over listing them in their correct order in the select statements. A very specific need, I'll admit.

Created on 2021-12-22 by the reprex package (v2.0.1)
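For comparison, the approach in the question can be shortened with base R's intersect(), which returns the elements of its first argument that also occur in its second, in the first argument's order. Here is a minimal sketch on a small made-up tibble standing in for msleep. The names `df`, `wanted` and `keep` are illustrative only and are not part of the original post:

```r
library(dplyr)

# Made-up stand-in for msleep; only the column order matters here
df <- tibble(name = "Cheetah", genus = "Acinonyx", vore = "carni", order = "Carnivora")

# Columns to keep, listed in any order (hypothetical helper vector)
wanted <- c("vore", "genus", "name")

# intersect() keeps the order of its first argument, i.e. the data's column order
keep <- intersect(names(df), wanted)
keep  # "name" "genus" "vore"

# all_of() marks keep as an external vector of names rather than a column
result <- df %>% select(all_of(keep))
```

This only accepts plain column names. Tidyselect helpers such as starts_with() need one of the approaches in the answers.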

asked Dec 22 '21 at 20:12 by Kene David Nwosu


3 Answers

We could use match with sort

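library(dplyr)
library(ggplot2)  # msleep ships with ggplot2

msleep %>%
    # match() gives the positions of the requested columns; sort() puts them in data order
    select(sort(match(c("vore", "genus", "name"), names(.))))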

EDIT: updated based on the OP's comments.
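To see why this works, it helps to look at the intermediate values. This is a sketch on a small made-up tibble. The names `df`, `wanted`, `pos`, `pos_sorted` and `typo` are illustrative only:

```r
library(dplyr)

# Made-up stand-in for msleep
df <- tibble(name = "Cheetah", genus = "Acinonyx", vore = "carni", order = "Carnivora")

wanted <- c("vore", "genus", "name")

pos <- match(wanted, names(df))  # positions in the order requested: 3 2 1
pos_sorted <- sort(pos)          # positions in the data's own order: 1 2 3

# all_of() marks pos_sorted as an external vector rather than a column name
result <- df %>% select(all_of(pos_sorted))
names(result)  # "name" "genus" "vore"

# sort() drops NA by default, so a misspelled name vanishes silently
typo <- sort(match(c("vore", "genus", "nmae"), names(df)))  # 2 3
```

Because of that last point, a typo in a column name produces a narrower result instead of an error. Checking `anyNA(pos)` before selecting is one way to catch it.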

answered Nov 17 '22 at 10:11 by akrun


Update: If the column names are supplied as a vector, we can build the pattern from it, as akrun suggests in the comments:

nm1 <- c("vore", "genus", "name")
pattern <- str_c(nm1, collapse = "|")  # str_c() is from stringr, attached by library(tidyverse)

Original answer:

You could first define a regular expression containing the search items and then use matches():

pattern <- "vore|genus|name"

select(msleep, matches(pattern))
   name                       genus       vore 
   <chr>                      <chr>       <chr>
 1 Cheetah                    Acinonyx    carni
 2 Owl monkey                 Aotus       omni 
 3 Mountain beaver            Aplodontia  herbi
 4 Greater short-tailed shrew Blarina     omni 
 5 Cow                        Bos         herbi
 6 Three-toed sloth           Bradypus    herbi
 7 Northern fur seal          Callorhinus carni
 8 Vesper mouse               Calomys     NA   
 9 Dog                        Canis       carni
10 Roe deer                   Capreolus   herbi
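One thing to watch with matches(): the pattern is a regular expression that can match anywhere in a column name, and matches() ignores case by default, so `name` would also pick up a column such as `common_name`. Anchoring the alternatives with `^(...)$` restricts the match to whole names. Below is a sketch on a made-up tibble. The `common_name` column is hypothetical and only there to show the difference, and base paste() stands in for str_c():

```r
library(dplyr)

# Made-up data; common_name is hypothetical and only there to show the pitfall
df <- tibble(name = "Cheetah", genus = "Acinonyx", vore = "carni",
             common_name = "cheetah")

nm1 <- c("vore", "genus", "name")

# Unanchored: "name" also matches inside "common_name"
loose <- df %>% select(matches(paste(nm1, collapse = "|")))
names(loose)   # "name" "genus" "vore" "common_name"

# Anchored at both ends, so only whole column names match
pattern <- paste0("^(", paste(nm1, collapse = "|"), ")$")
strict <- df %>% select(matches(pattern))
names(strict)  # "name" "genus" "vore"
```

Either way the columns come back in the data's order, because matches() reports positions as they appear in the data.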
answered Nov 17 '22 at 10:11 by TarJae


You can use tidyselect::eval_select() to write a function that resolves the selection to column positions and sorts them, so the columns come back in their original order.

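library(dplyr)

select_in_order <- function(data, ...) {
  # Resolve the selection to a named vector of column positions, then sort it
  ordered_cols <- sort(tidyselect::eval_select(expr(c(...)), data))
  # all_of() marks ordered_cols as an external vector, not a column name
  select(data, all_of(ordered_cols))
}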

This does what you are asking. Because it relies on the same selection machinery as select(), it accepts everything you would normally write inside select(), including bare names, ranges such as name:genus, and helpers such as starts_with().

library(ggplot2)  # msleep is in ggplot2

msleep %>%
  select_in_order(vore, genus, name)

# this will work as well
msleep %>%
  select_in_order(starts_with("sleep"), vore, name:genus)
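The function works because eval_select() returns a named integer vector of column positions in the order the selection was written, and sort() reorders those positions while keeping their names. Here is a self-contained sketch on a small made-up tibble. It repeats the function with `all_of(ordered_cols)` inside select(), and the names `df`, `pos`, `sorted`, `a` and `b` are illustrative only:

```r
library(dplyr)

# Made-up stand-in for msleep
df <- tibble(name = "Cheetah", genus = "Acinonyx", vore = "carni",
             sleep_total = 12.1, sleep_rem = NA_real_)

# Positions in the order the selection was written: vore = 3, genus = 2, name = 1
pos <- tidyselect::eval_select(rlang::expr(c(vore, genus, name)), df)

# sort() orders by position and keeps the names: name = 1, genus = 2, vore = 3
sorted <- sort(pos)

select_in_order <- function(data, ...) {
  ordered_cols <- sort(tidyselect::eval_select(rlang::expr(c(...)), data))
  select(data, all_of(ordered_cols))
}

a <- select_in_order(df, vore, genus, name)
names(a)  # "name" "genus" "vore"

b <- select_in_order(df, starts_with("sleep"), vore, name)
names(b)  # "name" "vore" "sleep_total" "sleep_rem"
```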

EDIT

As another option, simply use relocate() after your select() statement. This accomplishes the same goal of keeping the columns in their original order, in a way that is easy for a beginner to understand.

msleep %>%
  select(vore, genus, name) %>%
  relocate(any_of(names(msleep)))
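relocate() moves the columns it is given to the front, in the order given, so passing the original column names restores the data's order. The choice of any_of() matters: most entries of names(msleep) are no longer present after select(), and any_of() skips them, whereas all_of() raises an error. A sketch on a made-up tibble, where `df`, `picked`, `restored` and `failed` are illustrative names:

```r
library(dplyr)

# Made-up stand-in for msleep
df <- tibble(name = "Cheetah", genus = "Acinonyx", vore = "carni", order = "Carnivora")

picked <- df %>% select(vore, genus, name)
names(picked)    # "vore" "genus" "name"

# any_of() skips "order", which was not selected; relocate() moves the rest
# to the front in names(df) order
restored <- picked %>% relocate(any_of(names(df)))
names(restored)  # "name" "genus" "vore"

# all_of() is strict and refuses because "order" is missing
failed <- tryCatch(relocate(picked, all_of(names(df))),
                   error = function(e) "error")
```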
answered Nov 17 '22 at 11:11 by Adam