 

dplyr::select - Including All Other Columns at End of New Data Frame (or Beginning or Middle)


When working with data, I find the dplyr package's select() function a great way to organize my data frame columns.

One great use: when I am working with a data frame that has many columns, I often put two variables next to each other for easy comparison. When doing this, I then need to attach all other columns either before or after them. I found the matches(".") function a super convenient way to do this.

For example:

    library(nycflights13)
    library(dplyr)

    # just the five columns:
    select(flights, carrier, tailnum, year, month, day)

    # new order for all columns:
    select(flights, carrier, tailnum, year, month, day, matches("."))
    # matches(".") attaches all other columns to the end of the new data frame

The question: is there a better way to do this, better in the sense of being more flexible?

For example, one issue: is there some way to include "all other" columns at the beginning or middle of the new data frame? (Note that select(flights, matches("."), year, month, day) doesn't produce the desired result, since matches(".") attaches all columns, and year, month, day are then ignored because they repeat column names that are already selected.)

asked Aug 16 '15 22:08 by EconomiCurtis



2 Answers

Update: using dplyr::relocate() (available since dplyr 1.0.0)

  • Selected columns **at the beginning**:

        flights %>%
          relocate(carrier, tailnum, year, month, day)

  • Selected columns **at the end**:

        flights %>%
          relocate(carrier, tailnum, year, month, day, .after = last_col())
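The question also asks about the middle. relocate() accepts a column in its .before and .after arguments, so the moved columns can be spliced in after (or before) any column, with everything else keeping its original order. A minimal sketch, assuming dplyr 1.0.0 or later; the small data frame df is made up as a stand-in for flights:

```r
library(dplyr)

# Made-up stand-in for nycflights13::flights
df <- data.frame(year = 2013L, month = 1L, day = 1L, dep_delay = 2,
                 carrier = "UA", tailnum = "N14228")

# Splice carrier and tailnum into the middle, right after `month`;
# every other column keeps its original relative order
mid <- df %>% relocate(carrier, tailnum, .after = month)
names(mid)   # year, month, carrier, tailnum, day, dep_delay

# .before works the same way; this yields the same column order
mid2 <- df %>% relocate(carrier, tailnum, .before = day)
names(mid2)  # year, month, carrier, tailnum, day, dep_delay
```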

Old answer

If you want to **reorder the columns**:

  • All other columns **at the end**:

        select(flights, carrier, tailnum, year, month, day, everything())

    Or, in two steps, to select variables provided in a character vector, use one_of("x", "y", "z"):

        col <- c("carrier", "tailnum", "year", "month", "day")
        select(flights, one_of(col), everything())

  • All other columns **at the beginning**:

        select(flights, -one_of(col), one_of(col))
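In recent versions of tidyselect, one_of() is superseded: all_of() is the strict replacement (it errors if a name is missing) and any_of() silently skips missing names. A sketch of the same two select() patterns written with all_of(), on a small made-up data frame standing in for flights:

```r
library(dplyr)

# Made-up stand-in for flights, with the columns of interest scattered
df <- data.frame(year = 2013L, carrier = "UA", month = 1L,
                 day = 1L, tailnum = "N14228", dep_delay = 2)
col <- c("carrier", "tailnum")

# Selected columns first, all other columns after them
front <- select(df, all_of(col), everything())
names(front)  # carrier, tailnum, year, month, day, dep_delay

# All other columns first: a leading negative selection starts from
# every column minus `col`, then all_of(col) appends them at the end
back <- select(df, -all_of(col), all_of(col))
names(back)   # year, month, day, dep_delay, carrier, tailnum
```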

If you want to keep the selected columns and also bind the whole data frame again (so the selected columns appear twice), using dplyr:

  • Whole data frame at the end:

        bind_cols(select(flights, one_of(col)), flights)

  • Whole data frame at the beginning:

        bind_cols(flights, select(flights, one_of(col)))
answered Oct 09 '22 19:10 by mpalanco


    Though not a very elegant solution, it works.

    select(flights, carrier, tailnum,
           one_of(setdiff(colnames(flights), c("carrier", "tailnum", "year"))),
           year)

I used the setdiff() function to find all the remaining columns. Since select() expects bare column names rather than a character vector of strings, I wrapped the result in one_of(). For a list of the many helper functions you can use inside select(), you can refer to this post.
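The setdiff() idea does not depend on dplyr. A minimal base-R sketch, using a hypothetical helper move_cols() (not part of any package) and a made-up data frame, that puts the chosen columns at the front or splices them in right after a named column:

```r
# Made-up data frame standing in for flights
df <- data.frame(year = 2013L, carrier = "UA", month = 1L,
                 day = 1L, tailnum = "N14228", dep_delay = 2)

# Hypothetical helper: put `cols` at the front, or right after the
# column named `after`; all other columns keep their original order
move_cols <- function(df, cols, after = NULL) {
  rest <- setdiff(names(df), cols)   # the "all other" columns
  if (is.null(after)) {
    return(df[, c(cols, rest), drop = FALSE])
  }
  pos <- match(after, rest)          # position of the anchor column
  if (is.na(pos)) stop("`after` must name a column not in `cols`")
  df[, append(rest, cols, after = pos), drop = FALSE]
}

names(move_cols(df, c("carrier", "tailnum")))
# carrier, tailnum, year, month, day, dep_delay
names(move_cols(df, c("carrier", "tailnum"), after = "day"))
# year, month, day, carrier, tailnum, dep_delay
```

Passing the last remaining column as after (here "dep_delay") puts the chosen columns at the end.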

answered Oct 09 '22 20:10 by Koundy