When working with data, I find the dplyr library's select() function a great way to organize my data frame columns.
One common use: when I am working with a df that has many columns, I often put two variables next to each other for easy comparison. When doing this, I then need to attach all the other columns either before or after them. I found the matches(".") function a very convenient way to do this.
For example:
library(nycflights13)
library(dplyr)

# just the five columns:
select(flights, carrier, tailnum, year, month, day)

# new order for all columns:
select(flights, carrier, tailnum, year, month, day, matches("."))
# matches(".") attaches all other columns to the end of the new data frame
The question: is there a better way to do this? Better in the sense of being more flexible.
One example issue: is there some way to include "all other" columns at the beginning or in the middle of the new data frame? (Note that select(flights, matches("."), year, month, day) doesn't produce the desired result, since matches(".") attaches all columns, and year, month, day are then ignored because they repeat existing column names.)
We can make a new data table by choosing or selecting just the variables that we are interested in. That is what the function select of the dplyr package does. We use the select function to tell R what variables or columns of our data set we want to keep.
To pick out single or multiple columns, use the select() function. The select() function expects a data frame as its first input ('argument', in R language), followed by the names of the columns you want to extract, with a comma between each name.
To select a column in R you can use brackets, e.g., YourDataFrame['Column'] will take the column named "Column". Furthermore, we can also use dplyr and the select() function to get columns by name or index. For instance, select(YourDataFrame, c('A', 'B')) will take the columns named "A" and "B" from the data frame.
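As a quick illustration of the three ways of picking columns described above, here is a minimal sketch. The data frame df and its columns A, B, and C are made up for the example.

```r
library(dplyr)

# Hypothetical three-column data frame, used only for illustration
df <- data.frame(A = 1:3, B = letters[1:3], C = c(TRUE, FALSE, TRUE))

by_bracket <- df["A"]           # base R: one-column data frame holding A
by_name    <- select(df, A, B)  # dplyr: columns A and B, by name
by_index   <- select(df, 1, 3)  # dplyr: first and third columns (A and C)
```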
You can use the mutate() function from the dplyr package to add one or more columns to a data frame in R.
flights %>% relocate(carrier, tailnum, year, month, day)
flights %>% relocate(carrier, tailnum, year, month, day, .after = last_col())
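relocate() (available from dplyr 1.0.0) also accepts a specific column for .before or .after, which should cover the "middle" case from the question. A minimal sketch, using a small made-up data frame named toy as a stand-in for flights:

```r
library(dplyr)

# Hypothetical stand-in for flights, with a handful of its column names
toy <- data.frame(
  year = 2013L, month = 1L, day = 1L,
  dep_time = 517L, carrier = "UA", tailnum = "N14228"
)

# Move carrier and tailnum so they sit right after day;
# the remaining columns keep their relative order
middle <- toy %>% relocate(carrier, tailnum, .after = day)

# Move carrier and tailnum to the end, so "all other" columns come first
at_end <- toy %>% relocate(carrier, tailnum, .after = last_col())
```

Here names(middle) starts with year, month, day, followed by carrier and tailnum, with dep_time last.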
select(flights, carrier, tailnum, year, month, day, everything())
Or in two steps, to select variables provided in a character vector, use one_of("x", "y", "z"):
col <- c("carrier", "tailnum", "year", "month", "day")
select(flights, one_of(col), everything())
select(flights, -one_of(col), one_of(col))
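Note that in tidyselect 1.0.0 and later, one_of() is superseded by all_of() and any_of(). With those helpers, the same two reorderings might look like the sketch below. The toy data frame is a made-up stand-in for flights, and col is shortened to two names.

```r
library(dplyr)

# Hypothetical stand-in for flights, with a handful of its column names
toy <- data.frame(
  year = 2013L, month = 1L, day = 1L,
  dep_time = 517L, carrier = "UA", tailnum = "N14228"
)
col <- c("carrier", "tailnum")

# Named columns first, then everything else
front <- select(toy, all_of(col), everything())

# Everything else first, named columns last
back <- select(toy, -all_of(col), all_of(col))
```

all_of() raises an error if a name in col is missing from the data frame, while any_of() silently skips missing names, so the choice depends on how strict you want the selection to be.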
If you want to add all the data frame again using dplyr:
bind_cols(select(flights, one_of(col)), flights)
bind_cols(flights, select(flights, one_of(col)))
Though not a very elegant solution, it works.
select(flights, carrier, tailnum, one_of(setdiff(colnames(flights),c("carrier","tailnum","year"))),year)
I used the setdiff function to compare. Since select does not accept string arguments, I used the one_of function. For a list of the many utility functions available for select's arguments, you can refer to this post.
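To make the setdiff step easier to follow, here is a small sketch that splits it out. The toy data frame and the rest variable are made up for the example, and all_of() is swapped in for one_of() on the assumption that tidyselect 1.0.0 or later is installed.

```r
library(dplyr)

# Hypothetical stand-in for flights, with a handful of its column names
toy <- data.frame(
  year = 2013L, month = 1L, day = 1L,
  dep_time = 517L, carrier = "UA", tailnum = "N14228"
)

# Column names that are placed neither at the front nor at the end
rest <- setdiff(colnames(toy), c("carrier", "tailnum", "year"))

# carrier and tailnum first, the remaining columns in the middle, year last
out <- select(toy, carrier, tailnum, all_of(rest), year)
```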