 

How does dplyr's select helper function everything() differ from copying?

Tags: dataframe, r, dplyr

What is the use case for

select(iris, everything())

as opposed to e.g. just copying the data.frame?

asked May 11 '16 at 19:05 by jaweej


2 Answers

Looking for references to everything() in ?select, there is an example that uses it to reorder columns:

# Reorder variables: keep the variable "Species" in the front
select(iris, Species, everything())

In this case the Species column is moved to the front, all columns are kept, and no columns are duplicated.
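Here is a short sketch that checks both points. It assumes dplyr is installed; iris ships with base R's datasets package. It shows that everything() on its own returns the same columns as the original data frame, and that naming Species first moves it without duplicating it.

```r
library(dplyr)

# everything() on its own selects every column in its original order,
# so the result has the same columns (and rows) as iris itself.
same <- select(iris, everything())
print(identical(names(same), names(iris)))

# Naming Species first and then calling everything() moves Species to the
# front; a column that is already selected is not added a second time.
reordered <- select(iris, Species, everything())
print(names(reordered))
# "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```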

Select helpers are used for more than just select(). For example, in dplyr 1.0 and later you may want to use everything() inside across() to mutate or summarise all columns.
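Here is a minimal sketch of the across() case, assuming dplyr 1.0 or later. everything() tells across() to apply the function to every column. In this example, the function counts the distinct values in each column of iris.

```r
library(dplyr)

# everything() makes across() apply n_distinct() to all five columns,
# including the Species factor; the result is a single summary row.
distinct_counts <- summarise(iris, across(everything(), n_distinct))
print(distinct_counts)
```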

Since this question was asked, the select helpers have been moved into their own package, tidyselect. The tidyselect page on CRAN has a lengthy list of reverse imports, and it's likely that many of those packages have cases where everything() is useful.
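The following sketch illustrates that everything() is a tidyselect helper rather than something specific to select(). It calls tidyselect::eval_select() directly, assuming tidyselect 1.x is installed (it comes with a current dplyr). eval_select() is the lower-level entry point intended for packages that implement their own selecting verbs. It resolves a selection against a data frame and returns named column positions.

```r
library(tidyselect)

# Resolve the same selection as select(iris, Species, everything()),
# but without dplyr: the result is a named vector of column positions.
pos <- eval_select(quote(c(Species, everything())), data = iris)
print(pos)
```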

answered Oct 06 '22 at 13:10 by Gregor Thomas


Another example use case:

# Moves the variable Petal.Length to the end
select(iris, -Petal.Length, everything())

(I saw it here: https://stackoverflow.com/a/30472217/4663008)
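The sketch below reproduces that example and assumes a current dplyr (1.0 or later). For comparison, it also shows relocate(), a verb added in dplyr 1.0.0 that expresses the same move explicitly.

```r
library(dplyr)

# A negative first term makes select() start from all columns except
# Petal.Length; everything() then appends Petal.Length at the end.
moved <- select(iris, -Petal.Length, everything())
print(names(moved))
# "Sepal.Length" "Sepal.Width" "Petal.Width" "Species" "Petal.Length"

# relocate() states the same intent directly: move Petal.Length after
# the last column.
moved2 <- relocate(iris, Petal.Length, .after = last_col())
print(identical(names(moved), names(moved2)))
```

relocate() avoids relying on the negative-first-term rule, which makes the intent clearer to readers of the code.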

Either way, both Gregor's answer and mine are confusing to me: I would have expected Species to be duplicated in Gregor's example and Petal.Length to be removed in mine. For example, if you try something slightly more complicated that combines the previous two examples, it doesn't work as expected:

> dplyr::select(iris, Petal.Width, -Petal.Length, everything())
    Petal.Width Sepal.Length Sepal.Width Petal.Length    Species
1           0.2          5.1         3.5          1.4     setosa
2           0.2          4.9         3.0          1.4     setosa
3           0.2          4.7         3.2          1.3     setosa

Update: After a quick response from Hadley on GitHub, I found out that select() has a special behaviour when everything() is combined with a negative term in the first position. select() starts off with all the variables, the negated variable is dropped, and everything() then draws it back out again. A negative variable in any later position does not work as one might expect.
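The sketch below assumes a current dplyr (1.0 or later, built on tidyselect 1.x). In that version, each term is applied in turn to the selection built so far. The first call reproduces the puzzling example. The second call shows an ordering that should give "Petal.Width first, Petal.Length dropped". Behaviour in the 2016-era dplyr discussed above may differ.

```r
library(dplyr)

# With a positive first term the selection starts empty, so -Petal.Length
# only subtracts from {Petal.Width}; everything() then adds every column
# back, including Petal.Length.
kept <- select(iris, Petal.Width, -Petal.Length, everything())
print(names(kept))

# Putting the negative term last subtracts Petal.Length from everything
# selected so far, so it is actually dropped.
dropped <- select(iris, Petal.Width, everything(), -Petal.Length)
print(names(dropped))
```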

I agree that the negative-variable-in-first-position rule and the everything() select helper need to be better explained in the documentation.

Update 2: the documentation for ?select has now been updated to state "Positive values select variables; negative values to drop variables. If the first expression is negative, select() will automatically start with all variables."

answered Oct 06 '22 at 13:10 by Arthur Yip