It seems like dplyr::pull() and dplyr::select() do the same thing. Is there a difference besides that dplyr::pull() only selects one variable?
pull() selects a single column of a data frame and returns it as a vector. This makes it useful in combination with magrittr's pipe operator and dplyr's verbs.
First, it helps to see what class each function returns.
library(dplyr)
mtcars %>% pull(cyl) %>% class()
#> [1] "numeric"
mtcars %>% select(cyl) %>% class()
#> [1] "data.frame"
So pull() creates a vector (which, in this case, is numeric), whereas select() creates a data frame.
Basically, pull() is equivalent to writing mtcars$cyl or mtcars[, "cyl"], whereas select() removes all of the columns except for cyl but maintains the data frame structure.
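To make the equivalence concrete, here is a minimal sketch that checks it on mtcars. The variable names (v, d, tb) are only illustrative. It also shows one caveat worth keeping in mind: on a tibble, [, "cyl"] keeps the data frame structure instead of dropping to a vector, so the mtcars[, "cyl"] comparison holds for plain data frames only.

```r
library(dplyr)

# pull() returns the bare column, exactly like $ and [[ do.
v <- mtcars %>% pull(cyl)
same_as_dollar  <- identical(v, mtcars$cyl)
same_as_bracket <- identical(v, mtcars[["cyl"]])
same_as_matrix  <- identical(v, mtcars[, "cyl"])  # plain data.frame only

# select() keeps a one-column data frame.
d <- mtcars %>% select(cyl)

# On a tibble, single-bracket indexing does not drop to a vector,
# but pull() still returns the same numeric vector.
tb <- as_tibble(mtcars)
tibble_bracket_is_df <- is.data.frame(tb[, "cyl"])
tibble_pull_same     <- identical(pull(tb, cyl), mtcars$cyl)
```

Because pull() behaves the same on data frames and tibbles, it is the safer choice inside a pipeline when a vector is what you need.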
You could see select as an analogue of [ or magrittr::extract, and pull as an analogue of [[ (or $) or magrittr::extract2 for data frames (an analogue of [[ for lists would be purrr::pluck).
library(magrittr)  # provides extract() and extract2()
df <- iris %>% head
All of these give the same output:
df %>% pull(Sepal.Length)
df %>% pull("Sepal.Length")
a <- "Sepal.Length"; df %>% pull(!!quo(a))
df %>% extract2("Sepal.Length")
df %>% `[[`("Sepal.Length")
df[["Sepal.Length"]]
# all of them:
# [1] 5.1 4.9 4.7 4.6 5.0 5.4
And all of these give the same output:
df %>% select(Sepal.Length)
a <- "Sepal.Length"; df %>% select(!!quo(a))
df %>% select("Sepal.Length")
df %>% extract("Sepal.Length")
df %>% `[`("Sepal.Length")
df["Sepal.Length"]
# all of them:
# Sepal.Length
# 1 5.1
# 2 4.9
# 3 4.7
# 4 4.6
# 5 5.0
# 6 5.4
pull and select can take literal (unquoted), character, or numeric indices, while the others take character or numeric indices only.
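As a quick check of the three index styles, the sketch below passes the same column to pull and select as a bare name, a string, and a position. The variable names are illustrative only.

```r
library(dplyr)

df <- head(iris)

# pull(): bare name, string, and position all return the same vector.
by_name   <- pull(df, Sepal.Length)
by_string <- pull(df, "Sepal.Length")
by_pos    <- pull(df, 1)

# select(): the same three styles all return the same one-column data frame.
sel_name   <- select(df, Sepal.Length)
sel_string <- select(df, "Sepal.Length")
sel_pos    <- select(df, 1)
```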
One important difference is how they handle negative indices. For select, negative indices mean columns to drop. For pull, they mean counting from the last column.
df %>% pull(-Sepal.Length)
df %>% pull(-1)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica
This looks strange, but Sepal.Length is converted to its position, 1, and column -1 is Species (the last column).
This feature is not supported by [[ and extract2:
df %>% `[[`(-1)
df %>% extract2(-1)
df[[-1]]
# Error in .subset2(x, i, exact = exact) :
# attempt to select more than one element in get1index <real>
Negative indices to drop columns are supported by [ and extract, though.
df %>% select(-Sepal.Length)
df %>% select(-1)
df %>% `[`(-1)
df[-1]
# Sepal.Width Petal.Length Petal.Width Species
# 1 3.5 1.4 0.2 setosa
# 2 3.0 1.4 0.2 setosa
# 3 3.2 1.3 0.2 setosa
# 4 3.1 1.5 0.2 setosa
# 5 3.6 1.4 0.2 setosa
# 6 3.9 1.7 0.4 setosa
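The contrast in negative-index handling can be verified side by side. This is a small sketch with illustrative variable names: pull(-1) should match the last column, while select(-1) should match the data frame with its first column dropped.

```r
library(dplyr)

df <- head(iris)

# pull(-1) counts from the end: it returns the last column (Species).
last_col <- df %>% pull(-1)

# select(-1) drops the first column and keeps the rest as a data frame.
dropped <- df %>% select(-1)
```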