It seems like dplyr::pull() and dplyr::select() do the same thing. Is there a difference besides that dplyr::pull() only selects 1 variable?


Difference between pull and select in dplyr?

2 Answers

First, it helps to see what class each function returns.

library(dplyr)

mtcars %>% pull(cyl) %>% class()
#> [1] "numeric"

mtcars %>% select(cyl) %>% class()
#> [1] "data.frame"

So pull() returns a vector (numeric, in this case), whereas select() returns a data frame.

Basically, pull() is equivalent to writing mtcars$cyl or mtcars[, "cyl"] (on a tibble, [, "cyl"] would keep the tibble structure, so mtcars[["cyl"]] is the closer match there), whereas select() removes all of the columns except cyl but keeps the data frame structure.
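A quick way to check this is to compare the results directly. This is only a minimal sketch on the built-in mtcars data; the names cyl_vec and cyl_df are made up for illustration.

```r
library(dplyr)

cyl_vec <- mtcars %>% pull(cyl)    # bare vector
cyl_df  <- mtcars %>% select(cyl)  # one-column data frame

is.data.frame(cyl_vec)  # FALSE
is.data.frame(cyl_df)   # TRUE

# pull() returns the same values as $ or [[ ]]
identical(cyl_vec, mtcars$cyl)       # TRUE
identical(cyl_vec, mtcars[["cyl"]])  # TRUE

# select() keeps the rectangular shape: 32 rows, 1 column
dim(cyl_df)
```

In practice this means the result of pull() can go straight into functions that expect a vector, such as mean(), while the result of select() can be piped on to further data frame verbs.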


answered Sep 28 '22 01:09

Evan O.

You could see select as an analogue of [ or magrittr::extract, and pull as an analogue of [[ (or $) or magrittr::extract2 for data frames (an analogue of [[ for lists would be purrr::pluck).

library(dplyr)
library(magrittr)  # provides extract() and extract2()

df <- iris %>% head()

All of these give the same output:

df %>% pull(Sepal.Length)
df %>% pull("Sepal.Length")
a <- "Sepal.Length"; df %>% pull(!!quo(a))
df %>% extract2("Sepal.Length")
df %>% `[[`("Sepal.Length")
df[["Sepal.Length"]]

# all of them:
# [1] 5.1 4.9 4.7 4.6 5.0 5.4

And all of these give the same output:

df %>% select(Sepal.Length)
a <- "Sepal.Length"; df %>% select(!!quo(a))
df %>% select("Sepal.Length")
df %>% extract("Sepal.Length")
df %>% `[`("Sepal.Length")
df["Sepal.Length"]
# all of them:
#   Sepal.Length
# 1          5.1
# 2          4.9
# 3          4.7
# 4          4.6
# 5          5.0
# 6          5.4

pull and select accept bare (unquoted) column names, character strings, or numeric positions, while the others accept only character or numeric indices.
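To make the three index forms concrete, here is a small sketch on the same six-row iris subset. The names by_name, by_string and by_pos are just illustrative.

```r
library(dplyr)

df <- head(iris)

# pull(): bare name, string, and position all return the same vector
by_name   <- df %>% pull(Sepal.Length)
by_string <- df %>% pull("Sepal.Length")
by_pos    <- df %>% pull(1)
identical(by_name, by_string) && identical(by_name, by_pos)

# select(): the same three forms return the same one-column data frame
identical(select(df, Sepal.Length), select(df, "Sepal.Length"))
identical(select(df, Sepal.Length), select(df, 1))

# Base [[ has no bare-name form: an unquoted name is looked up as an
# ordinary variable, so this fails unless such a variable happens to exist
# df[[Sepal.Length]]
```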

One important difference is how they handle negative indices.

For select, negative indices mean columns to drop.

For pull, negative indices count from the last column.

df %>% pull(-Sepal.Length)
df %>% pull(-1)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica

The result looks strange, but the bare name Sepal.Length is first converted to its column position, 1, so -Sepal.Length becomes -1, and pull() reads -1 as the last column, Species.
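The counting-from-the-end behaviour can be checked against explicit base R indexing. This is a sketch only; last_col and second_last are illustrative names.

```r
library(dplyr)

df <- head(iris)

# Negative positions in pull() count from the right-hand side
last_col    <- df %>% pull(-1)  # Species
second_last <- df %>% pull(-2)  # Petal.Width

identical(last_col, df[[ncol(df)]])     # TRUE
identical(second_last, df$Petal.Width)  # TRUE
```

Because of this, pulling a column by negated name is rarely what you want; a positive position or the name itself is less surprising.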

This feature is not supported by [[ and extract2:

df %>% `[[`(-1)
df %>% extract2(-1)
df[[-1]]
# Error in .subset2(x, i, exact = exact) : 
#   attempt to select more than one element in get1index <real>
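If you want to see the failure without stopping a script, the error can be caught with tryCatch(). This is a sketch; the exact wording of the error message may differ between R versions, so only the fact that an error occurs is checked here, and res is an illustrative name.

```r
df <- head(iris)

# Catch the error from a negative [[ index instead of halting
res <- tryCatch(df[[-1]], error = function(e) "error")
res

# A base R way to get the last column is to index it explicitly
identical(df[[ncol(df)]], df$Species)
```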

Negative indices to drop columns are supported by [ and extract, though.

df %>% select(-Sepal.Length)
df %>% select(-1)
df %>% `[`(-1)
df[-1]

#   Sepal.Width Petal.Length Petal.Width Species
# 1         3.5          1.4         0.2  setosa
# 2         3.0          1.4         0.2  setosa
# 3         3.2          1.3         0.2  setosa
# 4         3.1          1.5         0.2  setosa
# 5         3.6          1.4         0.2  setosa
# 6         3.9          1.7         0.4  setosa
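A compact way to confirm that the dplyr and base forms drop the same column is to compare the remaining column names. This is a sketch; dropped_dplyr and dropped_base are illustrative names.

```r
library(dplyr)

df <- head(iris)

dropped_dplyr <- df %>% select(-Sepal.Length)
dropped_base  <- df[-1]

names(dropped_dplyr)
# [1] "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

identical(names(dropped_dplyr), names(dropped_base))  # TRUE

# select() can also drop several columns at once
df %>% select(-c(Sepal.Length, Species)) %>% names()
```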

answered Sep 28 '22 00:09

Moody_Mudskipper
