
Difference between pull and select in dplyr?

Tags: r, dplyr

It seems like dplyr::pull() and dplyr::select() do the same thing. Is there a difference besides that dplyr::pull() only selects 1 variable?

Evan O. asked Apr 15 '18 17:04



2 Answers

First, it makes sense to see what class of object each function returns.

library(dplyr)

mtcars %>% pull(cyl) %>% class()
#> 'numeric'

mtcars %>% select(cyl) %>% class()
#> 'data.frame'

So pull() creates a vector -- which, in this case, is numeric -- whereas select() creates a data frame.

Basically, pull() is equivalent to writing mtcars$cyl or mtcars[, "cyl"], whereas select() drops every column except cyl but keeps the data frame structure.
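A minimal sketch to check that equivalence, assuming dplyr is installed and using the built-in mtcars data. The names cyl_vec and cyl_df are made up for this illustration.

```r
library(dplyr)

cyl_vec <- mtcars %>% pull(cyl)    # plain numeric vector
cyl_df  <- mtcars %>% select(cyl)  # data frame with a single column

# pull() matches the base R extraction operators
identical(cyl_vec, mtcars$cyl)
identical(cyl_vec, mtcars[["cyl"]])

# select() keeps the rectangular shape: 32 rows, 1 column
is.data.frame(cyl_df)
dim(cyl_df)

# a vector can go straight into functions that expect one
mean(cyl_vec)
```

In practice this means pull() is usually the last step of a pipeline, when a bare vector is needed, while select() is used mid-pipeline, where later verbs still expect a data frame.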

Evan O. answered Sep 28 '22 01:09


You can think of select as an analogue of [ (or magrittr::extract) and of pull as an analogue of [[ or $ (or magrittr::extract2) for data frames. For lists, the analogue of [[ would be purrr::pluck.

library(dplyr); library(magrittr)  # magrittr provides extract() and extract2()
df <- iris %>% head()

All of these give the same output:

df %>% pull(Sepal.Length)
df %>% pull("Sepal.Length")
a <- "Sepal.Length"; df %>% pull(!!quo(a))
df %>% extract2("Sepal.Length")
df %>% `[[`("Sepal.Length")
df[["Sepal.Length"]]

# all of them:
# [1] 5.1 4.9 4.7 4.6 5.0 5.4

And all of these give the same output:

df %>% select(Sepal.Length)
a <- "Sepal.Length"; df %>% select(!!quo(a))
df %>% select("Sepal.Length")
df %>% extract("Sepal.Length")
df %>% `[`("Sepal.Length")
df["Sepal.Length"]
# all of them:
#   Sepal.Length
# 1          5.1
# 2          4.9
# 3          4.7
# 4          4.6
# 5          5.0
# 6          5.4
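When the column name is stored in a variable, as with a above, more recent tidyverse releases offer more explicit spellings. The sketch below assumes dplyr 1.0.0 or later (which re-exports all_of() from tidyselect) and that rlang is installed. It is one possible idiom, not the method used in this answer, and the names sel and vec are made up for the example.

```r
library(dplyr)

df <- head(iris)
a <- "Sepal.Length"

# select(): all_of() marks `a` as a character vector of column names
sel <- df %>% select(all_of(a))

# pull(): turn the string into a symbol, then unquote it
vec <- df %>% pull(!!rlang::sym(a))

# base R needs no special handling for a string
identical(vec, df[[a]])
names(sel)
```

Being explicit like this avoids ambiguity if the data frame ever gains a column that is itself called a.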

pull and select accept bare (unquoted) column names, character strings, or numeric positions, while the others accept only character strings or numeric positions.
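A short sketch of the three index forms side by side, assuming dplyr is installed and that no variable called Sepal.Length exists in the calling environment. The result names (v_bare, s_chr, and so on) are made up for the example.

```r
library(dplyr)

df <- head(iris)

# pull(): bare name, string, and position all address the same column
v_bare <- df %>% pull(Sepal.Length)
v_chr  <- df %>% pull("Sepal.Length")
v_num  <- df %>% pull(1)
identical(v_bare, v_chr) && identical(v_bare, v_num)

# select(): the same three forms, each returning a one-column data frame
s_bare <- df %>% select(Sepal.Length)
s_chr  <- df %>% select("Sepal.Length")
s_num  <- df %>% select(1)
identical(names(s_bare), names(s_chr)) && identical(names(s_bare), names(s_num))

# base R subsetting has no bare-name form: Sepal.Length is looked up as an
# ordinary variable, so the call fails and the error message is captured
res <- tryCatch(df[[Sepal.Length]], error = function(e) conditionMessage(e))
res
```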

One important difference is how they handle negative indices.

For select, negative indices name the columns to drop.

For pull, they count backwards from the last column.

df %>% pull(-Sepal.Length)
df %>% pull(-1)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica

The result looks strange, but Sepal.Length is converted to its column position, 1, so both calls ask for column -1, which is Species (the last column).
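The counting rule can be checked directly against base R. This is a small sketch, assuming dplyr is installed. It also relies on pull() defaulting to var = -1 when no column is given.

```r
library(dplyr)

df <- head(iris)

# negative positions count from the right-hand side
identical(pull(df, -1), df[[ncol(df)]])       # last column, Species
identical(pull(df, -2), df[["Petal.Width"]])  # second to last column

# with no column given, pull() falls back to var = -1
identical(pull(df), df$Species)

# select() reads the same index as "drop the first column"
names(select(df, -1))
```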

This feature is not supported by [[ or extract2:

df %>% `[[`(-1)
df %>% extract2(-1)
df[[-1]]
# Error in .subset2(x, i, exact = exact) : 
#   attempt to select more than one element in get1index <real>

Negative indices that drop columns are supported by [ and extract, though.

df %>% select(-Sepal.Length)
df %>% select(-1)
df %>% `[`(-1)
df[-1]

#   Sepal.Width Petal.Length Petal.Width Species
# 1         3.5          1.4         0.2  setosa
# 2         3.0          1.4         0.2  setosa
# 3         3.2          1.3         0.2  setosa
# 4         3.1          1.5         0.2  setosa
# 5         3.6          1.4         0.2  setosa
# 6         3.9          1.7         0.4  setosa

Moody_Mudskipper answered Sep 28 '22 00:09