
What's the difference between using select + unlist from dplyr package and using the dollar sign?

Tags: r, dplyr

I've been taking an online course in which the instructor always does the following to obtain, say, the column Col1 from a data.frame object Dat:

library(dplyr)
unlist(select(Dat, Col1))

Why not simply run Dat$Col1? I notice a difference in how the two results are printed, but is there any other significant difference between the two forms? Will every operation give the same result for both?

G. Monteiro asked Jan 05 '19 20:01



1 Answer

(Posting comments as community wiki.)

These are not quite equivalent - unlist(select(.)) keeps (probably unwanted) names.

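library(dplyr)
## stringsAsFactors=TRUE reproduces the factor output below in R >= 4.0.0,
## where data.frame() no longer converts strings to factors by default
dd <- data.frame(Col1=c("abc","def"), stringsAsFactors=TRUE)
str(unlist(select(dd,Col1)))
##  Factor w/ 2 levels "abc","def": 1 2
##  - attr(*, "names")= chr [1:2] "Col11" "Col12"
str(dd$Col1)
##  Factor w/ 2 levels "abc","def": 1 2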
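
To see why the extra names can matter, here is a minimal base-R sketch. It assumes that dd["Col1"] (a one-column data frame) is a fair stand-in for select(dd, Col1), so it runs without dplyr; the variable names named and plain are only illustrative.

```r
# Base-R stand-in for select(dd, Col1): a one-column data frame.
dd <- data.frame(Col1 = c("abc", "def"), stringsAsFactors = TRUE)

named <- unlist(dd["Col1"])                     # carries names "Col11", "Col12"
plain <- unlist(dd["Col1"], use.names = FALSE)  # names are never attached

names(named)
identical(named, dd$Col1)          # FALSE: the names attribute differs
identical(unname(named), dd$Col1)  # TRUE once the names are dropped
is.null(names(plain))              # TRUE
```

So the values are the same, but anything that compares attributes (identical(), or code that uses the names as labels) can behave differently until the names are removed.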

Your instructor is probably just a fan of the tidyverse (@RichScriven); pull(Dat, Col1) or (for extreme "tidiness") Dat %>% pull(Col1) would be more idiomatic (@Henrik). Dat$Col1 or Dat[["Col1"]] would be the base-R equivalents (the former is more convenient for interactive use, the latter is marginally safer for programming purposes since it won't do name-completion).
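
The name-completion point can be checked with a small base-R sketch. The abbreviated name Co is just an illustration, and suppressWarnings() is there because, depending on the R version and the warnPartialMatchDollar option, `$` may warn when it completes a name.

```r
dd <- data.frame(Col1 = c("abc", "def"), stringsAsFactors = TRUE)

# `$` completes the abbreviated name "Co" to "Col1" (partial matching).
by_dollar <- suppressWarnings(dd$Co)

# `[[` matches names exactly by default, so an abbreviation finds nothing.
by_bracket <- dd[["Co"]]

identical(by_dollar, dd$Col1)  # TRUE
is.null(by_bracket)            # TRUE
```

In a script, the NULL from `[[` surfaces a typo early, whereas `$` may silently pick up a different column than intended.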

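It hardly matters at this scale (everything is in microseconds), but the tidyverse approaches are much slower: in the benchmark below, the median time for pull() is roughly 10 times that of the base-R forms, and for unlist(select()) roughly 100 times.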

library(microbenchmark)
microbenchmark(dd$Col1,dd[["Col1"]],pull(dd,Col1),unlist(select(dd,Col1)))
Unit: microseconds
                     expr     min        lq       mean    median       uq      max neval cld
                  dd$Col1   5.296   10.9630   14.86871   13.4040   17.160   31.036   100  a 
             dd[["Col1"]]   7.870    9.6535   15.18874   11.8270   16.635   88.842   100  a 
           pull(dd, Col1)  44.160  108.7625  128.89342  117.8415  136.890  422.462   100  a 
 unlist(select(dd, Col1)) 601.480 1132.8240 1436.44178 1214.4420 1378.141 8796.964   100   b
2 revs answered Oct 13 '22 00:10