What's the difference between using select + unlist from dplyr package and using the dollar sign?

Tags:

dplyr

I've been taking an online course in which the instructor always does the following to obtain, say, the column Col1 from a data.frame object Dat:

library(dplyr)
unlist(select(Dat, Col1))

Why not simply run Dat$Col1? I notice a difference in how the two results are printed, but is there any other significant difference between the two forms? Will every operation give the same result for both?


asked Jan 05 '19 20:01

G. Monteiro

1 Answer

(Posting comments as community wiki.)

These are not quite equivalent - unlist(select(.)) keeps (probably unwanted) names.

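library(dplyr)
## stringsAsFactors=TRUE reproduces the factor output below on R >= 4.0,
## where data.frame() no longer converts strings to factors by default
dd <- data.frame(Col1=c("abc","def"), stringsAsFactors=TRUE)
str(unlist(select(dd,Col1)))
##  Factor w/ 2 levels "abc","def": 1 2
##  - attr(*, "names")= chr [1:2] "Col11" "Col12"
str(dd$Col1)
##  Factor w/ 2 levels "abc","def": 1 2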
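The same point can be checked programmatically. The following is a minimal sketch, assuming dplyr is installed; `dd2` and the `via_*` names are illustrative and not part of the original answer. It uses a character column so the comparison does not depend on factor handling.

```r
library(dplyr)

# Illustrative data; stringsAsFactors = FALSE keeps Col1 a plain character vector.
dd2 <- data.frame(Col1 = c("abc", "def"), stringsAsFactors = FALSE)

via_unlist <- unlist(select(dd2, Col1))  # named character vector
via_dollar <- dd2$Col1                   # unnamed character vector
via_pull   <- pull(dd2, Col1)            # unnamed, same as dd2[["Col1"]]

names(via_unlist)                          # "Col11" "Col12"
identical(via_unlist, via_dollar)          # FALSE: the names attribute differs
identical(unname(via_unlist), via_dollar)  # TRUE: the values are the same
identical(via_pull, via_dollar)            # TRUE
```

So the values agree, and most operations will give the same result, but anything that inspects or propagates names (for example identical(), or functions that carry names through to their output) can behave differently.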

Your instructor is probably just a fan of the tidyverse (@RichScriven); pull(Dat, Col1) or (for extreme "tidiness") Dat %>% pull(Col1) would be more idiomatic (@Henrik). The base-R equivalents are Dat$Col1 and Dat[["Col1"]]: the former is more convenient for interactive use, while the latter is marginally safer in programs because it does not partially match column names.
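The partial-matching difference between $ and [[ can be seen with a small base-R sketch. The data frame and its column name here are made up for illustration, and the behaviour shown applies to plain data.frame objects (tibbles do not partially match with $).

```r
# Illustrative data frame with a single column named "Column1".
Dat <- data.frame(Column1 = 1:3)

by_dollar  <- Dat$Col                       # partial match on "Column1": 1 2 3
by_bracket <- Dat[["Col"]]                  # exact matching only: NULL
by_partial <- Dat[["Col", exact = FALSE]]   # partial matching on request: 1 2 3
```

A silently matched prefix can hide a typo or pick up a column you did not intend, which is why [[ is the more defensive choice inside functions and scripts.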

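It hardly matters in practice, but the tidyverse approaches are much slower: in the benchmark below, the median time is roughly 12-13 microseconds for the two base-R forms, about 118 for pull(), and about 1214 for unlist(select()).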

library(microbenchmark)
microbenchmark(dd$Col1,dd[["Col1"]],pull(dd,Col1),unlist(select(dd,Col1)))
Unit: microseconds
                     expr     min        lq       mean    median       uq      max neval cld
                  dd$Col1   5.296   10.9630   14.86871   13.4040   17.160   31.036   100  a 
             dd[["Col1"]]   7.870    9.6535   15.18874   11.8270   16.635   88.842   100  a 
           pull(dd, Col1)  44.160  108.7625  128.89342  117.8415  136.890  422.462   100  a 
 unlist(select(dd, Col1)) 601.480 1132.8240 1436.44178 1214.4420 1378.141 8796.964   100   b


answered Oct 13 '22 00:10

2 revs
