I've been taking an online course in which the instructor always does the following to obtain, say, the column Col1 from a data.frame object Dat:
library(dplyr)
unlist(select(Dat, Col1))
Why not simply run Dat$Col1? I notice a difference in the "presentation" of the two results, but is there any other significant difference between the two forms? Will every subsequent operation give the same result for both?
The dplyr package aims to make common data manipulation quicker and easier. By limiting the available choices, it lets you focus on the data manipulation problem itself. It provides simple "verbs", functions that each handle one common data manipulation task, so that thoughts can be translated into code faster.
When using the dplyr package, we mostly use the infix operator %>% from magrittr, which passes its left-hand side as the first argument of the function call on its right-hand side. For example, x %>% f(y) is equivalent to f(x, y).
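A minimal sketch of that equivalence, assuming the magrittr package is installed. The vector x and the choice of round() are only illustrative and are not from the original post:

```r
library(magrittr)  # provides %>%; dplyr re-exports it as well

x <- c(1.234, 5.678)

piped  <- x %>% round(1)  # x becomes the first argument of round()
direct <- round(x, 1)     # the equivalent ordinary call

identical(piped, direct)
```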
The dplyr package in R is a grammar of data manipulation: it provides a consistent set of verbs that help solve the most common data manipulation problems.
(Posting comments as community wiki.)
These are not quite equivalent - unlist(select(.)) keeps (probably unwanted) names.
dd <- data.frame(Col1=c("abc","def"))
str(unlist(select(dd,Col1)))
## Factor w/ 2 levels "abc","def": 1 2
## - attr(*, "names")= chr [1:2] "Col11" "Col12"
str(dd$Col1)
## Factor w/ 2 levels "abc","def": 1 2
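If the names are the only thing in the way, they can be dropped. A minimal sketch using base R only: here dd["Col1"] is a stand-in for select(dd, Col1) (both give a one-column data frame), so dplyr is not needed to run it. Note that on R 4.0 and later the column is a character vector rather than a factor, but the names are attached either way.

```r
dd <- data.frame(Col1 = c("abc", "def"))

# Stand-in for select(dd, Col1): a one-column data frame
one_col <- dd["Col1"]

# unlist() builds element names from the column name plus a position index
with_names <- unlist(one_col)
names(with_names)

# Two ways to get rid of the names
no_names_a <- unname(with_names)
no_names_b <- unlist(one_col, use.names = FALSE)

# Both should now match the plain column
identical(no_names_a, dd$Col1)
identical(no_names_b, dd$Col1)
```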
Your instructor is probably just a fan of the tidyverse (@RichScriven); pull(Dat, Col1) or (for extreme "tidiness") Dat %>% pull(Col1) would be more idiomatic (@Henrik). Dat$Col1 or Dat[["Col1"]] would be the base-R equivalents (the former is more convenient for interactive use, the latter is marginally safer for programming purposes since it won't do partial matching of column names).
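To illustrate the partial-matching point, here is a small sketch on a plain data.frame. The misspelled name Col is a deliberate, hypothetical typo:

```r
dd <- data.frame(Col1 = c("abc", "def"))

# `$` falls back to partial matching: "Col" is completed to "Col1".
# (A warning is issued only if options(warnPartialMatchDollar = TRUE) is set.)
via_dollar <- suppressWarnings(dd$Col)

# `[[` matches names exactly by default, so the typo yields NULL
via_brackets <- dd[["Col"]]

identical(via_dollar, dd$Col1)
is.null(via_brackets)
```

In a script, the NULL from [[ tends to surface the typo sooner than a silently completed name would.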
It hardly matters, but the tidyverse approaches are much slower.
library(microbenchmark)
microbenchmark(dd$Col1, dd[["Col1"]], pull(dd, Col1), unlist(select(dd, Col1)))
Unit: microseconds
                     expr     min        lq       mean    median       uq      max neval cld
                  dd$Col1   5.296   10.9630   14.86871   13.4040   17.160   31.036   100  a
             dd[["Col1"]]   7.870    9.6535   15.18874   11.8270   16.635   88.842   100  a
           pull(dd, Col1)  44.160  108.7625  128.89342  117.8415  136.890  422.462   100  a
 unlist(select(dd, Col1)) 601.480 1132.8240 1436.44178 1214.4420 1378.141 8796.964   100   b