I often find myself choosing the wrong syntax for variable names when using purrr.
For example, take the code on the GitHub page of purrr:
library(purrr)
mtcars %>%
split(.$cyl)
In split(.$cyl), I often make the mistake of writing split(cyl) instead. That seems like the most obvious choice, as it is consistent with other tidyverse commands such as select(cyl).
My question is: why is the .$ needed in front of the variable name?
The . represents the data object passed in by the pipe, and $ extracts the column from it. The column can also be extracted with [[ ]]:
mtcars %>%
split(.[['cyl']])
Within mutate/summarise/group_by/select/arrange etc. we can simply pass bare column names, because those functions evaluate their arguments inside the data frame. split is different: it is a base R function, so it cannot find the column 'cyl' unless we extract it from the dataset explicitly.
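To make the difference concrete, here is a small sketch of how the piped call is evaluated. It assumes only that purrr is installed (it re-exports %>%); the names by_dot, by_explicit and by_with are illustrative.

```r
library(purrr)  # re-exports the magrittr pipe %>%

# The pipe passes mtcars as the first argument and also binds it to `.`,
# so this is evaluated as split(mtcars, mtcars$cyl).
by_dot <- mtcars %>% split(.$cyl)

# The same call written without the pipe.
by_explicit <- split(mtcars, mtcars$cyl)

# with() evaluates `cyl` inside the data frame, which is another base R
# way of making the column visible to split().
by_with <- with(mtcars, split(mtcars, cyl))

names(by_dot)
all.equal(by_dot, by_explicit)
all.equal(by_explicit, by_with)

# A bare column name fails, because split() looks for an object called
# `cyl` in the calling environment rather than inside mtcars:
# mtcars %>% split(cyl)
```

The commented-out last line is the mistake from the question; it errors unless an unrelated object named cyl happens to exist in the workspace.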
One option in the tidyverse is to nest (from tidyr) all the other variables except 'cyl', i.e.
mtcars %>%
nest(-cyl)
Now we have a list column named 'data', which holds all the other columns as a list of data.frames, one per value of 'cyl'.
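As a rough sketch of what the nested result looks like and how to get back to something resembling the output of split: this assumes tidyr 1.0.0 or later, where the columns to nest are given as a named argument (the older nest(-cyl) form is expected to still run there, with a warning). The names nested and by_cyl are illustrative.

```r
library(tidyr)
library(purrr)  # for %>%

# Nest every column except cyl into a list column called `data`.
nested <- mtcars %>% nest(data = -cyl)

# One row per distinct cyl value; `data` is a list column of data frames.
nested

# Naming the list column by cyl gives a named list similar to split().
# Unlike split(), the nested data frames no longer contain cyl, and the
# order of the groups may differ.
by_cyl <- setNames(nested$data, as.character(nested$cyl))
names(by_cyl)
```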
With newer versions of dplyr (tested with 0.8.1), there is also group_split, as commented by @Moody_Mudskipper:
mtcars %>%
group_split(cyl)
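One difference from split is worth a short sketch: group_split returns an unnamed list, so the group values have to be looked up separately if names are needed. The example below assumes dplyr 0.8.0 or later; pieces, keys and named_pieces are illustrative names.

```r
library(dplyr)

# One tibble per value of cyl, returned as an unnamed list.
pieces <- mtcars %>% group_split(cyl)
length(pieces)
names(pieces)

# group_keys() reports the group values in the same order as the pieces,
# so they can be used as labels.
keys <- mtcars %>% group_by(cyl) %>% group_keys()
named_pieces <- setNames(pieces, as.character(keys$cyl))
names(named_pieces)
```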