I am trying to use summarise and group_by from dplyr in R. However, when I use a variable in place of explicitly naming the summarised column, every row gets the sum of dist for the entire data set rather than the sum for its own group. The difference is easy to see by comparing TestBad and TestGood below. I just want to replicate TestGood's results using the GraphVar variable, as in TestBad.
require("dplyr")
GraphVar <- "dist"
TestBad <- summarise(group_by_(cars,"speed"),Sum=sum(cars[[GraphVar]],na.rm=TRUE),Count=n())
TestGood <- summarise(group_by_(cars,"speed"),Sum=sum(dist,na.rm=TRUE),Count=n())
Thanks!
As of February 2020, the rlang package provides tidy evaluation tools for this. In particular, when the column names are stored as strings you can use the .data pronoun.
library(dplyr)
GraphVar = "dist"
cars %>%
  group_by(.data[["speed"]]) %>%
  summarise(Sum = sum(.data[[GraphVar]], na.rm = TRUE),
            Count = n())
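To see why TestBad goes wrong and the .data version does not, here is a minimal sketch comparing the two. It assumes dplyr 1.0.0 or later; the names expected, by_string and overall_total are made up for illustration. The point is that cars[[GraphVar]] pulls the whole column from the original data frame and ignores the grouping, whereas .data[[GraphVar]] is looked up inside each group.

```r
library(dplyr)

GraphVar <- "dist"

# Reference result: bare column name, evaluated within each speed group.
expected <- cars %>%
  group_by(speed) %>%
  summarise(Sum = sum(dist, na.rm = TRUE), Count = n())

# Same computation with the column name held in a string.
by_string <- cars %>%
  group_by(.data[["speed"]]) %>%
  summarise(Sum = sum(.data[[GraphVar]], na.rm = TRUE), Count = n())

# What TestBad repeats on every row: the total over the whole data set,
# because cars[[GraphVar]] bypasses the grouped data entirely.
overall_total <- sum(cars[[GraphVar]], na.rm = TRUE)

# TRUE if the per-group sums from both approaches agree.
all.equal(expected$Sum, by_string$Sum)
```

The per-group sums in by_string should add up to overall_total, which is a quick sanity check that the grouping was respected.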
While they will be superseded (but not deprecated) in dplyr 1.0.0, the scoped helper *_at() functions are useful when working with strings.
cars %>%
  group_by_at("speed") %>%
  summarise_at(.vars = vars(GraphVar),
               .funs = list(Sum = ~sum(., na.rm = TRUE),
                            Count = ~n()))
In 2016 you needed the standard evaluation function summarise_() along with lazyeval::interp(). This still works in 2020 but has been deprecated.
library(lazyeval)
cars %>%
  group_by_("speed") %>%
  summarise_(Sum = interp(~sum(var, na.rm = TRUE), var = as.name(GraphVar)),
             Count = ~n())
The latest usage for referring to one or more columns by name seems to be
cars %>% group_by(across("speed")) %>% ...
cars %>% group_by(across(c("speed", "dist"))) %>% ...
See vignette("colwise"), section "Other verbs".
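As a rough sketch of how the across() form could answer the original question end to end, the example below groups by a column name held in a string and summarises another. It assumes dplyr 1.0.0 or later (for across() and the .groups argument); GroupVar and result are names invented here, and wrapping the string in all_of() is my addition to make it explicit that an external character vector is being used for selection.

```r
library(dplyr)

GroupVar <- "speed"  # hypothetical variable holding the grouping column name
GraphVar <- "dist"   # column to summarise, as in the question

result <- cars %>%
  # across(all_of(...)) selects the grouping column(s) by name
  group_by(across(all_of(GroupVar))) %>%
  # .data[[...]] looks the string up within each group
  summarise(Sum = sum(.data[[GraphVar]], na.rm = TRUE),
            Count = n(),
            .groups = "drop")

result
```

GroupVar can also be a character vector of several names, which is where across() is more convenient than repeating .data[[...]] for each grouping column.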