Using dplyr summarise in R with dynamic variable

Question

I am trying to use summarise() and group_by() from dplyr in R. However, when I use a variable in place of explicitly naming the summarised column, it uses the sum of dist for the entire data set in every row rather than grouping properly. This can easily be seen in the difference between TestBad and TestGood below. I just want to replicate TestGood's results using the GraphVar variable, as in TestBad.

    require("dplyr")
    GraphVar <- "dist"

    TestBad <- summarise(group_by_(cars,"speed"),Sum=sum(cars[[GraphVar]],na.rm=TRUE),Count=n())

    TestGood <- summarise(group_by_(cars,"speed"),Sum=sum(dist,na.rm=TRUE),Count=n())

Thanks!

aosmith · Accepted Answer

As of February 2020, the rlang package provides tidy evaluation tools for this. In particular, if the column names are held as strings, you can use the .data pronoun.

    library(dplyr)
    GraphVar = "dist"
    cars %>%
        group_by(.data[["speed"]]) %>%
        summarise(Sum = sum(.data[[GraphVar]], na.rm = TRUE),
                  Count = n() )
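
To see why the original attempt failed, it can help to run the three versions side by side. `cars[[GraphVar]]` pulls the full column out of the ungrouped data frame, so every group gets the grand total, whereas `.data[[GraphVar]]` is looked up within the current group. The following is a minimal sketch, assuming a dplyr version that supports the `.data` pronoun; the names `bad`, `good`, and `dynamic` are illustrative only.

```r
library(dplyr)

GraphVar <- "dist"

# cars[[GraphVar]] ignores the grouping: it is the whole column every time
bad <- cars %>%
  group_by(speed) %>%
  summarise(Sum = sum(cars[[GraphVar]], na.rm = TRUE), Count = n())

# Hard-coded column name: the result we want to reproduce
good <- cars %>%
  group_by(speed) %>%
  summarise(Sum = sum(dist, na.rm = TRUE), Count = n())

# .data[[GraphVar]] looks the column up inside the current group
dynamic <- cars %>%
  group_by(speed) %>%
  summarise(Sum = sum(.data[[GraphVar]], na.rm = TRUE), Count = n())

all(bad$Sum == sum(cars$dist))    # TRUE when every row holds the grand total
identical(dynamic$Sum, good$Sum)  # TRUE when the dynamic version matches
```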

While they will be superseded (but not deprecated) in dplyr 1.0.0, the scoped helper *_at() functions are useful when working with strings.

    cars %>%
        group_by_at("speed") %>%
        summarise_at(.vars = vars(GraphVar),
                     .funs = list(Sum = ~sum(., na.rm = TRUE),
                                  Count = ~n() ) )

In 2016 you needed the standard evaluation function summarise_() along with lazyeval::interp(). This still works in 2020 but has been deprecated.

    library(lazyeval)
    cars %>%
        group_by_("speed") %>%
        summarise_(Sum = interp(~sum(var, na.rm = TRUE), var = as.name(GraphVar)),
                   Count = ~n() )
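
If you are moving away from the deprecated `summarise_()` and `interp()` pair, the same idea (turn the string into a symbol, then splice it into the expression) can probably be written with rlang's `sym()` and the `!!` operator. This is a sketch of one possible replacement rather than the only way to do it, and the name `res_sym` is just for illustration.

```r
library(dplyr)
library(rlang)

GraphVar <- "dist"

# sym() converts the string to a symbol; !! unquotes it inside summarise()
res_sym <- cars %>%
  group_by(speed) %>%
  summarise(Sum = sum(!!sym(GraphVar), na.rm = TRUE),
            Count = n())

res_sym
```

The result should have one row per distinct speed, with the same Sum and Count columns as the hard-coded version in the question.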

James Baye · Answer

The latest usage for referring to one or more columns by name seems to be

    cars %>% group_by(across("speed")) %>% ...
    cars %>% group_by(across(c("speed", "dist"))) %>% ...

See vignette("colwise"), section Other verbs.
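
Putting this together with the original question, a complete pipeline might look like the sketch below. It assumes dplyr 1.0.0 or later (for `across()`), wraps the strings in `all_of()` to make it explicit that they are external character vectors, and uses `.names = "{.fn}"` so the output column is called Sum rather than dist_Sum. `GroupVar` and `res_across` are hypothetical names added for the example.

```r
library(dplyr)

GroupVar <- "speed"  # hypothetical variable holding the grouping column name
GraphVar <- "dist"

res_across <- cars %>%
  group_by(across(all_of(GroupVar))) %>%
  summarise(
    # named list + .names = "{.fn}" yields a column called "Sum"
    across(all_of(GraphVar),
           list(Sum = ~ sum(.x, na.rm = TRUE)),
           .names = "{.fn}"),
    Count = n()
  )

res_across
```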

Tags: r, dplyr

Urza5589
