Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

column name with brackets or other punctuations for dplyr group_by

Tags:

r

dplyr

I have an imported data frame that has column names with various punctuations including parentheses, e.g. BILLNG.STATUS.(COMPLETED./.INCOMPLTE) .

I was trying to use group_by from dplyr to do some summarizing, something like

df <- df %>% group_by(ORDER.NO, BILLNG.STATUS.(COMPLETED./.INCOMPLTE))

which brings the error Error in mutate_impl(.data, dots) : could not find function "BILLNG.STATUS."

Short of changing the column names, is there a way to handle such column names directly in group_by ?

like image 500
Ricky Avatar asked Jan 27 '15 03:01

Ricky


2 Answers

I think you can make this work if you enclose the "illegal" column names in backticks. For example, let's say I start with this data frame (called df):

  BILLING.STATUS.(COMPLETED./.INCOMPLETE) ORDER.VALUE.(USD)
1                                       A        0.01544196
2                                       A        0.95522706
3                                       B        1.13479303
4                                       B        1.22848285

Then I can summarise it like this:

dat %>% group_by(`BILLING.STATUS.(COMPLETED./.INCOMPLETE)`) %>% 
  summarise(count=n(),
            mean = mean(`ORDER.VALUE.(USD)`))

Giving:

  BILLING.STATUS.(COMPLETED./.INCOMPLETE) count      mean
1                                       A     2 0.4853345
2                                       B     2 1.1816379

Backticks also come in handy for referring to or creating variable names with whitespace. You can find a number of questions related to dplyr and backticks on SO, and there's also some discussion of backticks in the help for Quotes.

like image 152
eipi10 Avatar answered Sep 26 '22 01:09

eipi10


I'm just using this not-an-answer as a counter-example or illustration of limitations for the the backtick method. (It was the first strategem I tried. Perhaps it is the fact that two language operations ("(" and "/") are being handled adjacently that makes this fail.)

names(iris)[5] <- "Specie(/)s"
library(dplyr)
by_species <- iris %>% group_by(`Specie(/)s`)
by_species %>% summarise_each(funs(mean(., na.rm = TRUE)))
#Error: cannot modify grouping variable

Tried a variety or other language-oriented efforts with quote, as.name and substitute that also failed. (I wish there were a mechanism to request that this sink to the bottom of the answers.)

like image 38
IRTFM Avatar answered Sep 26 '22 01:09

IRTFM