dplyr group_by is not grouping

Tags:

r

dplyr

OK, so I've read lots of posts here and I'm a bit embarrassed, because I thought I understood the basic dplyr functions.

I can't get group_by to form groups and I am perplexed.

I have a data frame called test. All I want is to group by the variable ID and then calculate the correlation between two variables (sulfate and nitrate) within each group.

I don't know what's happening: it doesn't group, and it outputs only one correlation when I should get 127 groups and 127 correlations. Why?

What test looks like: (screenshot not available; going by the code below, it has at least the columns ID, sulfate and nitrate)

What I wrote:

library(dplyr)
library(magrittr)

test %>%
  mutate(ID = as.character(ID)) %>%
  group_by(ID) %$%
  cor(sulfate, nitrate, use = "complete.obs")

What I get: [1] 0.0568084.

asked Jan 02 '23 03:01 by delcast


1 Answer

I don't think the exposition pipe %$% respects the grouping set up by group_by. It just exposes the columns of the data frame on its left to the expression on its right (much like with()), so cor() receives the complete sulfate and nitrate columns and returns a single correlation over all rows, which is why you only get one number. And what would you expect it to return otherwise? A vector of 127 correlation values? You wouldn't even be able to tell which value came from which ID.

I suggest sticking to wrapping operations inside mutate and summarise where possible, which I believe is the intended usage. This gives you the same advantage as %$%, namely not having to spell out the data frame context (you can just write mpg instead of mtcars$mpg). I wouldn't use do here, since there is no need: each group's result is a single number, not something more exotic like a model.
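A quick way to check this, sketched on the built-in mtcars data and assuming both dplyr and magrittr are installed: pipe grouped data into %$% and compare the result with a plain, ungrouped cor() call. If the grouping were honoured you would get one value per gear; instead you get a single value equal to the overall correlation.

```r
library(dplyr)
library(magrittr)  # provides %$%; dplyr only re-exports %>%

# Correlation computed through the exposition pipe on grouped data
via_exposition <- mtcars %>%
  group_by(gear) %$%
  cor(mpg, hp)

# Correlation over the whole, ungrouped data set
ungrouped <- cor(mtcars$mpg, mtcars$hp)

# Both are a single number: %$% exposes the full columns and ignores the groups
length(via_exposition)
all.equal(via_exposition, ungrouped)
```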

Below is an example using the built-in mtcars dataset.

If you need the vector of correlations, it's easy to extract from the result of this operation.

library(dplyr)

mtcars %>%
  group_by(gear) %>% 
  summarise(cor = cor(mpg, hp))
#> # A tibble: 3 x 2
#>    gear    cor
#>   <dbl>  <dbl>
#> 1     3 -0.739
#> 2     4 -0.879
#> 3     5 -0.900

Created on 2018-07-13 by the reprex package (v0.2.0).
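Applied to data shaped like the question's, the same approach might look roughly like this. The data frame below is a made-up stand-in for test (only the column names ID, sulfate and nitrate come from the question; the values are invented), and setNames() is just one way to turn the summary column into a named vector so that each correlation stays tied to its ID.

```r
library(dplyr)

# Hypothetical stand-in for the asker's `test` data frame (values are made up)
test <- data.frame(
  ID      = rep(1:3, each = 4),
  sulfate = c(1.2, 2.3, NA, 4.1,  0.5, 0.9, 1.4, 2.2,  3.3, 2.1, 1.8, NA),
  nitrate = c(0.4, 0.7, 1.1, 1.5,  2.0, 1.6, 1.1, 0.8,  0.2, 0.5, 0.9, 1.0)
)

# One row per ID; use = "complete.obs" drops rows where either value is NA
cor_by_id <- test %>%
  mutate(ID = as.character(ID)) %>%
  group_by(ID) %>%
  summarise(cor = cor(sulfate, nitrate, use = "complete.obs"))

# If a plain vector is needed, keep the IDs as names so each value stays labelled
cor_vec <- setNames(cor_by_id$cor, cor_by_id$ID)
cor_vec
```

With the real data this should give one row per distinct ID (127 of them, going by the question). If the labels aren't needed, pull(cor_by_id, cor) returns the bare numeric vector.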

answered Jan 12 '23 04:01 by Calum You