Ok, so I've read lots of posts here and I'm kind of embarrassed because I thought I understood the basic dplyr functions.
I can't get group_by to form groups and I am perplexed.
I have the data frame test. All I want is to group by the variable ID and then calculate the correlation between two variables per group.
I don't know what's happening because it doesn't group and only outputs 1 correlation when I should have 127 groups and 127 correlations. WHY?
What test looks like:
What I wrote:
library(dplyr)
library(magrittr)
test %>%
  mutate(ID = as.character(ID)) %>%
  group_by(ID) %$%
  cor(sulfate, nitrate, use = "complete.obs")
What I get: [1] 0.0568084
I don't think the exposition pipe %$% will freely provide dplyr semantics with group_by. I haven't looked at the source, but just thinking about it, what would you expect your code to return? A vector with 127 correlation values? You wouldn't even be able to tell which one came from which ID. I suggest that you stick to wrapping operations inside mutate and summarise when possible, which I think is the intended usage. Note that this provides the same advantage as %$%, which is avoiding having to specify the data frame context (i.e. you can just write mpg instead of mtcars$mpg). I wouldn't use do here, since there is no need (your output is going to be a vector and not anything more exotic like a model).
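To see why the original pipeline returns a single number, here is a small check on mtcars. As far as I understand it, %$% evaluates the right-hand side with the columns of the left-hand side exposed (much like with()), so cor() receives the full columns and the grouping is ignored. The variable names grouped_result and ungrouped_result are just for illustration.

```r
library(dplyr)
library(magrittr)

# group_by() only attaches grouping metadata; %$% hands the full
# mpg and hp columns to cor(), so the groups play no role here.
grouped_result <- mtcars %>%
  group_by(gear) %$%
  cor(mpg, hp)

# The same correlation computed on the ungrouped data.
ungrouped_result <- cor(mtcars$mpg, mtcars$hp)

# Both are one number computed over all rows.
all.equal(grouped_result, ungrouped_result)
```

If that comparison comes back TRUE, the single value in the question is simply the correlation over the whole data frame.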
Example using the built-in mtcars dataset below. If you need the vector of correlations, it's easy to extract after this operation.
library(dplyr)
mtcars %>%
  group_by(gear) %>%
  summarise(cor = cor(mpg, hp))
#> # A tibble: 3 x 2
#>    gear    cor
#>   <dbl>  <dbl>
#> 1     3 -0.739
#> 2     4 -0.879
#> 3     5 -0.900
Created on 2018-07-13 by the reprex package (v0.2.0).
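Applied to the columns from the question, the same pattern might look like the sketch below. The real test data frame isn't shown, so this builds a hypothetical stand-in with 3 IDs instead of 127 and a couple of missing values; only the column names ID, sulfate and nitrate and the use = "complete.obs" argument come from the question. The last step shows one way to pull the correlations out as a named vector, so you still know which value belongs to which ID.

```r
library(dplyr)

# Hypothetical stand-in for `test` (assumption: the real data has the
# same three columns, with 127 distinct IDs).
set.seed(1)
test <- data.frame(
  ID      = rep(1:3, each = 20),
  sulfate = rnorm(60),
  nitrate = rnorm(60)
)
test$sulfate[c(2, 25)] <- NA  # a few missing values, as use = "complete.obs" suggests

# One row per ID, with the correlation computed within each group.
per_id <- test %>%
  mutate(ID = as.character(ID)) %>%
  group_by(ID) %>%
  summarise(cor = cor(sulfate, nitrate, use = "complete.obs"))

per_id

# Named vector of correlations, one element per ID.
cors <- setNames(per_id$cor, per_id$ID)
```

Keeping the result as a data frame first and converting to a vector afterwards means the ID labels are never lost along the way.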