I am using dplyr and wondering whether it is possible to compute differences between groups in one line. In the small example below, the task is to compute the difference between the standardized "cent" variables of groups A and B.
library(dplyr)
# create a small data.frame with two groups of ten values each
GROUP <- rep(c("A", "B"), each = 10)
NUMBE <- rnorm(20, 50, 10)
datf <- data.frame(GROUP, NUMBE)
# standardize NUMBE within each group
datf2 <- datf %>% group_by(GROUP) %>% mutate(cent = (NUMBE - mean(NUMBE)) / sd(NUMBE))
# pull out each group's standardized values and subtract them row by row
gA <- datf2 %>% ungroup() %>% filter(GROUP == "A") %>% select(cent)
gB <- datf2 %>% ungroup() %>% filter(GROUP == "B") %>% select(cent)
gA - gB
This is of course no problem when creating separate objects, but is there a more "built-in" way of performing this task? Something like the non-working fantasy code below?
datf2 %>% summarize(filter(GROUP == "A", select(cent)) - filter(GROUP == "B", select(cent)))
Thank you!
group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group".
Data frame indexing can be used to calculate differences between rows by group in R. The 'by' argument specifies the column to group the data by. All rows are retained, and a new column holding the row differences within each group is added to the existing columns.
mutate() either changes an existing column or adds a new one. summarise() calculates a single value (per group).
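As a minimal sketch of that distinction, using toy data shaped like the question's (the seed and the object names centered and group_means are assumptions made for illustration only):

```r
library(dplyr)

# Toy data in the same shape as the question: two groups of ten values.
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
datf <- data.frame(GROUP = rep(c("A", "B"), each = 10),
                   NUMBE = rnorm(20, 50, 10))

# mutate() keeps all 20 rows and adds a per-group standardized column.
centered <- datf %>%
  group_by(GROUP) %>%
  mutate(cent = (NUMBE - mean(NUMBE)) / sd(NUMBE))

# summarise() collapses each group to a single row.
group_means <- datf %>%
  group_by(GROUP) %>%
  summarise(mean_NUMBE = mean(NUMBE))
```

Here centered still has one row per observation, while group_means has one row per group.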
The dplyr package for R is a grammar of data manipulation that provides a consistent set of verbs for solving the most common data manipulation problems.
Given that we have 10 rows in each group, add an index (1:10, 1:10) and summarize over it with a difference:
> datf2$entry <- c(1:10, 1:10)
> datf2 %>% ungroup() %>% group_by(entry) %>% summarize(d = cent[1] - cent[2])
Source: local data frame [10 x 2]

   entry          d
1      1 -0.8272879
2      2 -0.9159827
3      3 -0.5064762
4      4  0.4211639
5      5  1.3681720
6      6  3.3430289
7      7  1.0086822
8      8 -0.6163907
9      9 -0.7325220
10    10 -2.5423875
compare:
> gA - gB
         cent
1  -0.8272879
2  -0.9159827
3  -0.5064762
4   0.4211639
5   1.3681720
6   3.3430289
7   1.0086822
8  -0.6163907
9  -0.7325220
10 -2.5423875
Is there a way to inject the entry field into the data or the dplyr call? I'm not sure, it seems to rely on the functions knowing too much about the data...
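One possible way to create the index inside the pipeline is dplyr's row_number(), evaluated within each GROUP. This is only a sketch: it assumes both groups have the same number of rows and that the i-th row of A should be paired with the i-th row of B. The seed and the name diffs are made up for illustration.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
datf <- data.frame(GROUP = rep(c("A", "B"), each = 10),
                   NUMBE = rnorm(20, 50, 10))

diffs <- datf %>%
  group_by(GROUP) %>%
  mutate(cent  = (NUMBE - mean(NUMBE)) / sd(NUMBE),
         entry = row_number()) %>%   # 1..10 within each group
  group_by(entry) %>%                # regroup so each entry holds one A row and one B row
  summarise(d = cent[GROUP == "A"] - cent[GROUP == "B"])
```

Selecting by GROUP inside summarise(), rather than by position as in cent[1] - cent[2], avoids depending on the A rows coming before the B rows.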
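Thank you for the inspiration. I developed this solution further into the following:
mutate(datf2, difference = filter(datf2, GROUP == "A")$cent - filter(datf2, GROUP == "B")$cent)
This adds the result as a column in the data.frame. Note that it relies on datf2 still being grouped by GROUP: the ten differences match the ten rows of each group, so they appear once for A and once again for B.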
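If only the vector of differences is needed rather than a new column, plain logical indexing inside base R's with() is another option. A small sketch under the same assumptions as the thread (equal group sizes, rows paired in order); the seed and the name d are illustrative only:

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
datf <- data.frame(GROUP = rep(c("A", "B"), each = 10),
                   NUMBE = rnorm(20, 50, 10))

datf2 <- datf %>%
  group_by(GROUP) %>%
  mutate(cent = (NUMBE - mean(NUMBE)) / sd(NUMBE))

# Subtract group B's standardized values from group A's, element by element.
d <- with(datf2, cent[GROUP == "A"] - cent[GROUP == "B"])
```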