When creating summaries of my data using dplyr, I often find myself calculating the CI (using CI from Rmisc):
summary <- data %>%
group_by(group1, group2) %>%
summarize(
var1.mean = CI(var1, ci=0.95)['mean'],
var1.lower = CI(var1, ci=0.95)['lower'],
var1.upper = CI(var1, ci=0.95)['upper'],
var2.mean = CI(var2, ci=0.95)['mean'],
var2.lower = CI(var2, ci=0.95)['lower'],
var3.upper = CI(var2, ci=0.95)['upper'],
var3.mean = CI(var3, ci=0.95)['mean'],
var3.lower = CI(var3, ci=0.95)['lower'],
var3.upper = CI(var3, ci=0.95)['upper'],
var4 = sum(var4)
)
This is both painfully verbose and inefficient. Ultimately, I wish I could just write something a bit like:
summary <- data %>%
group_by(group1, group2) %>%
summarize(
var1 = CI(var1, ci=0.95),
var2 = CI(var2, ci=0.95),
var3 = CI(var3, ci=0.95),
var4 = sum(var4)
)
For the above code, and since CI returns a named column with the rows
"lower","upper" and"mean",I wish I could get a data frame with columns looking like:
"group1","group2","var1.lower","var1.mean","var1.upper","var2.lower","var3.upper","var4".Any idea how this could be achieved? Is there a way to "flatten" columns in R? Something like do.call but applied like rest as in JS or Python?
There might be something to do using quasiquotations, but it starts to go over my R skills..
I used to use this gist with plyr, but it does not work anymore with dplyr, and rather than recoding it again, I wish there is a better way than hacking into the library.
We can use tidyr::unnest if we format the output as a data.frame first
data
library(Rmisc)
library(dplyr)
library(tidyr)
set.seed(1)
data <- data.frame(group1 = sample(c("A","B"),10,T),
group2 = sample(c("A","B"),10,T),
var1 = sample(10),
var2 = sample(10),
var3 = sample(10),
var4 = sample(10))
General solution
data %>% group_by(group1, group2) %>%
dplyr::summarize(var1 = list(data.frame(t(CI(var1, ci=0.95)))),
var2 = list(data.frame(t(CI(var2, ci=0.95)))),
var3 = list(data.frame(t(CI(var3, ci=0.95)))),
var4 = sum(var4)) %>%
unnest (var1,var2,var3,.sep=".")
Result
# A tibble: 4 x 12
# Groups: group1 [2]
# group1 group2 var4 var1.upper var1.mean var1.lower var2.upper var2.mean var2.lower var3.upper var3.mean var3.lower
# <fctr> <fctr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A A 13 56.824819 6.0 -44.824819 11.85310 5.500000 -0.8531024 26.55931 7.500000 -11.559307
# 2 A B 11 38.265512 6.5 -25.265512 50.97172 6.500000 -37.9717166 25.55931 6.500000 -12.559307
# 3 B A 11 12.956686 4.0 -4.956686 13.65205 5.666667 -2.3187188 15.07146 5.666667 -3.738127
# 4 B B 20 8.484138 6.0 3.515862 14.70619 4.666667 -5.3728564 11.31872 3.333333 -4.652052
Or with a custom CI function (same output)
CI2 <- function(x,ci=0.95) list(data.frame(t(CI(x, ci))))
data %>% group_by(group1, group2) %>%
dplyr::summarize(var1 = CI2(var1, ci=0.95),
var2 = CI2(var2, ci=0.95),
var3 = CI2(var3, ci=0.95),
var4 = sum(var4)) %>%
unnest (var1,var2,var3,.sep=".")
Or using a converter function (same output)
can be used with any other functions that returns an array
vec2rowdf <- function(v) list(data.frame(t(v))) # creates a 1 row data.frame from a vector, wrapped in a list
data %>% group_by(group1, group2) %>%
dplyr::summarize(var1 = CI(var1, ci=0.95) %>% vec2rowdf,
var2 = CI(var2, ci=0.95) %>% vec2rowdf,
var3 = CI(var3, ci=0.95) %>% vec2rowdf,
var4 = sum(var4)) %>%
unnest (var1,var2,var3,.sep=".")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With