Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Flatten column as argument

Tags:

r

dplyr

When creating summaries of my data using dplyr, I often find myself calculating the CI (using CI from Rmisc):

summary <- data %>%
  group_by(group1, group2) %>%
  summarize(
    var1.mean = CI(var1, ci=0.95)['mean'],
    var1.lower = CI(var1, ci=0.95)['lower'],
    var1.upper = CI(var1, ci=0.95)['upper'],

    var2.mean = CI(var2, ci=0.95)['mean'],
    var2.lower = CI(var2, ci=0.95)['lower'],
    var3.upper = CI(var2, ci=0.95)['upper'],

    var3.mean = CI(var3, ci=0.95)['mean'],
    var3.lower = CI(var3, ci=0.95)['lower'],
    var3.upper = CI(var3, ci=0.95)['upper'],

    var4 = sum(var4)
  )

This is both painfully verbose and inefficient. Ultimately, I wish I could just write something a bit like:

summary <- data %>%
  group_by(group1, group2) %>%
  summarize(
    var1 = CI(var1, ci=0.95),
    var2 = CI(var2, ci=0.95),
    var3 = CI(var3, ci=0.95),
    var4 = sum(var4)
  )

For the above code, and since CI returns a named column with the rows

  • "lower",
  • "upper" and
  • "mean",

I wish I could get a data frame with columns looking like:

  • "group1",
  • "group2",
  • "var1.lower",
  • "var1.mean",
  • "var1.upper",
  • "var2.lower",
  • ...,
  • "var3.upper",
  • "var4".

Any idea how this could be achieved? Is there a way to "flatten" columns in R? Something like do.call but applied like rest as in JS or Python?

There might be something to do using quasiquotations, but it starts to go over my R skills..

I used to use this gist with plyr, but it does not work anymore with dplyr, and rather than recoding it again, I wish there is a better way than hacking into the library.

like image 310
Quentin Roy Avatar asked Dec 03 '25 07:12

Quentin Roy


1 Answers

We can use tidyr::unnest if we format the output as a data.frame first

data

library(Rmisc)
library(dplyr)
library(tidyr)
set.seed(1)
data <- data.frame(group1 = sample(c("A","B"),10,T),
                   group2 = sample(c("A","B"),10,T),
                   var1 = sample(10),
                   var2 = sample(10),
                   var3 = sample(10),
                   var4 = sample(10))

General solution

data %>% group_by(group1, group2) %>%
  dplyr::summarize(var1 = list(data.frame(t(CI(var1, ci=0.95)))),
                   var2 = list(data.frame(t(CI(var2, ci=0.95)))),
                   var3 = list(data.frame(t(CI(var3, ci=0.95)))),
                   var4 = sum(var4)) %>%
  unnest (var1,var2,var3,.sep=".")

Result

# A tibble: 4 x 12
# Groups:   group1 [2]
#   group1 group2  var4 var1.upper var1.mean var1.lower var2.upper var2.mean  var2.lower var3.upper var3.mean var3.lower
#   <fctr> <fctr> <int>      <dbl>     <dbl>      <dbl>      <dbl>     <dbl>       <dbl>      <dbl>     <dbl>      <dbl>
# 1      A      A    13  56.824819       6.0 -44.824819   11.85310  5.500000  -0.8531024   26.55931  7.500000 -11.559307
# 2      A      B    11  38.265512       6.5 -25.265512   50.97172  6.500000 -37.9717166   25.55931  6.500000 -12.559307
# 3      B      A    11  12.956686       4.0  -4.956686   13.65205  5.666667  -2.3187188   15.07146  5.666667  -3.738127
# 4      B      B    20   8.484138       6.0   3.515862   14.70619  4.666667  -5.3728564   11.31872  3.333333  -4.652052

Or with a custom CI function (same output)

CI2 <- function(x,ci=0.95) list(data.frame(t(CI(x, ci))))

data %>% group_by(group1, group2) %>%
  dplyr::summarize(var1 = CI2(var1, ci=0.95),
                   var2 = CI2(var2, ci=0.95),
                   var3 = CI2(var3, ci=0.95),
                   var4 = sum(var4)) %>%
  unnest (var1,var2,var3,.sep=".")

Or using a converter function (same output)

can be used with any other functions that returns an array

vec2rowdf <- function(v) list(data.frame(t(v))) # creates a 1 row data.frame from a vector, wrapped in a list
data %>% group_by(group1, group2) %>%
  dplyr::summarize(var1 = CI(var1, ci=0.95) %>% vec2rowdf,
                   var2 = CI(var2, ci=0.95) %>% vec2rowdf,
                   var3 = CI(var3, ci=0.95) %>% vec2rowdf,
                   var4 = sum(var4)) %>%
  unnest (var1,var2,var3,.sep=".")
like image 112
Moody_Mudskipper Avatar answered Dec 04 '25 20:12

Moody_Mudskipper