Unnesting results of function returning multiple values in summarize

Tags: r, dplyr, tidyr
The desired result is produced by the do() call below. I thought I could get the same result with some use of unnest(), but could not get it to work.

library(dplyr)
library(tidyr)

# Function rr is given
rr = function(x){
  # This should be an expensive and possibly random function
  r = range(x + rnorm(length(x),0.1))
  # setNames(r, c("min", "max")) # fails, expecting single value
  # list(min = r[1], max = r[2]) # fails
  list(r) # Works, but result is in "long" form without min/max
}

# Works, but syntactically awkward
iris %>% group_by(Species) %>%
  do( {
    r = rr(.$Sepal.Width)[[1]]
    data_frame(min = r[1], max = r[2])
  })

# This gives the long format, but without the column
# names min/max
iris %>% group_by(Species) %>%
  summarize(
    range = rr(Sepal.Length)
  ) %>% unnest(range)

Dieter Menne asked May 22 '16 09:05



2 Answers

Here's a pretty straightforward alternative using the data.table package:

# Function rr is given
rr = function(x) as.list(setNames(range(x + rnorm(length(x), 0.1)), c("min", "max"))) 

library(data.table)
data.table(iris)[, rr(Sepal.Width), by = Species]
#       Species      min      max
# 1:     setosa 1.839845 6.341040
# 2: versicolor 1.063727 5.498810
# 3:  virginica 1.232525 5.402483
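
The same idea (let the function return named values so that each one becomes its own column) can probably be carried over to dplyr as well. The following is only a sketch, assuming dplyr 1.0.0 or later, where summarise() accepts an unnamed expression that returns a data frame and unpacks it into columns. The helper name rr_df is hypothetical and not part of the original question.

```r
library(dplyr)

# Hypothetical variant of rr (name is an assumption): returns a one-row
# data frame with named columns instead of a list
rr_df <- function(x) {
  r <- range(x + rnorm(length(x), 0.1))
  data.frame(min = r[1], max = r[2])
}

# Assumes dplyr >= 1.0.0: an unnamed data-frame result is unpacked
# into one column per variable
res <- iris %>%
  group_by(Species) %>%
  summarise(rr_df(Sepal.Width))

print(res)
```

The values differ between runs because rr_df adds random noise, so no output is shown here.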

David Arenburg answered Nov 11 '22 06:11


unnest() always expands a nested column into "long" format, but you can use spread() to get the desired output if you first create a key column that labels each row as min or max.

library(dplyr)
library(tidyr)

iris %>%
  group_by(Species) %>%
  summarize(range = rr(Sepal.Length)) %>% 
  unnest(range) %>%
  # range() returns c(min, max) per group; 3 is the number of Species groups
  mutate(newcols = rep(c("min", "max"), 3)) %>%
  spread(newcols, range)
#     Species      max      min
#      (fctr)    (dbl)    (dbl)
#1     setosa 7.636698 3.292692
#2 versicolor 9.792319 3.337382
#3  virginica 9.810723 3.367066
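
The hard-coded rep(..., 3) above depends on knowing the number of groups in advance. As a sketch of one way around that, the key column can be created within each group after unnesting. This assumes every group yields exactly two values in min, max order, which holds for range(), and it reuses the rr from the question (the version returning list(r)).

```r
library(dplyr)
library(tidyr)

# rr as given in the question: wraps the noisy range in a list
rr <- function(x) {
  r <- range(x + rnorm(length(x), 0.1))
  list(r)
}

res <- iris %>%
  group_by(Species) %>%
  summarize(range = rr(Sepal.Length)) %>%
  unnest(range) %>%
  # Regroup so the key is assigned per group, without hard-coding
  # the number of groups
  group_by(Species) %>%
  mutate(newcols = c("min", "max")) %>%
  ungroup() %>%
  spread(newcols, range)

print(res)
```

The numbers change from run to run because rr is random, but the result should have one row per species with max and min columns.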

mtoto answered Nov 11 '22 08:11