Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to flatten non atomic function results so that can be assigned as part of a dplyr mutate step?

Tags:

r

dplyr

tidyverse

I have bumped into this use-case before see How to flatten results of function call as part of dplyr::mutate. This time is different because the function I need to call doesn't depend on a set of x values. Here I'm computing the bootstrapped confidence interval using BCa (bias-corrected accelerated interval) but need to call it twice for every class strata because need to read out the lower and higher confidence interval outputs ($bca[4] and $bca[5]).

if(!require(tidyverse)) install.packages("tidyverse", repos = "http://cran.us.r-project.org")
if(!require(boot)) install.packages("boot", repos = "http://cran.us.r-project.org")
if(!require(purrr)) install.packages("purrr", repos = "http://cran.us.r-project.org")

comp <- data.frame(
  class = sample(c("bronze","silver","gold"),1500,replace=TRUE),
  reputation = rnbinom(1500,mu=100,size=1)+1
)

# function to obtain the mean
bootMean <- function(sample, index) {
  return(mean(sample[index]))
} 

# bootstrapping using 3000 replications
B <- 3000
summaryRep <- comp %>%
  group_by(class) %>%
  summarise(mean=mean(reputation),
            ymin=boot.ci(boot(data=reputation, statistic=bootMean, R=B), type=c("bca"))$bca[4],
            ymax=boot.ci(boot(data=reputation, statistic=bootMean, R=B), type=c("bca"))$bca[5])
summaryRep

I have tried the solutions proposed in the post above but they won't work. First because there is no mapping possible and second it would still complain about the dimension of the boot.ci results.

How can I avoid calling the boot functions twice while keeping the dplyr-rified approach i.e. not going to a procedural approach?

like image 833
SkyWalker Avatar asked Dec 25 '19 13:12

SkyWalker


1 Answers

One way would be to keep the output of boot.ci in a list and then extract the corresponding values.

library(boot)
library(dplyr)
set.seed(123)

comp %>%
  group_by(class) %>%
 summarise(mean=mean(reputation),
           model = list(boot.ci(boot(data=reputation, statistic=bootMean, R=B), 
                        type=c("bca"))),
           ymin= model[[1]]$bca[4],
           ymax= model[[1]]$bca[5])

# A tibble: 3 x 5
#  class   mean model     ymin  ymax
#  <fct>  <dbl> <list>   <dbl> <dbl>
#1 bronze  103. <bootci>  93.7  112.
#2 gold    102. <bootci>  93.5  111.
#3 silver  100. <bootci>  92.1  109.
like image 139
Ronak Shah Avatar answered Sep 29 '22 14:09

Ronak Shah