Bootstrapping sample means in R using boot Package, Creating the Statistic Function for boot() Function

I have a data set with 15 density calculations, each from a different transect. I would like to resample these with replacement, taking 15 randomly selected samples of the 15 transects and then getting the mean of each resample. Each transect should have its own probability of being sampled during this process. This should be done 5000 times. I have code which does this without using the boot function, but to calculate the BCa 95% CI using the boot package, the bootstrapping has to be done through the boot function first. I have been trying to write a statistic function, but I can't get one that works. I want the bootstrap to select from a certain column (data$xs), and the probabilities to be used are in the column data$prob.

The function I thought might work was:

library(boot)
meanfun <- function (data, i){
    d<-data [i,]
    return (mean (d))   }
bo<-boot (data$xs, statistic=meanfun, R=5000)
#boot.ci (bo, conf=0.95, type="bca")  #obviously `bo` was not made

But this gave me the error 'incorrect number of dimensions'.

I understand how to write a function in the normal sense, but the way the function works in boot seems strange. Since the function is only given to boot by name, with no specification of the arguments to pass to it, I seem limited to what boot itself will pass in as arguments (for example, I am unable to pass data$xs in as the argument for data, or data$prob as an argument for probability, and so on). This seems to really limit what can be done. Perhaps I am missing something, though?

Thanks for any and all help

Steve Ahlswede, asked Dec 25 '22 00:12


2 Answers

The reason for this error is that data$xs returns a vector, which has no dimensions, and the function then tries to subset it by row with data[i, ].
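A minimal sketch of the failure, using a hypothetical three-element vector x in place of data$xs (the values are made up for illustration). A plain vector has no dimensions, so row-style indexing fails, while single-index subsetting works:

```r
# Hypothetical stand-in for data$xs
x <- c(2.1, 1.8, 2.5)

is.null(dim(x))   # a plain vector carries no dim attribute

# Row-style indexing on a vector raises an error; capture it instead of stopping
res <- tryCatch(x[1:2, ], error = function(e) e)
inherits(res, "error")

# Single-index subsetting is the vector equivalent
x[1:2]

# The same values in a one-column data.frame can be indexed by row;
# the result is dropped to a plain vector, so mean() works on it
df <- data.frame(xs = x)
df[1:2, ]
```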

One way to solve this is to change the subset inside the function to data[i], or to pass data[, "xs", drop = FALSE] to boot instead. The drop = FALSE avoids type coercion, i.e. it keeps the column as a data.frame rather than dropping it to a vector.

We try:

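# Example data: 15 simulated values (normal, mean 2)
data <- data.frame(xs = rnorm(15, 2))

library(boot)
# boot() calls the statistic with the data and a vector of resampled row indices
meanfun <- function(data, i){
  d <- data[i, ]   # rows of a one-column data.frame drop to a plain vector
  return(mean(d))
}
# drop = FALSE keeps xs as a one-column data.frame, so data[i, ] is valid
bo <- boot(data[, "xs", drop = FALSE], statistic = meanfun, R = 5000)
boot.ci(bo, conf = 0.95, type = "bca")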

and obtain:

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL : 
boot.ci(boot.out = bo, conf = 0.95, type = "bca")

Intervals : 
Level       BCa          
95%   ( 1.555,  2.534 )  
Calculations and Intervals on Original Scale
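
The question also asks about passing extra arguments and per-transect probabilities, which the example above does not cover. The sketch below is one possible approach, not a tested recipe for the asker's data: boot() forwards additional named arguments to the statistic through `...`, and it has a `weights` argument that takes importance weights for resampling (they are normalized internally). The data frame `dat`, its values, the seed, and the `trim` argument are all assumptions made for illustration.

```r
library(boot)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical stand-in for the asker's data: 15 densities and 15 sampling weights
dat <- data.frame(xs = rnorm(15, mean = 2), prob = runif(15))

# Statistic written for a plain vector, so x[i] is the right subset.
# `trim` is an illustrative extra argument that boot() forwards via `...`
meanfun_vec <- function(x, i, trim = 0) mean(x[i], trim = trim)

# Ordinary bootstrap of the mean on the vector dat$xs
bo_plain <- boot(dat$xs, statistic = meanfun_vec, R = 5000)

# Same statistic with an extra named argument passed through boot()
bo_trim <- boot(dat$xs, statistic = meanfun_vec, R = 5000, trim = 0.1)

# Resampling with unequal selection probabilities via importance weights
bo_w <- boot(dat$xs, statistic = meanfun_vec, R = 5000, weights = dat$prob)

# BCa interval for the unweighted bootstrap
ci <- boot.ci(bo_plain, conf = 0.95, type = "bca")
print(ci)
```

Because `weights` performs importance resampling, summaries of bo_w$t generally need reweighting (the boot package provides imp.moments and related helpers for this), so the BCa call is shown only for the unweighted fit.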
J.R., answered Dec 26 '22 13:12


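One can use boot.array to extract all or a subset of the resampled sets. When indices are requested, it returns a matrix with one row per bootstrap replicate, holding the positions of the sampled observations in the original data rather than the values themselves. In this case: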

bo.ci <- boot.ci(boot.out = bo, conf = 0.95, type = "bca")

resampled.data <- boot.array(bo, indices = TRUE)

To extract the first and second sets of resampled data:

resample.1 <- resampled.data[1, ]
resample.2 <- resampled.data[2, ]

Then proceed to extract the individual statistic you want from any subset. For instance, if you assume normality, you could run a Student's t-test on the first subset:

t.test(resample.1)

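Which, for this example and particular seed value(s), gives the following (the sample mean of 7.8 suggests the test was run on the row indices 1 to 15 held in resample.1, not on the density values themselves):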

data: resample.1
t = 6.5216, df = 14, p-value = 1.353e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
5.234781 10.365219
sample estimates:
mean of x
7.8
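
Since the rows returned by boot.array with indices requested are positions rather than values, a further step is needed to recover the resampled values. The sketch below shows one way to do that, assuming a hypothetical vector `xs` and an arbitrary seed; the mean of the first recovered resample should match the first bootstrap replicate stored in bo$t.

```r
library(boot)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical stand-in for the 15 transect densities
xs <- rnorm(15, mean = 2)

# Bootstrap the mean of a plain vector
bo <- boot(xs, statistic = function(x, i) mean(x[i]), R = 1000)

# One row per replicate, one column per draw; entries are indices into xs
idx <- boot.array(bo, indices = TRUE)
dim(idx)

# Map the first row of indices back to the actual resampled values
values.1 <- xs[idx[1, ]]

# Compare the mean of the recovered resample with the stored replicate
mean(values.1)
bo$t[1]

# A t-test on the resampled values rather than on the indices
t.test(values.1)
```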

r resampling boot.array

DanGitR, answered Dec 26 '22 12:12