
R: calculate the standard error using the bootstrap

I have this array of values:

> df
[1] 2 0 0 2 2 0 0 1 0 1 2 1 0 1 3 0 0 1 1 0 0 0 2 1 2 1 3 1 0 0 0 1 1 2 0 1 3
[38] 1 0 2 1 1 2 2 1 2 2 2 1 1 1 2 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0
[75] 0 0 0 0 0 1 1 0 1 1 1 1 3 1 3 0 1 2 2 1 2 3 1 0 0 1

I want to use the boot package to calculate the standard error of the mean of this data, following http://www.ats.ucla.edu/stat/r/faq/boot.htm

So I ran the following:

library(boot)
boot(df, mean, R=10)

and I got this error:

Error in mean.default(data, original, ...) : 
'trim' must be numeric of length one

Can someone help me figure out the problem? Thanks

asked Aug 20 '13 17:08 by Vahid Mirjalili


2 Answers

If you are bootstrapping the mean you can do as follows:

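set.seed(1)
library(boot)
x <- rnorm(100)
# boot() calls the statistic with the data and a vector of resampled indices,
# so the function needs both arguments
meanFunc <- function(x, i) { mean(x[i]) }
bootMean <- boot(x, meanFunc, R = 100)
> bootMean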

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = x, statistic = meanFunc, R = 100)


Bootstrap Statistics :
     original      bias    std. error
t1* 0.1088874 0.002614105  0.07902184

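If you pass mean itself as the statistic, you get the same error you saw. boot calls the statistic as statistic(data, indices), so the index vector lands in the second argument of mean.default, which is trim: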

bootMean <- boot(x,mean,100)
Error in mean.default(data, original, ...) : 
  'trim' must be numeric of length one
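
Applied to the data from the question, the same pattern might look like the sketch below. The vector is copied from the question; the name mean_stat, the seed, and R = 1000 are my own choices for illustration, not anything boot requires.

```r
library(boot)

# The 100 values from the question (called df there, although it is a plain vector)
df <- c(2, 0, 0, 2, 2, 0, 0, 1, 0, 1, 2, 1, 0, 1, 3, 0, 0, 1, 1, 0,
        0, 0, 2, 1, 2, 1, 3, 1, 0, 0, 0, 1, 1, 2, 0, 1, 3,
        1, 0, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 2, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 3, 1, 3, 0, 1, 2, 2, 1,
        2, 3, 1, 0, 0, 1)

# Statistic in the form boot() expects: data first, resampled indices second
mean_stat <- function(data, idx) mean(data[idx])

set.seed(1)  # arbitrary seed, only for reproducibility
b <- boot(df, statistic = mean_stat, R = 1000)

# The bootstrap standard error is the standard deviation of the replicates in b$t
boot_se <- sd(b$t[, 1])
boot_se
```

Printing b gives the same kind of table as above, and boot_se should match its "std. error" column.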
answered Sep 29 '22 08:09 by Metrics


I never really used boot, since I do not understand what it will bring to the table.

Given that the standard error of the mean is estimated as:

sd(sampled.df) / sqrt(length(df))

I believe you can simply use the following function to get this done:

custom.boot <- function(times, data = df) {
  boots <- numeric(times)
  for (i in seq_len(times)) {
    # standard error of the mean, computed on one resample drawn with replacement
    boots[i] <- sd(sample(data, length(data), replace = TRUE)) / sqrt(length(data))
  }
  boots
}

You can then take the mean of the result yourself, since the function returns a distribution of standard errors, one per resample:

# Mean standard error
mean(custom.boot(times=1000))
[1] 0.08998023

Some years later...

I think this is nicer:

times <- 1000
mean(replicate(times, sd(sample(df, replace = TRUE)) / sqrt(length(df))))
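
For comparison, the quantity boot reports as "std. error" is the standard deviation of the resampled means, not the average of per-resample standard errors. A rough base R sketch of that, again on the vector from the question, could look like this; boot_se_mean is a hypothetical helper name and the seed is arbitrary.

```r
# The 100 values from the question
df <- c(2, 0, 0, 2, 2, 0, 0, 1, 0, 1, 2, 1, 0, 1, 3, 0, 0, 1, 1, 0,
        0, 0, 2, 1, 2, 1, 3, 1, 0, 0, 0, 1, 1, 2, 0, 1, 3,
        1, 0, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 2, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 3, 1, 3, 0, 1, 2, 2, 1,
        2, 3, 1, 0, 0, 1)

# Hypothetical helper: resample with replacement, take the mean each time,
# and return the spread (sd) of those means
boot_se_mean <- function(x, times = 1000) {
  means <- replicate(times, mean(sample(x, replace = TRUE)))
  sd(means)
}

set.seed(1)  # arbitrary seed, only for reproducibility
se_boot <- boot_se_mean(df)

# Textbook formula for the standard error of the mean, for comparison
se_formula <- sd(df) / sqrt(length(df))

c(bootstrap = se_boot, formula = se_formula)
```

For a plain mean the two numbers should come out close to each other, so the bootstrap mostly pays off for statistics that have no simple formula.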
answered Sep 29 '22 08:09 by PascalVKooten