I'm trying to calculate a confidence interval for the mean using the bootstrap method in Python. Let's say I have a vector a with 100 entries, and my aim is to calculate the mean of these 100 values and its 95% confidence interval using the bootstrap. So far I have managed to resample 1000 times from my vector using the np.random.choice function. Then, for each bootstrap vector with 100 entries, I calculated the mean. So now I have 1000 bootstrap mean values and a single sample mean from my initial vector, but I'm not sure how to proceed from here. How can I use these mean values to find the confidence interval for the mean of my initial vector? I'm relatively new to Python and this is the first time I've come across the bootstrap method, so any help would be much appreciated.
The percentile bootstrap interval is just the interval between the 100×(α/2) and 100×(1−α/2) percentiles of the distribution of θ estimates obtained from resampling, where θ represents a parameter of interest and α is the level of significance (e.g., α = 0.05 for 95% CIs) (Efron, 1982).
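As a minimal sketch of the percentile interval described above, assuming NumPy is available: the data in `a` are simulated here purely as a stand-in for the asker's vector, and the seed, `n_boot`, and variable names are illustrative choices rather than anything from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is reproducible
a = rng.normal(loc=10.0, scale=2.0, size=100)  # hypothetical stand-in for the vector a

n_boot = 1000
# Each row is one resample of a, drawn with replacement, with the same length as a
resamples = rng.choice(a, size=(n_boot, a.size), replace=True)
boot_means = resamples.mean(axis=1)  # one mean per bootstrap resample

alpha = 0.05  # 95% confidence interval
lower, upper = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])

print(f"sample mean = {a.mean():.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With α = 0.05 this takes the 2.5th and 97.5th percentiles of the 1000 bootstrap means.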
A C% confidence interval with the Normal approximation is x̄ ± z·s/√n, where x̄ is the sample mean, s is the sample standard deviation, n is the sample size, and the value of z is appropriate for the confidence level. For a 95% confidence interval we use z = 1.96, while for a 90% confidence interval, for example, we use z = 1.645.
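For comparison with the bootstrap, here is a rough sketch of that Normal-approximation formula. The function name `normal_ci` is hypothetical, and z is looked up with the standard library's `statistics.NormalDist` rather than hard-coded.

```python
import numpy as np
from statistics import NormalDist

def normal_ci(x, confidence=0.95):
    """Normal-approximation CI for the mean: x_bar +/- z * s / sqrt(n)."""
    x = np.asarray(x, dtype=float)
    # Two-sided critical value, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = x.std(ddof=1) / np.sqrt(x.size)  # s / sqrt(n), with the sample std (ddof=1)
    m = x.mean()
    return m - z * se, m + z * se

lo, hi = normal_ci([1, 2, 3, 4, 5])
print(lo, hi)
```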
Let's say you calculated a 95% confidence interval from bootstrapped resamples. The interpretation is: "95% of the time, this bootstrap method results in a confidence interval containing the true population parameter."
You could sort the array of 1000 means and use the 50th and 950th elements as the 90% bootstrap confidence interval. For the 95% interval you asked about, use the 25th and 975th elements instead.
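The sort-and-index approach might look like the sketch below. The data are simulated as a placeholder, and mapping "the 50th element" to index 49 is one reasonable convention, not the only one.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility only
a = rng.normal(size=100)         # hypothetical data standing in for the real vector

# 1000 bootstrap means, each from a resample of a with replacement
boot_means = rng.choice(a, size=(1000, a.size), replace=True).mean(axis=1)
sorted_means = np.sort(boot_means)

# Python indexing is 0-based, so the 50th element is at index 49, and so on
ci_90 = (sorted_means[49], sorted_means[949])  # 50th and 950th elements
ci_95 = (sorted_means[24], sorted_means[974])  # 25th and 975th elements

print("90% CI:", ci_90)
print("95% CI:", ci_95)
```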
Your set of 1000 means is basically a sample of the distribution of the mean estimator (the sampling distribution of the mean). So, any operation you could do on a sample from a distribution you can do here.
I have a simple statistical solution: confidence intervals are based on the standard error. The standard error in your case is the standard deviation of your 1000 bootstrap means. Assuming the sampling distribution of your parameter (the mean) is normal, which the Central Limit Theorem should justify, just multiply the z-score for the desired confidence level by that standard deviation. Therefore:
lower boundary = mean of your bootstrap means - 1.96 * std. dev. of your bootstrap means
upper boundary = mean of your bootstrap means + 1.96 * std. dev. of your bootstrap means
95% of cases in a normal distribution sit within 1.96 standard deviations of the mean.
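A small sketch of the two boundary formulas above, assuming NumPy. The skewed sample in `a` is made up for illustration, and the loop mirrors the np.random.choice resampling described in the question.

```python
import numpy as np

rng = np.random.default_rng(1)            # fixed seed for reproducibility only
a = rng.exponential(scale=3.0, size=100)  # hypothetical data for the vector a

# 1000 bootstrap means: resample a with replacement and average each resample
boot_means = np.array(
    [rng.choice(a, size=a.size, replace=True).mean() for _ in range(1000)]
)

se = boot_means.std(ddof=1)  # bootstrap estimate of the standard error of the mean
center = boot_means.mean()   # mean of the bootstrap means

lower = center - 1.96 * se
upper = center + 1.96 * se
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```

The bootstrap standard error here should land close to the usual s/√n estimate, which is a quick sanity check on the resampling.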
hope this helps