Recently I came across this term, but I really have no idea what it refers to. I've searched online, but with little gain. Thanks.
Bootstrapping is a statistical procedure that resamples a single dataset to create many simulated samples. This process allows you to calculate standard errors, construct confidence intervals, and perform hypothesis testing for numerous types of sample statistics.
In statistics, Bootstrap Sampling is a method that involves repeatedly drawing samples with replacement from a data source in order to estimate a population parameter.
Bootstrapping is any test or metric that uses random sampling with replacement (e.g. mimicking the sampling process), and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates.
So the answer is: since bootstrapping allows you to produce estimates from a single sample alone, phrases like "standing on one's own feet" or "pulling oneself up by one's own bootstraps" are used to describe it.
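To make the "resampling with replacement" idea concrete, here is a minimal sketch using only Python's standard library (the data values are made up for illustration):

```python
import random

random.seed(0)
data = [2.1, 3.5, 3.9, 4.4, 5.0]  # hypothetical observations

# One bootstrap resample: draw len(data) items from the data,
# with replacement, so some values may repeat and others be left out.
resample = random.choices(data, k=len(data))
```

Each such resample is treated as if it were a fresh sample from the population, which is what lets you attach measures of accuracy to a statistic computed from a single dataset.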
Take a sample of the time of day that you wake up on Saturdays. Some Friday nights you have a few too many drinks, so you wake up early (but go back to bed). Other days you wake up at a normal time. Other days you sleep in.
Here are the results:
[3.1, 4.8, 6.3, 6.4, 6.6, 7.3, 7.5, 7.7, 7.9, 10.1]
What is the mean time that you wake up?
Well it's 6.8 (o'clock, or 6:48). A touch early for me.
How good a prediction is this of when you'll wake up next Saturday? Can you quantify how wrong you are likely to be?
It's a pretty small sample, and we're not sure of the distribution of the underlying process, so it might not be a good idea to use standard parametric statistical techniques†.
Why don't we take a random sample of our sample (the same size, drawn with replacement), calculate the mean, and repeat this? This will give us an estimate of how uncertain our estimate is.
I did this several times, and the resampled means ranged from 5.98 to 7.8.
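The procedure above can be sketched with Python's standard library; the exact interval you get will depend on the random seed and number of resamples:

```python
import random
import statistics

random.seed(42)
times = [3.1, 4.8, 6.3, 6.4, 6.6, 7.3, 7.5, 7.7, 7.9, 10.1]

# Draw many resamples (with replacement, same size as the original
# sample) and record the mean of each one.
boot_means = [
    statistics.mean(random.choices(times, k=len(times)))
    for _ in range(10_000)
]

# The spread of these bootstrap means estimates how uncertain the
# sample mean is; a 95% percentile interval takes the middle 95%.
boot_means.sort()
lo = boot_means[int(0.025 * len(boot_means))]
hi = boot_means[int(0.975 * len(boot_means))]
```

With this data the interval lands roughly around 5.7 to 7.9, consistent with the range quoted above.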
This is called the bootstrap, and it was introduced by Bradley Efron in 1979.
A variant is called the jackknife, where you leave out one observation at a time, take the mean of the rest, and repeat for each observation. The jackknife mean is 6.8 (the same as the arithmetic mean) and the leave-one-out means range from 6.4 to 7.2.
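The jackknife is simple enough to spell out directly; this sketch reproduces the numbers above:

```python
import statistics

times = [3.1, 4.8, 6.3, 6.4, 6.6, 7.3, 7.5, 7.7, 7.9, 10.1]

# Leave-one-out means: drop each observation once and average the rest.
jack_means = [
    statistics.mean(times[:i] + times[i + 1:])
    for i in range(len(times))
]
```

The average of the leave-one-out means always equals the sample mean here (6.8 after rounding), and the extremes come from dropping the largest value (10.1, giving 6.4) or the smallest (3.1, giving 7.2).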
Another variant is called k-fold cross-validation, where you (at random) split your data set into k equally-sized sections, calculate the mean of all but one section, and repeat k times. The 5-fold cross-validation mean is 6.8; the all-but-one-fold means stay close to that (each uses 8 of the 10 points), while the means of the individual held-out folds range from about 4 to 9.
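A sketch of the 5-fold split described above (again standard library only; the particular fold assignment depends on the shuffle):

```python
import random
import statistics

random.seed(1)
times = [3.1, 4.8, 6.3, 6.4, 6.6, 7.3, 7.5, 7.7, 7.9, 10.1]

# Randomly split the data into k equally-sized folds.
k = 5
shuffled = times[:]
random.shuffle(shuffled)
size = len(shuffled) // k
folds = [shuffled[i * size:(i + 1) * size] for i in range(k)]

# For each fold, average everything *except* that fold.
fold_means = [
    statistics.mean([x for j, f in enumerate(folds) if j != i for x in f])
    for i in range(k)
]
```

Because every observation appears in exactly k-1 of the "all but one fold" sets, the average of the k fold-out means equals the overall sample mean.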
† This distribution does happen to be Normal. The parametric 95% confidence interval of the mean is 5.43 to 8.11, reasonably close to, but wider than, the bootstrap range.
If you don't have enough data to train your algorithm, you can increase the size of your training set by randomly (uniformly, with replacement) selecting items and duplicating them.
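A minimal sketch of that augmentation step, assuming a toy training set and a hypothetical target size:

```python
import random

random.seed(0)
train = ["a", "b", "c", "d"]  # stand-in for a small training set

# Grow the training set to the target size by sampling uniformly
# with replacement and appending the duplicates.
target_size = 10
augmented = train + random.choices(train, k=target_size - len(train))
```

Note that this adds no new information; it only rebalances how often existing items are seen during training.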