Having a dataset and calculating statistics from it is easy. How about the other way around?
Let's say I know some variable has an average X and a standard deviation Y, and assume it has a normal (Gaussian) distribution. What would be the best way to generate a "random" dataset (of arbitrary size) that fits that distribution?
EDIT: This kind of develops from this question; I could make something based on that method, but I am wondering if there's a more efficient way to do it.
You need to standardize it - subtract the mean and then divide by the standard deviation. Then you are free to transform this sample to a normal distribution with the given parameters: multiply by the standard deviation and then add the mean.
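A minimal sketch of that rescaling, assuming NumPy and a placeholder `sample` standing in for whatever dataset your existing method produces (the target mean and standard deviation values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng()

# Placeholder sample from whatever method you already have;
# here it is simply drawn from a standard normal.
sample = rng.standard_normal(10_000)

target_mean = 5.0   # the desired mean (X)
target_std = 2.0    # the desired standard deviation (Y)

# Standardize: subtract the sample mean, divide by the sample standard deviation.
standardized = (sample - sample.mean()) / sample.std()

# Transform to the target parameters: multiply by the std deviation, add the mean.
rescaled = target_mean + target_std * standardized

print(rescaled.mean(), rescaled.std())  # approximately 5.0 and 2.0
```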
Nevertheless, matching means and standard deviations does not guarantee that the distributions are similar -- two distributions can have the same mean and standard deviation yet differ in, e.g., skewness and/or kurtosis.
All normal distributions, like the standard normal distribution, are unimodal and symmetric with a bell-shaped curve. However, a normal distribution can have any mean and any positive standard deviation, whereas in the standard normal distribution the mean is fixed at 0 and the standard deviation at 1.
The standard deviation is the square root of the variance, which is computed from each data point's deviation relative to the mean. The further the data points lie from the mean, the more spread out the data and the higher the standard deviation.
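A short illustration of that calculation with hypothetical values, using the population formula (dividing by the number of points):

```python
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # example values

mean = sum(data) / len(data)                                # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)   # 4.0
std_dev = math.sqrt(variance)                               # 2.0

print(mean, variance, std_dev)
```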
You can generate standard normal random variables with the Box-Muller method. Then, to transform them to have mean mu and standard deviation sigma, multiply your samples by sigma and add mu. I.e. for each z from the standard normal, return mu + sigma*z.
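A minimal sketch of this, assuming plain Python and the standard library; `mu` and `sigma` are the target parameters, and the function names are just for illustration:

```python
import math
import random

def box_muller_pair():
    """Return two independent standard normal samples from two uniform samples."""
    u1 = 1.0 - random.random()  # in (0, 1], avoids log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

def normal_samples(n, mu, sigma):
    """Generate n samples with mean mu and standard deviation sigma."""
    samples = []
    while len(samples) < n:
        for z in box_muller_pair():
            samples.append(mu + sigma * z)  # scale and shift each standard normal z
    return samples[:n]

data = normal_samples(10_000, mu=5.0, sigma=2.0)
```

In practice a library routine (e.g. numpy.random.Generator.normal(mu, sigma, n)) does the same job; the sketch above just makes the Box-Muller step and the mu + sigma*z transform explicit.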