For example, we usually assume that the data or signal error follows a Gaussian distribution. Why?
The Normal Distribution (or a Gaussian) shows up widely in statistics as a result of the Central Limit Theorem. Specifically, the Central Limit Theorem says that (in most common scenarios besides the stock market) anytime “a bunch of things are added up,” a normal distribution is going to result.
As with any probability distribution, the normal distribution describes how the values of a variable are distributed. It is the most important probability distribution in statistics because it accurately describes the distribution of values for many natural phenomena.
The normal distribution, also called the Gaussian distribution, is the most common distribution function for independent, randomly generated variables. Its familiar bell-shaped curve is ubiquitous in statistical reports, from survey analysis and quality control to resource allocation.
The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, meaning that data near the mean are more frequent than data far from the mean. In graphical form, the normal distribution appears as a "bell curve".
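For reference, the density these descriptions refer to is the standard formula (not quoted in the text above) for a normal distribution with mean $\mu$ and variance $\sigma^2$:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$

The curve is symmetric about $\mu$, and its spread is controlled entirely by $\sigma$.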
The answer you'll get from mathematically minded people is "because of the central limit theorem". This expresses the idea that when you take a bunch of random numbers from almost any distribution* and add them together, you will get something approximately normally distributed. The more numbers you add together, the more normally distributed it gets.
I can demonstrate this in Matlab/Octave. If I generate 1000 random numbers between 1 and 10 and plot a histogram, I get something roughly flat across the whole range.
If instead of generating a single random number I generate 12 of them, add them together, do this 1000 times, and plot a histogram, I get something that already looks close to a bell curve.
I've plotted a normal distribution with the same mean and variance over the top, so you can get an idea of how close the match is. You can see the code I used to generate these plots at this gist.
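Since the gist and the plots themselves are not reproduced here, a minimal Octave/Matlab sketch along the same lines might look like this (my reconstruction; the bin count, the use of uniform draws on [1, 10], and the scaling of the overlaid curve are my own choices, not the original code):

```matlab
N = 1000;   % samples per histogram, as in the text

% 1) A single uniform random number between 1 and 10, drawn 1000 times:
%    the histogram comes out roughly flat.
x1 = 1 + 9 * rand(N, 1);
figure;
hist(x1, 20);
title('Single uniform draws on [1, 10]');

% 2) The sum of 12 such numbers, repeated 1000 times:
%    the histogram is already close to bell-shaped.
x12 = sum(1 + 9 * rand(N, 12), 2);
figure;
[counts, centers] = hist(x12, 20);
bar(centers, counts, 1);
hold on;

% Overlay a normal density with the same mean and variance,
% scaled by N * binwidth so it sits on the same vertical scale as the counts.
mu       = mean(x12);
sigma    = std(x12);
binwidth = centers(2) - centers(1);
t        = linspace(min(x12), max(x12), 200);
density  = exp(-(t - mu).^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi));
plot(t, N * binwidth * density, 'r', 'LineWidth', 2);
hold off;
title('Sum of 12 uniform draws, with matching normal curve');
```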
In a typical machine learning problem you will have errors from many different sources (e.g. measurement error, data entry error, classification error, data corruption...) and it's not completely unreasonable to think that the combined effect of all of these errors is approximately normal (although of course, you should always check!)
More pragmatic answers to the question include:
Because it makes the math simpler. The probability density function for the normal distribution is an exponential of a quadratic. Taking the logarithm (as you often do, because you want to maximize the log likelihood) gives you a quadratic. Differentiating this (to find the maximum) gives you a set of linear equations, which are easy to solve analytically (a short derivation is sketched below).
It's simple - the entire distribution is described by two numbers, the mean and variance.
It's familiar to most people who will be reading your code/paper/report.
It's generally a good starting point. If you find that your distributional assumptions are giving you poor performance, then maybe you can try a different distribution. But you should probably look at other ways to improve the model's performance first.
*Technical point - it needs to have finite variance.
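To spell out the "exponential of a quadratic" point (a standard textbook calculation, not part of the quoted answer), take $n$ independent observations $y_1, \dots, y_n$ from a normal model with mean $\mu$ and variance $\sigma^2$. The log-likelihood is

$$\log p(y_1,\dots,y_n \mid \mu,\sigma^2) = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i-\mu\right)^2,$$

which is a quadratic in $\mu$. Setting its derivative with respect to $\mu$ to zero gives the linear equation $\sum_{i=1}^{n}(y_i - \hat{\mu}) = 0$, so $\hat{\mu} = \bar{y}$. In linear regression with Gaussian errors, the same step produces the linear normal equations $X^\top X \hat{\beta} = X^\top y$, which is why least squares has a closed-form solution.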
Gaussian distributions are the most "natural" distributions. They show up everywhere. Here is a list of the properties that make me think that Gaussians are the most natural distributions:
This post is cross posted at http://artent.net/blog/2012/09/27/why-are-gaussian-distributions-great/
The signal error is often a sum of many independent errors. For example, in a CCD camera you could have photon noise, transmission noise, digitization noise (and maybe more) that are mostly independent, so the error will often be normally distributed due to the central limit theorem.
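As a rough illustration of this (a toy sketch, not a physical CCD model; the noise terms and their magnitudes are made up for the example), adding a few independent, individually non-Gaussian noise sources already produces a total error that looks much more bell-shaped than any single source:

```matlab
N = 10000;

% Three independent, individually non-Gaussian noise terms (arbitrary magnitudes):
shot  = -log(rand(N, 1)) - 1;      % skewed, exponential-like term (zero mean), standing in for photon noise
trans = 2 * (rand(N, 1) - 0.5);    % uniform term, standing in for transmission noise
quant = 0.5 * (rand(N, 1) - 0.5);  % smaller uniform term, standing in for digitization noise

total = shot + trans + quant;      % combined error

figure;
hist(total, 40);                   % noticeably closer to a bell curve than the skewed term alone
title('Sum of three independent non-Gaussian noise terms');
```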
Also, modeling the error as a normal distribution often makes calculations very simple.
1. The math often would not work out otherwise. :)
2. The normal distribution is very common. See nikie's answer.
3. Even non-normal distributions can often be treated as a normal distribution with a large standard deviation. Yes, it's a dirty hack.
The first point might look funny, but I did some research on problems where we had non-normal distributions, and the math got horribly complicated. In practice, computer simulations are often carried out to "prove the theorems".