Can anyone tell me why we always use the Gaussian distribution in machine learning?

For example, why do we always assume that the data or signal error follows a Gaussian distribution?

asked Sep 27 '12 by laotao


4 Answers

The answer you'll get from mathematically minded people is "because of the central limit theorem". This expresses the idea that when you take a bunch of random numbers from almost any distribution* and add them together, you will get something approximately normally distributed. The more numbers you add together, the more normally distributed it gets.

I can demonstrate this in Matlab/Octave. If I generate 1000 uniform random numbers between 1 and 10 and plot a histogram, I get something like this:

[Figure: histogram of 1000 uniform random draws between 1 and 10; roughly flat]

If instead of generating a single random number, I generate 12 of them and add them together, and do this 1000 times and plot a histogram, I get something like this:

[Figure: histogram of 1000 sums of 12 uniform draws; bell-shaped]

I've plotted a normal distribution with the same mean and variance over the top, so you can get an idea of how close the match is. You can see the code I used to generate these plots at this gist.
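Since the gist link may not survive, here is a minimal sketch along the same lines (a reconstruction in Matlab/Octave, not the original code):

    % Demonstrate the central limit theorem with uniform draws.
    N = 1000;

    % 1000 single uniform draws between 1 and 10: histogram is roughly flat.
    single_draws = 1 + 9 * rand(N, 1);

    % 1000 sums of 12 uniform draws each: approximately normal by the CLT.
    sums = sum(1 + 9 * rand(N, 12), 2);

    subplot(2, 1, 1);
    hist(single_draws, 30);
    title('Single uniform draws');

    subplot(2, 1, 2);
    [counts, centers] = hist(sums, 30);
    bar(centers, counts, 1);
    hold on;
    % Overlay a normal pdf with the sample mean and variance, scaled by the
    % number of draws times the bin width to match the histogram counts.
    x = linspace(min(sums), max(sums), 200);
    binwidth = centers(2) - centers(1);
    pdf = exp(-(x - mean(sums)).^2 / (2 * var(sums))) / sqrt(2 * pi * var(sums));
    plot(x, N * binwidth * pdf, 'r', 'LineWidth', 2);
    title('Sums of 12 uniform draws, fitted normal overlaid');

Adding more terms to each sum (say 50 instead of 12) makes the fit even closer.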

In a typical machine learning problem you will have errors from many different sources (e.g. measurement error, data entry error, classification error, data corruption...), and it's not completely unreasonable to think that the combined effect of all of these errors is approximately normal (although of course, you should always check!).

More pragmatic answers to the question include:

  • Because it makes the math simpler. The probability density function for the normal distribution is an exponential of a quadratic. Taking the logarithm (as you often do, because you want to maximize the log likelihood) gives you a quadratic. Differentiating this (to find the maximum) gives you a set of linear equations, which are easy to solve analytically. (A short worked sketch follows this list.)

  • It's simple - the entire distribution is described by two numbers, the mean and variance.

  • It's familiar to most people who will be reading your code/paper/report.
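To make the first point concrete (a short sketch using the standard Gaussian likelihood): for $n$ i.i.d. samples $x_1, \dots, x_n$ from $\mathcal{N}(\mu, \sigma^2)$, the log likelihood is

$$\log L(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2,$$

a quadratic in $\mu$. Differentiating and setting $\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0$ gives a linear equation whose solution is just the sample mean, $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$.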

It's generally a good starting point. If you find that your distributional assumptions are giving you poor performance, then maybe you can try a different distribution. But you should probably look at other ways to improve the model's performance first.

*Technical point - it needs to have finite variance.

answered Oct 12 '22 by Chris Taylor


Gaussian distributions are the most "natural" distributions. They show up everywhere. Here is a list of the properties that make me think that Gaussians are the most natural distributions:

  • The sum of several random variables (like dice) tends to be Gaussian as noted by nikie. (Central Limit Theorem).
  • There are two natural ideas that appear in machine learning: the standard deviation and the maximum entropy principle. If you ask, "Among all distributions with mean 0 and standard deviation 1, which has the maximum entropy?", the answer is the Gaussian.
  • Randomly select a point inside a high-dimensional hypersphere. The distribution of any particular coordinate is approximately Gaussian. The same is true for a random point on the surface of the hypersphere. (See the sketch after this list.)
  • Take several samples from a Gaussian distribution. Compute the discrete Fourier transform of the samples. The results have a Gaussian distribution. I am pretty sure that the Gaussian is the only distribution with this property.
  • The eigenfunctions of the Fourier transform are products of polynomials and Gaussians.
  • The solution to the differential equation y' = -x y is a Gaussian. This fact makes computations with Gaussians easier. (Higher derivatives involve Hermite polynomials.)
  • I think Gaussians are the only distributions closed under multiplication, convolution, and linear transformations.
  • Maximum likelihood estimators to problems involving Gaussians tend to also be the least squares solutions.
  • I think all solutions to stochastic differential equations involve Gaussians. (This is mainly a consequence of the Central Limit Theorem.)
  • "The normal distribution is the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e. other than the mean and variance) are zero." - Wikipedia.
  • For even n, the nth moment of the Gaussian is simply an integer multiplied by the standard deviation to the nth power.
  • Many of the other standard distributions are strongly related to the Gaussian (e.g. binomial, Poisson, chi-squared, Student's t, Rayleigh, logistic, log-normal, hypergeometric ...)
  • "If X1 and X2 are independent and their sum X1 + X2 is distributed normally, then both X1 and X2 must also be normal" -- from Wikipedia.
  • "The conjugate prior of the mean of a normal distribution is another normal distribution." -- from Wikipedia.
  • When using Gaussians, the math is easier.
  • The Erdős–Kac theorem implies that the distribution of the number of distinct prime factors of a "random" integer is approximately Gaussian.
  • The velocity components of random molecules in a gas are distributed as a Gaussian. (With standard deviation = z*sqrt( k T / m ), where z is a constant and k is Boltzmann's constant.)
  • "A Gaussian function is the wave function of the ground state of the quantum harmonic oscillator." -- From Wikipedia
  • Kalman Filters.
  • The Gauss–Markov theorem.
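Here is a quick numerical check of the hypersphere point above (a minimal sketch; the dimension and sample count are arbitrary choices):

    % Sample points uniformly on the surface of a d-dimensional hypersphere
    % by normalizing Gaussian vectors, then look at a single coordinate.
    d = 100;                                      % dimension
    N = 10000;                                    % number of sample points
    X = randn(N, d);
    X = X ./ repmat(sqrt(sum(X.^2, 2)), 1, d);    % project rows onto the unit sphere
    % Each coordinate of a uniform point on the sphere has variance 1/d and is
    % approximately Gaussian for large d; rescaling by sqrt(d) should give
    % something close to a standard normal.
    hist(X(:, 1) * sqrt(d), 40);

(Using randn to generate the points is a little circular, but normalizing Gaussian vectors is the standard way to sample uniformly on a sphere; the point is that the coordinate histogram comes out bell-shaped.)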

This post is cross-posted at http://artent.net/blog/2012/09/27/why-are-gaussian-distributions-great/

answered Oct 12 '22 by Hans Scundal


The signal error is often a sum of many independent errors. For example, in a CCD camera you could have photon noise, transmission noise, digitization noise (and maybe more) that are mostly independent, so the error will often be normally distributed due to the central limit theorem.

Also, modeling the error as a normal distribution often makes calculations very simple.
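As a toy illustration of that summing effect (the noise models here are hypothetical stand-ins, not real sensor characteristics):

    % Toy CCD pixel error: the sum of independent noise sources.
    N = 10000;
    photon   = sum(rand(N, 20) < 0.5, 2) - 10;   % binomial photon-count noise, centered
    transmit = 0.8 * randn(N, 1);                % Gaussian transmission noise
    quantize = rand(N, 1) - 0.5;                 % uniform digitization (rounding) noise
    total    = photon + transmit + quantize;
    hist(total, 40);                             % combined error: close to bell-shaped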

answered Oct 12 '22 by Niki


  1. The math often would not work out otherwise. :)

  2. The normal distribution is very common. See nikie's answer.

  3. Even non-normal distributions can often be treated as a normal distribution with a large deviation. Yes, it's a dirty hack. (See the sketch below.)

The first point might look funny, but I did some research on problems where we had non-normal distributions, and the math got horribly complicated. In practice, computer simulations are often carried out to "prove the theorems".
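For the third point, a minimal sketch (the binomial parameters are arbitrary): a binomial(100, 0.5) sample is discrete and bounded, yet a normal with matching mean and variance fits it closely.

    % Binomial(100, 0.5) draws via coin flips, treated as if they were normal.
    N = 10000;
    samples = sum(rand(N, 100) < 0.5, 2);
    [counts, centers] = hist(samples, 30);
    bar(centers, counts, 1);
    hold on;
    % Normal with matching mean and variance: mu = 50, sigma^2 = 25.
    x = linspace(min(samples), max(samples), 200);
    binwidth = centers(2) - centers(1);
    pdf = exp(-(x - 50).^2 / 50) / sqrt(50 * pi);
    plot(x, N * binwidth * pdf, 'r', 'LineWidth', 2);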

answered Oct 12 '22 by Ali