Can anyone tell me why we always use the Gaussian distribution in machine learning?

For example, why do we always assume that the data or signal error follows a Gaussian distribution?

asked Sep 27 '12 by laotao


4 Answers

The answer you'll get from mathematically minded people is "because of the central limit theorem". This expresses the idea that when you take a bunch of random numbers from almost any distribution* and add them together, you will get something approximately normally distributed. The more numbers you add together, the more normally distributed it gets.

I can demonstrate this in Matlab/Octave. If I generate 1000 uniform random numbers between 1 and 10 and plot a histogram, I get something like this:

[Figure: histogram of 1000 uniform random draws between 1 and 10; roughly flat]

If instead of generating a single random number, I generate 12 of them and add them together, and do this 1000 times and plot a histogram, I get something like this:

[Figure: histogram of 1000 sums of 12 uniform draws; bell-shaped]

I've plotted a normal distribution with the same mean and variance over the top, so you can get an idea of how close the match is. You can see the code I used to generate these plots at this gist.
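Since the gist link may not survive, here is a minimal sketch along the same lines (a reconstruction in Matlab/Octave, not the original code):

    % Demonstrate the central limit theorem with uniform draws.
    N = 1000;

    % 1000 single uniform draws between 1 and 10: histogram is roughly flat.
    single_draws = 1 + 9 * rand(N, 1);

    % 1000 sums of 12 uniform draws each: approximately normal by the CLT.
    sums = sum(1 + 9 * rand(N, 12), 2);

    subplot(2, 1, 1);
    hist(single_draws, 30);
    title('Single uniform draws');

    subplot(2, 1, 2);
    [counts, centers] = hist(sums, 30);
    bar(centers, counts, 1);
    hold on;
    % Overlay a normal pdf with the sample mean and variance, scaled by the
    % number of draws times the bin width to match the histogram counts.
    x = linspace(min(sums), max(sums), 200);
    binwidth = centers(2) - centers(1);
    pdf = exp(-(x - mean(sums)).^2 / (2 * var(sums))) / sqrt(2 * pi * var(sums));
    plot(x, N * binwidth * pdf, 'r', 'LineWidth', 2);
    title('Sums of 12 uniform draws, fitted normal overlaid');

Adding more terms to each sum (say 50 instead of 12) makes the fit even closer.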

In a typical machine learning problem you will have errors from many different sources (e.g. measurement error, data entry error, classification error, data corruption...), and it's not completely unreasonable to think that the combined effect of all of these errors is approximately normal (although of course, you should always check!).

More pragmatic answers to the question include:

  • Because it makes the math simpler. The probability density function for the normal distribution is an exponential of a quadratic. Taking the logarithm (as you often do, because you want to maximize the log likelihood) gives you a quadratic. Differentiating this (to find the maximum) gives you a set of linear equations, which are easy to solve analytically. (A short worked sketch follows this list.)

  • It's simple - the entire distribution is described by two numbers, the mean and variance.

  • It's familiar to most people who will be reading your code/paper/report.
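To make the first point concrete (a short sketch using the standard Gaussian likelihood): for $n$ i.i.d. samples $x_1, \dots, x_n$ from $\mathcal{N}(\mu, \sigma^2)$, the log likelihood is

$$\log L(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2,$$

a quadratic in $\mu$. Differentiating and setting $\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0$ gives a linear equation whose solution is just the sample mean, $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$.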

It's generally a good starting point. If you find that your distributional assumptions are giving you poor performance, then maybe you can try a different distribution. But you should probably look at other ways to improve the model's performance first.

*Technical point - it needs to have finite variance.

answered Oct 12 '22 by Chris Taylor


Gaussian distributions are the most "natural" distributions. They show up everywhere. Here is a list of the properties that make me think that Gaussians are the most natural distributions:

  • The sum of several random variables (like dice) tends to be Gaussian as noted by nikie. (Central Limit Theorem).
  • There are two natural ideas that appear in machine learning: the standard deviation and the maximum entropy principle. If you ask, "Among all distributions with mean 0 and standard deviation 1, which has the maximum entropy?", the answer is the Gaussian.
  • Randomly select a point inside a high-dimensional hypersphere. The distribution of any particular coordinate is approximately Gaussian. The same is true for a random point on the surface of the hypersphere. (See the sketch after this list.)
  • Take several samples from a Gaussian distribution. Compute the discrete Fourier transform of the samples. The results have a Gaussian distribution. I am pretty sure that the Gaussian is the only distribution with this property.
  • The eigenfunctions of the Fourier transform are products of polynomials and Gaussians.
  • The solution to the differential equation y' = -x y is a Gaussian. This fact makes computations with Gaussians easier. (Higher derivatives involve Hermite polynomials.)
  • I think Gaussians are the only distributions closed under multiplication, convolution, and linear transformations.
  • Maximum likelihood estimators to problems involving Gaussians tend to also be the least squares solutions.
  • I think all solutions to stochastic differential equations involve Gaussians. (This is mainly a consequence of the Central Limit Theorem.)
  • "The normal distribution is the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e. other than the mean and variance) are zero." - Wikipedia.
  • For even n, the nth moment of the Gaussian is simply an integer multiplied by the standard deviation to the nth power.
  • Many of the other standard distributions are strongly related to the Gaussian (e.g. binomial, Poisson, chi-squared, Student's t, Rayleigh, logistic, log-normal, hypergeometric ...)
  • "If X1 and X2 are independent and their sum X1 + X2 is distributed normally, then both X1 and X2 must also be normal" -- from Wikipedia.
  • "The conjugate prior of the mean of a normal distribution is another normal distribution." -- from Wikipedia.
  • When using Gaussians, the math is easier.
  • The Erdős–Kac theorem implies that the distribution of the number of distinct prime factors of a "random" integer is approximately Gaussian.
  • The velocity components of random molecules in a gas are distributed as a Gaussian. (With standard deviation = z*sqrt( k T / m ), where z is a constant and k is Boltzmann's constant.)
  • "A Gaussian function is the wave function of the ground state of the quantum harmonic oscillator." -- From Wikipedia
  • Kalman Filters.
  • The Gauss–Markov theorem.
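Here is a quick numerical check of the hypersphere point above (a minimal sketch; the dimension and sample count are arbitrary choices):

    % Sample points uniformly on the surface of a d-dimensional hypersphere
    % by normalizing Gaussian vectors, then look at a single coordinate.
    d = 100;                                      % dimension
    N = 10000;                                    % number of sample points
    X = randn(N, d);
    X = X ./ repmat(sqrt(sum(X.^2, 2)), 1, d);    % project rows onto the unit sphere
    % Each coordinate of a uniform point on the sphere has variance 1/d and is
    % approximately Gaussian for large d; rescaling by sqrt(d) should give
    % something close to a standard normal.
    hist(X(:, 1) * sqrt(d), 40);

(Using randn to generate the points is a little circular, but normalizing Gaussian vectors is the standard way to sample uniformly on a sphere; the point is that the coordinate histogram comes out bell-shaped.)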

This post is cross-posted at http://artent.net/blog/2012/09/27/why-are-gaussian-distributions-great/

answered Oct 12 '22 by Hans Scundal


The signal error is often a sum of many independent errors. For example, in a CCD camera you could have photon noise, transmission noise, digitization noise (and maybe more) that are mostly independent, so the error will often be normally distributed due to the central limit theorem.

Also, modeling the error as a normal distribution often makes calculations very simple.
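As a toy illustration of that summing effect (the noise models here are hypothetical stand-ins, not real sensor characteristics):

    % Toy CCD pixel error: the sum of independent noise sources.
    N = 10000;
    photon   = sum(rand(N, 20) < 0.5, 2) - 10;   % binomial photon-count noise, centered
    transmit = 0.8 * randn(N, 1);                % Gaussian transmission noise
    quantize = rand(N, 1) - 0.5;                 % uniform digitization (rounding) noise
    total    = photon + transmit + quantize;
    hist(total, 40);                             % combined error: close to bell-shaped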

answered Oct 12 '22 by Niki


  1. The math often would not work out otherwise. :)

  2. The normal distribution is very common. See nikie's answer.

  3. Even non-normal distributions can often be treated as a normal distribution with a large deviation. Yes, it's a dirty hack. (See the sketch below.)

The first point might look funny, but I did some research on problems where we had non-normal distributions, and the math got horribly complicated. In practice, computer simulations are often carried out to "prove the theorems".
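For the third point, a minimal sketch (the binomial parameters are arbitrary): a binomial(100, 0.5) sample is discrete and bounded, yet a normal with matching mean and variance fits it closely.

    % Binomial(100, 0.5) draws via coin flips, treated as if they were normal.
    N = 10000;
    samples = sum(rand(N, 100) < 0.5, 2);
    [counts, centers] = hist(samples, 30);
    bar(centers, counts, 1);
    hold on;
    % Normal with matching mean and variance: mu = 50, sigma^2 = 25.
    x = linspace(min(samples), max(samples), 200);
    binwidth = centers(2) - centers(1);
    pdf = exp(-(x - 50).^2 / 50) / sqrt(50 * pi);
    plot(x, N * binwidth * pdf, 'r', 'LineWidth', 2);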

answered Oct 12 '22 by Ali