 

scikit-learn GMM produces positive log probability

I am using the Gaussian Mixture Model from the Python scikit-learn package to train on my dataset. However, I found that when I write

    G = mixture.GMM(...)
    G.fit(...)
    G.score(some_features)

the resulting log probability is a positive real number... Why is that? Isn't the log probability guaranteed to be negative?

I get it now: what the Gaussian Mixture Model returns is the log probability "density" rather than the probability "mass", so a positive value is entirely reasonable.
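For example, a minimal sketch of this effect (note it uses GaussianMixture, which replaced the older mixture.GMM class in later scikit-learn versions):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Tightly clustered 1-D data: the fitted variance is tiny, so the
    # density near the cluster centre exceeds 1 and its log is positive.
    rng = np.random.default_rng(0)
    X = rng.normal(loc=0.0, scale=0.01, size=(500, 1))

    gmm = GaussianMixture(n_components=1).fit(X)
    print(gmm.score(X))  # mean per-sample log-density; positive here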

If the covariance matrix is close to singular, the GMM will not perform well; generally it means the data are not well suited to such a generative task.

Jing, asked Aug 29 '12


1 Answer

Positive log probabilities are okay.

Remember that the quantity the GMM computes is a probability density function (PDF), which can be greater than one at any individual point.

The restriction is that the PDF must integrate to one over the data domain.
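A quick numerical sanity check of this claim (a sketch using the current GaussianMixture API; score_samples returns per-point log-densities):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = rng.normal(loc=0.0, scale=0.05, size=(1000, 1))
    gmm = GaussianMixture(n_components=1).fit(X)

    # Exponentiate the log-densities on a grid and integrate numerically:
    # the result is ~1 even though the peak density is well above 1.
    grid = np.linspace(-1.0, 1.0, 20001).reshape(-1, 1)
    pdf = np.exp(gmm.score_samples(grid))
    dx = grid[1, 0] - grid[0, 0]
    print(pdf.max())       # > 1
    print(pdf.sum() * dx)  # ~1.0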

If the log probability grows very large, then the inference algorithm may have reached a degenerate solution (common with maximum likelihood estimation if you have a small dataset).

To check that the GMM has not reached a degenerate solution, look at the fitted variances of each component. If any variance is close to zero, that component has likely collapsed onto a handful of points. Alternatively, use a Bayesian model rather than maximum-likelihood estimation (if you aren't doing so already), as sketched below.
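For instance (a sketch, again assuming the modern scikit-learn API; BayesianGaussianMixture is scikit-learn's variational Bayesian mixture):

    import numpy as np
    from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))

    gmm = GaussianMixture(n_components=3).fit(X)
    # Near-zero entries here suggest a component has collapsed onto a
    # handful of points, i.e. a degenerate maximum-likelihood solution.
    print(gmm.covariances_)

    # A Bayesian mixture puts a prior on the parameters, which
    # regularises against this kind of collapse.
    bgmm = BayesianGaussianMixture(n_components=3).fit(X)
    print(bgmm.covariances_)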

user1149913, answered Sep 21 '22