I'm using sklearn.mixture.GMM in Python, and the results seem to depend on data scaling. In the following code example, I change the overall scaling but I do NOT change the relative scaling of the dimensions. Yet under the three different scaling settings I get completely different results:
from sklearn.mixture import GMM
from numpy import array, shape
from numpy.random import randn
from random import choice
# centroids will be normally-distributed around zero:
truelumps = randn(20, 5) * 10
# data randomly sampled from the centroids:
data = array([choice(truelumps) + randn(5) for _ in xrange(1000)])
# fit a 10-component GMM at each overall scale and report the total log-likelihood:
for scaler in [0.01, 1, 100]:
    scdata = data * scaler
    thegmm = GMM(n_components=10)
    thegmm.fit(scdata, n_iter=1000)
    ll = thegmm.score(scdata)
    print sum(ll)
Here's the output I get:
GMM(cvtype='diag', n_components=10)
7094.87886779
GMM(cvtype='diag', n_components=10)
-14681.566456
GMM(cvtype='diag', n_components=10)
-37576.4496656
In principle, I don't think the overall data scaling should matter, and the total log-likelihoods should come out similar each time. But maybe there's an implementation issue I'm overlooking?
I've had an answer via the scikit-learn mailing list: in my code example, the log-likelihood is indeed expected to vary with scale (because we're evaluating point likelihoods, not integrals), by an offset proportional to log(scale). So my code example in fact shows GMM giving correct results.
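To make that concrete, here's a rough sanity check (a sketch assuming the same N = 1000 points and D = 5 dimensions as in the code above): rescaling the data by a factor s divides each point's density by s**D, so the total log-likelihood over all points should drop by roughly N*D*log(s) as the scale goes up.

from numpy import log

N, D = 1000, 5   # number of samples and dimensions, as in the example above
s = 100          # ratio between consecutive scale factors (0.01 -> 1 -> 100)
# expected drop in total log-likelihood for each hundredfold increase in scale:
print N * D * log(s)   # ~23026

The gaps in the output above (about 21800 and 22900) won't match this exactly, because each run converges to a different fit, but they are in the right ballpark, which is consistent with the mailing-list explanation.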