I am interested in fitting a 2-component Gaussian Mixture Model to the data shown below. However, since what I am plotting here are log-transformed counts normalized to be between 0-1, the maximum value my data will ever take is 0. When I try a naive fit using sklearn.mixture.GaussianMixture (code below), I get the resulting fit, which is obviously not what I want.
from sklearn.mixture import GaussianMixture
import numpy as np
# start with some count data in (0,1]
logged_counts = np.log(counts)
model = GaussianMixture(2).fit(logged_counts.reshape(-1,1))
# plot resulting fit
x_range = np.linspace(np.min(logged_counts), 0, 1000)
pdf = np.exp(model.score_samples(x_range.reshape(-1, 1)))
responsibilities = model.predict_proba(x_range.reshape(-1, 1))
pdf_individual = responsibilities * pdf[:, np.newaxis]
plt.hist(logged_counts, bins='auto', density=True, histtype='stepfilled', alpha=0.5)
plt.plot(x_range, pdf, '-k', label='Mixture')
plt.plot(x_range, pdf_individual, '--k', label='Components')
plt.legend()
plt.show()
I would love it if I could fix the mean of the top component at 0 and only optimize the other mean, the two variances, and the mixing fractions. (Additionally I would love to be able to use a half-normal for the component on the right.) Is there a simple way to do this with built-in functions in python/sklearn, or will I have to build that model myself using some probabilistic programming language?
Afaik, you cannot do exactly what you want in sklearn.
Imho, basically there are multiple strategies: (i) implement GMM yourself, (ii) switch to another language/framework, (iii) adapt GMM code, or (iv) adapt.
(i) You probably do not want to do this unless you want to learn for yourself.
(ii) You could use stan and adapt the code in the last paragraph to have a fixed component of your choice (distribution type and parameters)
(iii) You could do (i) but slightly adapt the sklearn code or simply use the methods for estimation but with you own slight modifications.
(iv)
.fit(data)
). Then subtract the resulting parameters from your original data (like in option 2). And then fit a GMM. to find out the next component.Hope this helps :-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With