
Creating a mixture of probability distributions for sampling

Is there a general way to join SciPy (or NumPy) probability distributions to create a mixture probability distribution which can then be sampled from?

I can build such a distribution for display using something like:

mixture_gaussian = (norm.pdf(x_axis, -3, 1) + norm.pdf(x_axis, 3, 1)) / 2

which, when plotted, looks like:

[Image: double gaussian]

However, I can't sample from this result, since it is just an array of density values that plots as the curve.

Note that this specific distribution is just a simple example. I'd like to be able to generate several kinds of distributions (including "sub"-distributions which are not just normal distributions). Ideally, there would be some way for the result to be normalized automatically (i.e. without having to do the / 2 explicitly as in the code above).

Does SciPy/NumPy provide some way of easily accomplishing this?

This answer shows one way to sample from multiple distributions, but it requires a fair bit of handcrafting for a given mixture distribution, especially when the "sub"-distributions should be weighted differently. This is usable, but I would hope for a method that's a bit cleaner and more straightforward if possible. Thanks!

Jenny Shoars asked Dec 11 '17 18:12



1 Answer

Following @PaulPanzer's pointer in the comments, I created the following subclass for easily creating mixture models from the SciPy distributions. It weights all submodels equally. Note that the pdf is not required for my question, but it was nice for me to have.

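import numpy as np
from scipy.stats import norm, rv_continuous


class MixtureModel(rv_continuous):
    """Equal-weight mixture of frozen SciPy distributions."""

    def __init__(self, submodels, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.submodels = submodels

    def _pdf(self, x):
        # Average the component densities so the mixture integrates to 1.
        pdf = self.submodels[0].pdf(x)
        for submodel in self.submodels[1:]:
            pdf += submodel.pdf(x)
        pdf /= len(self.submodels)
        return pdf

    def rvs(self, size):
        # Choose a component uniformly at random for each sample.
        submodel_choices = np.random.randint(len(self.submodels), size=size)
        # Draw `size` samples from every component, then keep, per position,
        # the sample from the chosen component.
        submodel_samples = [submodel.rvs(size=size) for submodel in self.submodels]
        rvs = np.choose(submodel_choices, submodel_samples)
        return rvs


mixture_gaussian_model = MixtureModel([norm(-3, 1), norm(3, 1)])
x_axis = np.arange(-6, 6, 0.001)
mixture_pdf = mixture_gaussian_model.pdf(x_axis)  # density values for plotting
mixture_rvs = mixture_gaussian_model.rvs(10)  # 10 samples from the mixture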
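
The subclass above weights every submodel equally. For the unequal weights mentioned in the question, one possible approach is sketched below. It is not part of the class above, and the function names `sample_weighted_mixture` and `weighted_mixture_pdf` are hypothetical helpers made up for this illustration. The sketch assumes the submodels are frozen SciPy distributions, that `size` is an integer, and that the installed SciPy accepts a NumPy `Generator` as `random_state`. Weights are normalized inside the functions, so they do not need to sum to 1.

```python
import numpy as np
from scipy.stats import expon, norm


def sample_weighted_mixture(submodels, weights, size, rng=None):
    """Draw `size` samples from a weighted mixture (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so callers don't have to
    # Decide how many of the `size` samples each component contributes.
    counts = rng.multinomial(size, weights)
    samples = np.concatenate([
        submodel.rvs(size=count, random_state=rng)
        for submodel, count in zip(submodels, counts)
    ])
    rng.shuffle(samples)  # undo the grouping by component
    return samples


def weighted_mixture_pdf(submodels, weights, x):
    """Weighted sum of the component densities (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * submodel.pdf(x) for w, submodel in zip(weights, submodels))


# Two normal components and one non-normal component, weighted 2:1:1.
components = [norm(-3, 1), norm(3, 1), expon(scale=2)]
weights = [2, 1, 1]
samples = sample_weighted_mixture(components, weights, size=10, rng=0)
density = weighted_mixture_pdf(components, weights, np.arange(-6, 6, 0.001))
```

Drawing the per-component counts from a multinomial means each component only generates the samples that are actually kept, instead of generating `size` samples from every submodel and discarding most of them.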
Jenny Shoars answered Nov 15 '22 18:11