Creating a mixture of probability distributions for sampling

Is there a general way to join SciPy (or NumPy) probability distributions to create a mixture probability distribution which can then be sampled from?

I have such a distribution for display using something like:

mixture_gaussian = (norm.pdf(x_axis, -3, 1) + norm.pdf(x_axis, 3, 1)) / 2

which, when plotted, looks like:

[Image: double Gaussian]

However, I can't sample from this generated model, as it's just a list of points which will plot as the curve.

Note that this specific distribution is just a simple example. I'd like to be able to generate several kinds of mixtures, including ones whose "sub"-distributions are not all normal distributions. Ideally, there would be some way for the result to be normalized automatically (i.e. without having to do the / 2 explicitly as in the code above).

Does SciPy/NumPy provide some way of easily accomplishing this?

This answer provides a way to sample from multiple distributions, but it requires a fair bit of handcrafting for each mixture distribution, especially when the "sub"-distributions should be weighted differently. It is usable, but I would hope for a method that's a bit cleaner and more straightforward, if possible. Thanks!
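For context, the handcrafted approach mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: `sample_mixture` is a hypothetical helper name (not a SciPy or NumPy function), and the components and weights are made up for the example. The idea is to pick a component for every draw according to the weights, then sample each component only as many times as it was picked.

```python
import numpy as np
from scipy.stats import expon, norm


def sample_mixture(components, weights, size, seed=None):
    """Draw `size` samples from a mixture of frozen SciPy distributions.

    Hypothetical helper for illustration; not part of SciPy.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize, so weights need not sum to 1

    # Decide which component each individual sample comes from.
    labels = rng.choice(len(components), size=size, p=weights)

    samples = np.empty(size)
    for i, component in enumerate(components):
        mask = labels == i
        # Draw exactly as many values as this component was selected for.
        samples[mask] = component.rvs(size=int(mask.sum()), random_state=rng)
    return samples


# Example: 3 parts normal around -3, 1 part exponential shifted to start at 2.
samples = sample_mixture([norm(-3, 1), expon(loc=2)], weights=[3, 1],
                         size=10_000, seed=0)
```

The weights are normalized inside the helper, which avoids the explicit division, but the function still has to be called by hand rather than behaving like a SciPy distribution object.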


asked Dec 11 '17 18:12

Jenny Shoars

1 Answer

Following @PaulPanzer's pointer in the comments, I created the following subclass for easily creating mixture models from the SciPy distributions. Note, the pdf is not required for my question, but it was nice for me to have.

import numpy as np
from scipy.stats import norm, rv_continuous


class MixtureModel(rv_continuous):
    """Equal-weight mixture of frozen SciPy distributions."""

    def __init__(self, submodels, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.submodels = submodels

    def _pdf(self, x):
        # Average the component densities so the mixture integrates to 1.
        pdf = self.submodels[0].pdf(x)
        for submodel in self.submodels[1:]:
            pdf += submodel.pdf(x)
        pdf /= len(self.submodels)
        return pdf

    def rvs(self, size):
        # Pick a component uniformly at random for each draw,
        # then keep that component's sample at the same position.
        submodel_choices = np.random.randint(len(self.submodels), size=size)
        submodel_samples = [submodel.rvs(size=size) for submodel in self.submodels]
        rvs = np.choose(submodel_choices, submodel_samples)
        return rvs


mixture_gaussian_model = MixtureModel([norm(-3, 1), norm(3, 1)])
x_axis = np.arange(-6, 6, 0.001)
mixture_pdf = mixture_gaussian_model.pdf(x_axis)
mixture_rvs = mixture_gaussian_model.rvs(10)
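The class above weights every submodel equally. One possible extension, sketched here rather than taken from the original answer, adds optional weights that are normalized automatically. `WeightedMixtureModel` and its `weights` parameter are hypothetical names introduced for illustration, and `rvs` assumes `size` is a plain integer.

```python
import numpy as np
from scipy.stats import norm, rv_continuous


class WeightedMixtureModel(rv_continuous):
    """Mixture of frozen SciPy distributions with optional weights (sketch)."""

    def __init__(self, submodels, weights=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.submodels = submodels
        if weights is None:
            weights = np.ones(len(submodels))  # default: equal weights
        weights = np.asarray(weights, dtype=float)
        self.weights = weights / weights.sum()  # normalize to sum to 1

    def _pdf(self, x):
        # Weighted sum of the component densities.
        return sum(w * m.pdf(x) for w, m in zip(self.weights, self.submodels))

    def _cdf(self, x):
        # Weighted sum of the component CDFs.
        return sum(w * m.cdf(x) for w, m in zip(self.weights, self.submodels))

    def rvs(self, size):
        # Assumes `size` is an int. Choose a component per draw by weight,
        # then pick that component's sample in the matching column.
        choices = np.random.choice(len(self.submodels), size=size, p=self.weights)
        samples = np.stack([m.rvs(size=size) for m in self.submodels])
        return samples[choices, np.arange(size)]


# Example: the right-hand Gaussian is three times as likely as the left one.
model = WeightedMixtureModel([norm(-3, 1), norm(3, 1)], weights=[1, 3])
draws = model.rvs(5000)
```

Indexing the stacked samples directly instead of calling `np.choose` sidesteps the cap that `np.choose` places on the number of choice arrays, at the cost of still drawing `size` samples from every submodel.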


answered Nov 15 '22 18:11

Jenny Shoars

Tags: python, numpy, scipy, probability-density