
How can I sample from a complex or compound distribution in Haskell?

I'm trying to generate random masses for hypothetical planets in Haskell. I want to produce these masses by sampling a bi-modal distribution (ideally the superposition of two normal distributions: one corresponding to small planets and one corresponding to gas giants). I've looked at the statistics package, which provides the quantile function, which can turn a uniformly distributed Double into a Double on a number of distributions. But there doesn't seem to be any support for composing distributions.

This particular case could be hacked around by picking one distribution or the other to sample before-hand, but I'd like to do it with a single distribution, especially since I might need to tweak the overall distribution later. Eventually I might replace the normal distribution with real data from sky surveys.

I'm considering implementing rejection sampling myself, which can handle arbitrary distributions fairly simply, but it seems rather inefficient, and it certainly wouldn't be a good idea to implement it if a solution exists already as a library.

Is there a Haskell library that supports sampling from composed or explicitly specified distributions? Or an existing Haskell implementation of rejection sampling? Alternatively, is there an explicit formula for the inverse of the CDF of the sum of two normal distributions?

asked May 31 '12 by interfect

1 Answer

In the case of a simple mixture of distributions, you can get an efficient sampler via the 'hack' you first mentioned:

This particular case could be hacked around by picking one distribution or the other to sample before-hand, but I'd like to do it with a single distribution, especially since I might need to tweak the overall distribution later.

This is actually a case of Gibbs sampling, which is very prevalent in statistics. It's very flexible, and if you know the number of mixtures you're using, it will probably be hard to beat. Choose one individual distribution from the entire ensemble to sample from, and then sample from that conditional distribution. Rinse and repeat.

Here's a simple, unoptimized Haskell implementation for a mixture-of-Gaussians Gibbs sampler. It's pretty basic, but you get the idea:

import System.Random
import Control.Monad.State

type ModeList = [(Double, Double)]                 -- A list of mean/stdev pairs, for each mode.

-- Generate a Gaussian (0, 1) variate.
boxMuller :: StdGen -> (Double, StdGen)
boxMuller gen = (sqrt (-2 * log u1) * cos (2 * pi * u2), gen'')
    where (u1, gen')  = randomR (0, 1) gen 
          (u2, gen'') = randomR (0, 1) gen'

sampler :: ModeList -> State StdGen Double
sampler modeInfo = do
    gen <- get
    let n           = length modeInfo
        (z0, g0)    = boxMuller gen
        (c,  g1)    = randomR (0, n - 1) g0        -- Sample from the components.
        (cmu, csig) = modeInfo !! c                
    put g1
    return $ cmu + csig * z0                       -- Sample from the conditional distribution.

Here's an example run: sampling 100 times from a one-dimensional mixture of two Gaussians. The modes are at x = -3 and x = 2.5, and each mixture component has its own separate variance. You could add as many modes as you want here.

main = do
    let gen      = mkStdGen 42
        modeInfo = [(2.5, 1.0), (-3, 1.5)]
        samples  = (`evalState` gen) . replicateM 100 $ sampler modeInfo
    print samples

Here's a smoothed density plot of those 100 samples (using R and ggplot2):

[plot: smoothed density of the 100 samples, showing a mixture of Gaussians with modes near -3 and 2.5]

A more general-purpose algorithm would be a rejection or importance sampler, and in the case of more complicated distributions you're probably going to want to hand-roll an appropriate MCMC routine. Here is a good introduction to Monte Carlo and MCMC.
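For reference, a basic rejection sampler over a bounded interval is only a few lines of Haskell. This is a sketch, not part of the answer above: the target density, the bound m, and the interval are all hypothetical placeholders you would replace with values appropriate to your own distribution (m must bound the density on the interval, or the sampler is biased).

```haskell
import System.Random (StdGen, mkStdGen, randomR)

-- Rejection sampling: propose x uniformly from [lo, hi], then accept x
-- with probability (density x / m), where m >= density t for all t in
-- [lo, hi]. On rejection, recurse with the updated generator.
rejectionSample :: (Double -> Double) -> Double -> (Double, Double)
                -> StdGen -> (Double, StdGen)
rejectionSample density m (lo, hi) gen
    | u * m <= density x = (x, gen'')
    | otherwise          = rejectionSample density m (lo, hi) gen''
  where (x, gen')  = randomR (lo, hi) gen
        (u, gen'') = randomR (0, 1) gen'

-- Example target: an unnormalized mixture of two Gaussians, with the
-- same (hypothetical) modes used above. Normalization is irrelevant to
-- rejection sampling, so the constant factors are dropped.
mixtureDensity :: Double -> Double
mixtureDensity x = gauss 2.5 1.0 x + gauss (-3) 1.5 x
  where gauss mu sig t = exp (negate ((t - mu) ^ 2) / (2 * sig ^ 2)) / sig

main :: IO ()
main = print . fst $ rejectionSample mixtureDensity 1.1 (-8, 8) (mkStdGen 42)
```

The efficiency concern mentioned in the question is real: the acceptance rate is the ratio of the area under the density to m * (hi - lo), so a loose bound or a wide interval wastes most proposals.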

answered Sep 28 '22 by jtobin