I'm trying to generate random masses for hypothetical planets in Haskell. I want to produce these masses by sampling a bi-modal distribution (ideally the superposition of two normal distributions: one corresponding to small planets and one corresponding to gas giants). I've looked at the statistics package, which provides the quantile
function, which can turn a uniformly distributed Double
into a Double
on a number of distributions. But there doesn't seem to be any support for composing distributions.
This particular case could be hacked around by picking one distribution or the other to sample before-hand, but I'd like to do it with a single distribution, especially since I might need to tweak the overall distribution later. Eventually I might replace the normal distribution with real data from sky surveys.
I'm considering implementing rejection sampling myself, which can handle arbitrary distributions fairly simply, but it seems rather inefficient, and it certainly wouldn't be a good idea to implement it if a solution exists already as a library.
Is there a Haskell library that supports sampling from composed or explicitly specified distributions? Or an existing Haskell implementation of rejection sampling? Alternatively, is there an explicit formula for the inverse of the CDF of the sum of two normal distributions?
In the case of a simple mixture of distributions, you can get an efficient sampler via the 'hack' you first mentioned:
This particular case could be hacked around by picking one distribution or the other to sample before-hand, but I'd like to do it with a single distribution, especially since I might need to tweak the overall distribution later.
This is actually a case of Gibbs sampling, which is very prevalent in statistics. It's very flexible, and if you know the number of mixtures you're using, it will probably be hard to beat. Choose one individual distribution from the entire ensemble to sample from, and then sample from that conditional distribution. Rinse and repeat.
Here's a simple, unoptimized Haskell implementation for a mixture-of-Gaussians Gibbs sampler. It's pretty basic, but you get the idea:
import System.Random
import Control.Monad.State
type ModeList = [(Double, Double)] -- A list of mean/stdev pairs, for each mode.
-- Generate a Gaussian (0, 1) variate.
boxMuller :: StdGen -> (Double, StdGen)
boxMuller gen = (sqrt (-2 * log u1) * cos (2 * pi * u2), gen'')
where (u1, gen') = randomR (0, 1) gen
(u2, gen'') = randomR (0, 1) gen'
sampler :: ModeList -> State StdGen Double
sampler modeInfo = do
gen <- get
let n = length modeInfo
(z0, g0) = boxMuller gen
(c, g1) = randomR (0, n - 1) g0 -- Sample from the components.
(cmu, csig) = modeInfo !! c
put g1
return $ cmu + csig * z0 -- Sample from the conditional distribution.
Here's a example run: sampling 100 times from a one-dimensional mixture of two Gaussians. The modes are at x = -3
and x = 2.5
, and each mixture component has its own separate variance. You could add as many modes as you want here.
main = do
let gen = mkStdGen 42
modeInfo = [(2.5, 1.0), (-3, 1.5)]
samples = (`evalState` gen) . replicateM 100 $ sampler modeInfo
print samples
Here's a smoothed density plot of those 100 samples (using R and ggplot2):
A more general purpose algorithm would be a rejection or importance sampler, and in the case of more complicated distributions you're probably going to want to hand-roll an appropriate MCMC routine. Here is a good introduction to Monte Carlo and MCMC.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With