Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Discrete rejection-sampling Monte Carlo with Python

I am using python to use the rejection-acceptance method to sample a discrete MC distribution. Since the curve resembles a power law, I decided to set a simple envelope around it (at x=77) to make the code faster. The code does not perform as expected, though, as it is shown for the figure with a simple rectangle over the entire area, compared to the envelope: here

x is data.rank, between 0-5000 y is data.freq

Can anyone spot anything wrong with the code? The output of both histograms should be equal. Thanks!

#!/usr/bin/env python                        

import numpy as np
import matplotlib.pyplot as plt
import random
import pandas as pd

# Read data
data = pd.read_csv('data.csv')

# Rejection-sampling MC (with envelope function gx at rank=77)
N = 1000
M = 1.0001
cutoff = 77
gx = np.ones(cutoff) * max(data['freq'])
gx = np.append( gx, np.ones(len(data['rank'])-cutoff) * data['freq'][cutoff-1] )
histx = []
while N > 0: 
    rx = random.randint(0,len(gx)-1)
    ry = random.uniform(0,1)
    if ry < data['freq'][rx]/(M*gx[rx]):
        histx.append(rx)
        N += -1
plt.hist(histx, bins=100, histtype='stepfilled', color='b',label='Enveloped (Fast)')

# Rejection-sampling MC (with envelope function gx at rank=77)
N2 = 1000
histy = []
while N2 > 0: 
    rx2 = random.randint(0,len(gx)-1)
    ry2 = random.uniform(0,max(data['freq']))
    if ry2 < data['freq'][rx2]:
        histy.append(rx2)
        N2 += -1
plt.hist(histy, bins=100, histtype='stepfilled', color='r', alpha=0.5, label='Normal (Slow)')
plt.legend()
plt.show()
like image 295
GermanoMosconi Avatar asked Jul 01 '26 16:07

GermanoMosconi


1 Answers

Those are different samplings

In the second case, you effectively have

ry2 = random.uniform(0.0, 1.0)
if ry2 < data['freq'][rx2]/max(data['freq']):
    accept

In the first case you have

ry = random.uniform(0.0, 1.0)
if ry < data['freq'][rx]/gx[rx]:
    accept

Where gx[rx] is not equal to max(data['freq']), you'll have a difference

Some advices: try to always use U(0,1) rng call, easier to spot, replace or alter rng. And second, in MC performance (well, after correctness) are of utmost importance, try to compute things like max(data['freq']) or len(gx) once outside the main loop

like image 152
Severin Pappadeux Avatar answered Jul 04 '26 06:07

Severin Pappadeux



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!