Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Estimating the probability density of sum of uniform random variables in python

I have two random variables X and Y which are uniformly distributed on the simplex:simplex

I want to evaluate the density of their sum:

enter image description here

After evaluating the above integral, my final goal is to compute the following integral: enter image description here

To compute the first integral, I am generating uniformly distributed points in simplex and then checking if they belong to the desired region in the above integral and taking the fraction of points to evaluate the above density.

Once I compute the above density I am following a similar procedure to compute the above logarithm integral to compute its value. However, this has been extremely inefficient and taking a lot of time such 3-4 hours. Can anyone suggest me an efficient way to solve this in Python? I am using Numpy package.

Here is the code

import numpy as np
import math
import random
import numpy.random as nprnd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
#This function checks if the point x lies the simplex and the negative simplex shifted by z
def InreqSumSimplex(x,z):
    dim=len(x)
    testShiftSimpl= all(z[i]-1 <= x[i] <= z[i] for i in range(0,dim)) and (sum(x) >= sum(z)-1)
    return int(testShiftSimpl)

def InreqDiffSimplex(x,z):
    dim=len(x)
    testShiftSimpl= all(z[i] <= x[i] <= z[i]+1 for i in range(0,dim)) and (sum(x) <= sum(z)+1)
    return int(testShiftSimpl)
#This is for the density X+Y
def DensityEvalSum(z,UniformCube):
    dim=len(z)
    Sum=0
    for gen in UniformCube:
        Exponential=[-math.log(i) for i in gen] #This is exponentially distributed
        x=[i/sum(Exponential) for i in Exponential[0:dim]] #x is now uniformly distributed on simplex

        Sum+=InreqSumSimplex(x,z)

    Sum=Sum/numsample

    FunVal=(math.factorial(dim))*Sum;
    if FunVal<0.00001:
        return 0.0
    else:
        return -math.log(FunVal)
#This is for the density X-Y
def DensityEvalDiff(z,UniformCube):
    dim=len(z)
    Sum=0
    for gen in UniformCube:
        Exponential=[-math.log(i) for i in gen]
        x=[i/sum(Exponential) for i in Exponential[0:dim]]

    Sum+=InreqDiffSimplex(x,z)

    Sum=Sum/numsample

    FunVal=(math.factorial(dim))*Sum;
    if FunVal<0.00001:
        return 0.0
    else:
        return -math.log(FunVal)
def EntropyRatio(dim):    
    UniformCube1=np.random.random((numsample,dim+1)); 
    UniformCube2=np.random.random((numsample,dim+1))

    IntegralSum=0; IntegralDiff=0

    for gen1,gen2 in zip(UniformCube1,UniformCube2):

        Expo1=[-math.log(i) for i in gen1];        Expo2=[-math.log(i) for i in gen2]

        Sumz=[ (i/sum(Expo1)) + j/sum(Expo2) for i,j in zip(Expo1[0:dim],Expo2[0:dim])] #Sumz is now disbtributed as X+Y

        Diffz=[ (i/sum(Expo1)) - j/sum(Expo2) for i,j in zip(Expo1[0:dim],Expo2[0:dim])] #Diffz is now distributed as X-Y

    UniformCube=np.random.random((numsample,dim+1))

    IntegralSum+=DensityEvalSum(Sumz,UniformCube) ; IntegralDiff+=DensityEvalDiff(Diffz,UniformCube)

    IntegralSum= IntegralSum/numsample; IntegralDiff=IntegralDiff/numsample

    return ( (IntegralDiff +math.log(math.factorial(dim)))/ ((IntegralSum +math.log(math.factorial(dim)))) )

Maxdim=11
dimlist=range(2,Maxdim)
Ratio=len(dimlist)*[0]
numsample=10000

for i in range(len(dimlist)):
    Ratio[i]=EntropyRatio(dimlist[i])
like image 271
pikachuchameleon Avatar asked Feb 14 '16 16:02

pikachuchameleon


1 Answers

Not sure it is an answer to your question, but let's start

First, here is some code samples and discussion how to properly sample from Dirichlet(n) (a.k.a. simplex), via gammavariate() or via -log(U) as you did but with proper handle for potential corner case, link

Problem with your code as I can see is that, say, for sampling dimension=2 simplex you're getting three (!) uniform numbers, but skipping one when doing list comprehension for x. This is wrong. To sample n-dimension Dirichlet you should get exactly n U(0,1) and transform then (or n samples from gammavariate).

But, best solution might be just use numpy.random.dirichlet(), it is written in C and might be fastest of all, see link.

Last one, in my humble opinion, you're not properly estimating log(PDF(X+Z)). Ok, you find some are, but what is PDF(X+Z) at this point?

Does this

testShiftSimpl= all(z[i]-1 <= x[i] <= z[i] for i in range(0,dim)) and (sum(x) >= sum(z)-1)
return int(testShiftSimpl)

looks like PDF? How did you managed to get it?

Simple test: integration of PDF(X+Z) over whole X+Z area. Did it produce 1?

UPDATE

Looks like we might have different ideas what we call simplex, Dirichlet etc. I'm pretty much along with this definition, where in d dim space we have d points and d-1 simplex is convex hull connecting vertices. Simplex dimension is always one less than space due to relation between coordinates. In simplest case, d=2, 1-simplex is a segment connecting points (1,0) and (0,1), and from Dirichlet distribution I've got the picture

enter image description here

In the case of d=3 and 2-simplex we have triangle connecting points (1,0,0), (0,1,0) and (0,0,1)

enter image description here

Code, Python

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

import math
import random

def simplex_sampling(d):
    """
    Sample one d-dim point from Dirichet distribution
    """
    r = []
    sum = 0.0

    for k in range(0, d):
        x = random.random()
        if x == 0.0:
            return make_corner_sample(d, k)

        t = -math.log(x)
        r.append(t)
        sum += t

    norm = 1.0 / sum

    for k in range(0, d):
        r[k] *= norm

    return r

def make_corner_sample(d, k):
    """
    U(0,1) number k is zero, it is a corner point in simplex
    """
    r = []
    for i in range(0, d):
        if i == k:
            r.append(1.0)
        else:
            r.append(0.0)

    return r

N = 500 # numer of points to plot
d = 3   # dimension of the space, 2 or 3

x = []
y = []
z = []

for k in range(0, N):
    pt = simplex_sampling(d)

    x.append(pt[0])
    y.append(pt[1])
    if d > 2:
        z.append(pt[2])

if d == 2:
    plt.scatter(x, y, alpha=0.1)
else:
    fig = plt.figure()
    ax  = fig.add_subplot(111, projection='3d')
    ax.scatter(x, y, z, alpha=0.1)

    ax.set_xlabel('X Label')
    ax.set_ylabel('Y Label')
    ax.set_zlabel('Z Label')

plt.show()
like image 111
Severin Pappadeux Avatar answered Nov 12 '22 03:11

Severin Pappadeux