I have two random variables X and Y which are uniformly distributed on the simplex:
I want to evaluate the density of their sum:
After evaluating the above integral, my final goal is to compute the following integral:
To compute the first integral, I am generating uniformly distributed points in simplex and then checking if they belong to the desired region in the above integral and taking the fraction of points to evaluate the above density.
Once I compute the above density I am following a similar procedure to compute the above logarithm integral to compute its value. However, this has been extremely inefficient and taking a lot of time such 3-4 hours. Can anyone suggest me an efficient way to solve this in Python? I am using Numpy package.
Here is the code
import numpy as np
import math
import random
import numpy.random as nprnd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
#This function checks if the point x lies the simplex and the negative simplex shifted by z
def InreqSumSimplex(x,z):
dim=len(x)
testShiftSimpl= all(z[i]-1 <= x[i] <= z[i] for i in range(0,dim)) and (sum(x) >= sum(z)-1)
return int(testShiftSimpl)
def InreqDiffSimplex(x,z):
dim=len(x)
testShiftSimpl= all(z[i] <= x[i] <= z[i]+1 for i in range(0,dim)) and (sum(x) <= sum(z)+1)
return int(testShiftSimpl)
#This is for the density X+Y
def DensityEvalSum(z,UniformCube):
dim=len(z)
Sum=0
for gen in UniformCube:
Exponential=[-math.log(i) for i in gen] #This is exponentially distributed
x=[i/sum(Exponential) for i in Exponential[0:dim]] #x is now uniformly distributed on simplex
Sum+=InreqSumSimplex(x,z)
Sum=Sum/numsample
FunVal=(math.factorial(dim))*Sum;
if FunVal<0.00001:
return 0.0
else:
return -math.log(FunVal)
#This is for the density X-Y
def DensityEvalDiff(z,UniformCube):
dim=len(z)
Sum=0
for gen in UniformCube:
Exponential=[-math.log(i) for i in gen]
x=[i/sum(Exponential) for i in Exponential[0:dim]]
Sum+=InreqDiffSimplex(x,z)
Sum=Sum/numsample
FunVal=(math.factorial(dim))*Sum;
if FunVal<0.00001:
return 0.0
else:
return -math.log(FunVal)
def EntropyRatio(dim):
UniformCube1=np.random.random((numsample,dim+1));
UniformCube2=np.random.random((numsample,dim+1))
IntegralSum=0; IntegralDiff=0
for gen1,gen2 in zip(UniformCube1,UniformCube2):
Expo1=[-math.log(i) for i in gen1]; Expo2=[-math.log(i) for i in gen2]
Sumz=[ (i/sum(Expo1)) + j/sum(Expo2) for i,j in zip(Expo1[0:dim],Expo2[0:dim])] #Sumz is now disbtributed as X+Y
Diffz=[ (i/sum(Expo1)) - j/sum(Expo2) for i,j in zip(Expo1[0:dim],Expo2[0:dim])] #Diffz is now distributed as X-Y
UniformCube=np.random.random((numsample,dim+1))
IntegralSum+=DensityEvalSum(Sumz,UniformCube) ; IntegralDiff+=DensityEvalDiff(Diffz,UniformCube)
IntegralSum= IntegralSum/numsample; IntegralDiff=IntegralDiff/numsample
return ( (IntegralDiff +math.log(math.factorial(dim)))/ ((IntegralSum +math.log(math.factorial(dim)))) )
Maxdim=11
dimlist=range(2,Maxdim)
Ratio=len(dimlist)*[0]
numsample=10000
for i in range(len(dimlist)):
Ratio[i]=EntropyRatio(dimlist[i])
Not sure it is an answer to your question, but let's start
First, here is some code samples and discussion how to properly sample from Dirichlet(n) (a.k.a. simplex), via gammavariate()
or via -log(U)
as you did but with proper handle for potential corner case, link
Problem with your code as I can see is that, say, for sampling dimension=2 simplex
you're getting three (!) uniform numbers, but skipping one when doing list comprehension for x
. This is wrong. To sample n-dimension Dirichlet you should get exactly n
U(0,1) and transform then (or n
samples from gammavariate).
But, best solution might be just use numpy.random.dirichlet()
, it is written in C and might be fastest of all, see link.
Last one, in my humble opinion, you're not properly estimating log(PDF(X+Z))
. Ok, you find some are, but what is PDF(X+Z)
at this point?
Does this
testShiftSimpl= all(z[i]-1 <= x[i] <= z[i] for i in range(0,dim)) and (sum(x) >= sum(z)-1)
return int(testShiftSimpl)
looks like PDF? How did you managed to get it?
Simple test: integration of PDF(X+Z)
over whole X+Z
area. Did it produce 1?
UPDATE
Looks like we might have different ideas what we call simplex, Dirichlet etc. I'm pretty much along with this definition, where in d
dim space we have d
points and d-1
simplex is convex hull connecting vertices. Simplex dimension is always
one less than space due to relation between coordinates. In simplest case, d
=2, 1-simplex is a segment connecting points (1,0) and (0,1), and from Dirichlet distribution I've got the picture
In the case of d
=3 and 2-simplex we have triangle connecting points (1,0,0), (0,1,0) and (0,0,1)
Code, Python
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import math
import random
def simplex_sampling(d):
"""
Sample one d-dim point from Dirichet distribution
"""
r = []
sum = 0.0
for k in range(0, d):
x = random.random()
if x == 0.0:
return make_corner_sample(d, k)
t = -math.log(x)
r.append(t)
sum += t
norm = 1.0 / sum
for k in range(0, d):
r[k] *= norm
return r
def make_corner_sample(d, k):
"""
U(0,1) number k is zero, it is a corner point in simplex
"""
r = []
for i in range(0, d):
if i == k:
r.append(1.0)
else:
r.append(0.0)
return r
N = 500 # numer of points to plot
d = 3 # dimension of the space, 2 or 3
x = []
y = []
z = []
for k in range(0, N):
pt = simplex_sampling(d)
x.append(pt[0])
y.append(pt[1])
if d > 2:
z.append(pt[2])
if d == 2:
plt.scatter(x, y, alpha=0.1)
else:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, alpha=0.1)
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
plt.show()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With