I have a set of N
observations distributed as (x[i], y[i]), i=0..N
points in a 2D space. Each point has associated errors in both coordinates (e_x[i], e_y[i], i=0..N
) and also a weight attached to it (w[i], i=0..N
).
I'd like to generate a 2D histogram of these N
points, accounting not only for the weights but also for the errors, which would cause each point to be spread possibly among many bins if the error values are large enough (assuming a standard Gaussian distribution for the errors, although other distributions could perhaps be considered).
I see that numpy.histogram2d has a weights
parameter so that is taken care of. The issue would be how to account for the errors in each of the N
observed points.
Is there a function that would let me do this? I'm open to anything in numpy
and scipy
.
Building upon user1415946's comment, you can assume each point represents a bi-variate normal distribution with the covariance matrices given by [[e_x[i]**2,0][0,e_y[i]**2]]
. However, the resulting distribution is not a normal distribution - you'll see, after running the example, how the histogram doesn't resemble a Gaussian at all, but instead a group of them.
To create a histogram out of this set of distributions, one way I see is by generating random samples out of each point using numpy.random.multivariate_normal. See the example code below with some artificial data.
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
# This is a function I like to use for plotting histograms
def plotHistogram3d(hist, xedges, yedges):
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
hist = hist.transpose()
# Transposing is done so that bar3d x and y match hist shape correctly
dx = np.mean(np.diff(xedges))
dy = np.mean(np.diff(yedges))
# Computing the number of elements
elements = (len(xedges) - 1) * (len(yedges) - 1)
# Generating mesh grids.
xpos, ypos = np.meshgrid(xedges[:-1]+dx/2.0, yedges[:-1]+dy/2.0)
# Vectorizing matrices
xpos = xpos.flatten()
ypos = ypos.flatten()
zpos = np.zeros(elements)
dx = dx * np.ones_like(zpos) * 0.5 # 0.5 factor to give room between bars.
# Use 1.0 if you want all bars 'glued' to each other
dy = dy * np.ones_like(zpos) * 0.5
dz = hist.flatten()
ax.bar3d(xpos, ypos, zpos, dx, dy, dz, color='b')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('Count')
return
"""
INPUT DATA
"""
# x y ex ey w
data = np.array([[1, 2, 1, 1, 1],
[3, 0, 1, 1, 2],
[0, 1, 2, 1, 5],
[7, 7, 1, 3, 1]])
"""
Generate samples
"""
# Sample size (100 samples will be generated for each data point)
SAMPLE_SIZE = 100
# I want to fill in a table with columns [x, y, w]. Each data point generates SAMPLE_SIZE
# samples, so we have SAMPLE_SIZE * (number of data points) generated points
points = np.zeros((SAMPLE_SIZE * data.shape[0], 3)) # Initializing this matrix
for i, element in enumerate(data): # For each row in the data set
meanVector = element[:2]
covarianceMatrix = np.diag(element[2:4]**2) # Diagonal matrix with elements equal to error^2
# For columns 0 and 1, add generated x and y samples
points[SAMPLE_SIZE*i:SAMPLE_SIZE*(i+1), :2] = \
np.random.multivariate_normal(meanVector, covarianceMatrix, SAMPLE_SIZE)
# For column 2, simply copy original weight
points[SAMPLE_SIZE*i:SAMPLE_SIZE*(i+1), 2] = element[4] # weights
hist, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], weights=points[:, 2])
plotHistogram3d(hist, xedges, yedges)
plt.show()
Results plotted below:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With