Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Accounting for errors when creating a histogram

I have a set of N observations distributed as (x[i], y[i]), i=0..N points in a 2D space. Each point has associated errors in both coordinates (e_x[i], e_y[i], i=0..N) and also a weight attached to it (w[i], i=0..N).

I'd like to generate a 2D histogram of these N points, accounting not only for the weights but also for the errors, which would cause each point to be spread possibly among many bins if the error values are large enough (assuming a standard Gaussian distribution for the errors, although other distributions could perhaps be considered).

I see that numpy.histogram2d has a weights parameter so that is taken care of. The issue would be how to account for the errors in each of the N observed points.

Is there a function that would let me do this? I'm open to anything in numpy and scipy.

like image 486
Gabriel Avatar asked Nov 11 '22 00:11

Gabriel


1 Answers

Building upon user1415946's comment, you can assume each point represents a bi-variate normal distribution with the covariance matrices given by [[e_x[i]**2,0][0,e_y[i]**2]]. However, the resulting distribution is not a normal distribution - you'll see, after running the example, how the histogram doesn't resemble a Gaussian at all, but instead a group of them.

To create a histogram out of this set of distributions, one way I see is by generating random samples out of each point using numpy.random.multivariate_normal. See the example code below with some artificial data.

import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt


# This is a function I like to use for plotting histograms
def plotHistogram3d(hist, xedges, yedges):
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    hist = hist.transpose()
    # Transposing is done so that bar3d x and y match hist shape correctly
    dx = np.mean(np.diff(xedges))
    dy = np.mean(np.diff(yedges))

    # Computing the number of elements
    elements = (len(xedges) - 1) * (len(yedges) - 1)
    # Generating mesh grids.
    xpos, ypos = np.meshgrid(xedges[:-1]+dx/2.0, yedges[:-1]+dy/2.0)

    # Vectorizing matrices
    xpos = xpos.flatten()
    ypos = ypos.flatten()
    zpos = np.zeros(elements)
    dx = dx * np.ones_like(zpos) * 0.5  # 0.5 factor to give room between bars.
# Use 1.0 if you want all bars 'glued' to each other
    dy = dy * np.ones_like(zpos) * 0.5
    dz = hist.flatten()

    ax.bar3d(xpos, ypos, zpos, dx, dy, dz, color='b')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('Count')
    return

"""
INPUT DATA
"""
#                 x  y ex ey  w
data = np.array([[1, 2, 1, 1, 1],
                 [3, 0, 1, 1, 2],
                 [0, 1, 2, 1, 5],
                 [7, 7, 1, 3, 1]])

"""
Generate samples
"""
# Sample size (100 samples will be generated for each data point)
SAMPLE_SIZE = 100
# I want to fill in a table with columns [x, y, w]. Each data point generates SAMPLE_SIZE
# samples, so we have SAMPLE_SIZE * (number of data points) generated points
points = np.zeros((SAMPLE_SIZE * data.shape[0], 3))  # Initializing this matrix

for i, element in enumerate(data):  # For each row in the data set
    meanVector = element[:2]
    covarianceMatrix = np.diag(element[2:4]**2)  # Diagonal matrix with elements equal to error^2
    # For columns 0 and 1, add generated x and y samples
    points[SAMPLE_SIZE*i:SAMPLE_SIZE*(i+1), :2] = \
        np.random.multivariate_normal(meanVector, covarianceMatrix, SAMPLE_SIZE)
    # For column 2, simply copy original weight
    points[SAMPLE_SIZE*i:SAMPLE_SIZE*(i+1), 2] = element[4]  # weights

hist, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], weights=points[:, 2])
plotHistogram3d(hist, xedges, yedges)
plt.show()

Results plotted below:

enter image description here

like image 103
Gabriel Gleizer Avatar answered Nov 14 '22 21:11

Gabriel Gleizer