Error on weighted histogram in python

I want to calculate the error on a bin height by taking the square root of the sum of the squared weights (sumw2) in that bin (the Poisson error). Is there any way to get the sum of weights (sumw) and/or sumw2 when histogramming data with either matplotlib or numpy (or any other library, for that matter)?

Let's say I have some data in a numpy array x and some weights w in another numpy array. To get the histogram I would either do

n, bins, patches = pyplot.hist(x,weights=w)

or

n, bins = numpy.histogram(x,weights=w)

In both cases I have no clue which entries of w belong to which bin, right?

Edit: Currently I'm using YODA to do this. The disadvantage from my point of view is that YODA histograms can only be filled with one data point at a time.

Lxndr Avatar asked Oct 11 '25 09:10


2 Answers

Consider an array x with weights w. The histogram of the data in x, weighted by w and using the bin edges bins, is given by:

n, bins = np.histogram(x, bins=bins, weights=w)

The associated errors on n can be computed as:

n_err = np.sqrt(np.histogram(x, bins=bins, weights=w**2)[0])

Note that if the data is not weighted (i.e. (w == 1).all()), the error reduces to the "standard" np.sqrt(n).
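As a minimal, self-contained sketch of the approach above: the sample x, the weights w, the seed, and the bin edges below are made-up illustration values, not anything from the question. It histograms the same data twice with shared bin edges, once with w to get sumw and once with w**2 to get sumw2, and then checks that unit weights give back sqrt(n).

```python
import numpy as np

# Illustrative data only: sample, weights, and bin edges are assumptions.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
w = rng.uniform(0.5, 1.5, size=x.size)
bins = np.linspace(-4, 4, 17)  # 16 bins; the same edges must be used for both calls

# Sum of weights and sum of squared weights per bin.
sumw, _ = np.histogram(x, bins=bins, weights=w)
sumw2, _ = np.histogram(x, bins=bins, weights=w**2)

# Error on each bin height.
n_err = np.sqrt(sumw2)

# Sanity check: with unit weights the error is sqrt(counts).
counts, _ = np.histogram(x, bins=bins)
unit_err = np.sqrt(np.histogram(x, bins=bins, weights=np.ones_like(x))[0])
```

Passing the same explicit bin edges to both calls matters: if each call chose its own binning, sumw and sumw2 would not refer to the same bins.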

Cyril23 Avatar answered Oct 13 '25 21:10


According to the NumPy documentation, weights is:

An array of weights, of the same shape as a. Each value in a only contributes its associated weight towards the bin count (instead of 1). If density is True, the weights are normalized, so that the integral of the density over the range remains 1.

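That means that each value in w should be associated with a value in x. If you'd like to weight the bins themselves and plot them, you can first compute the bin counts, multiply them by the weights, and finally plot them using bar.

# Here w holds one weight per bin (10 bins by default), not one per entry of x
val, pos = np.histogram(np.arange(1000))
w_val = val * w
# Draw each bar from its left edge with the bin's own width
plt.bar(pos[:-1], w_val, width=np.diff(pos), align="edge")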


Update from the comment:

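Ahh, sorry, it seems that I didn't understand the initial question. Actually, you can use pos to find the entries belonging to each bin and calculate your weight function using this information.

sumw2 = []
for left, right in zip(pos, pos[1:]):
    # Entries in [left, right); note np.histogram also counts x == pos[-1] in the last bin
    ix = np.where((x >= left) & (x < right))[0]
    sumw2.append(np.sum(w[ix] ** 2))  # sum of squared weights in this bin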
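The per-bin loop above can also be written without an explicit loop. The following is a sketch, not part of the original answer, that uses np.digitize to find which bin each entry of x falls into and np.bincount to accumulate the weights. The arrays x, w, and edges are small made-up values chosen so the result can be checked by hand.

```python
import numpy as np

# Illustrative values only (assumptions, not data from the question).
x = np.array([0.1, 0.4, 0.5, 0.9, 1.0])
w = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
edges = np.array([0.0, 0.5, 1.0])
nbins = len(edges) - 1

# 0-based bin index of every entry; bins are half-open [left, right).
idx = np.digitize(x, edges) - 1
# Match np.histogram: values equal to the last edge go into the last bin.
idx[x == edges[-1]] = nbins - 1
# Drop entries that fall outside the binning range.
inside = (idx >= 0) & (idx < nbins)

# Accumulate weights and squared weights per bin.
sumw = np.bincount(idx[inside], weights=w[inside], minlength=nbins)
sumw2 = np.bincount(idx[inside], weights=w[inside] ** 2, minlength=nbins)
```

The idx array directly answers which entries of w belong to which bin: w[idx == k] are the weights that ended up in bin k.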
Tural Gurbanov Avatar answered Oct 13 '25 21:10


