
find mean bin values using histogram2d python [duplicate]

How do you calculate the mean value in each bin of a 2D histogram in Python? I have temperature ranges for the x and y axes, and I am trying to plot the probability of lightning in each temperature bin. I am reading the data from a CSV file, and my code is as follows:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

filename = 'Random_Events_All_Sorted_85GHz.csv'
df = pd.read_csv(filename)

min37 = df.min37
min85 = df.min85
verification = df.five_min_1

#Numbers
x = min85
y = min37
H = verification

#Estimate the 2D histogram
nbins = 4
H, xedges, yedges = np.histogram2d(x,y,bins=nbins)

#Rotate and flip H
H = np.rot90(H) 
H = np.flipud(H)

#Mask zeros
Hmasked = np.ma.masked_where(H==0,H)

#Plot 2D histogram using pcolor
fig1 = plt.figure()
plt.pcolormesh(xedges,yedges,Hmasked)
plt.xlabel('min 85 GHz PCT (K)')
plt.ylabel('min 37 GHz PCT (K)')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Probability of Lightning (%)')

plt.show()

This makes a nice-looking plot, but the plotted data is the count, i.e. the number of samples that fall into each bin. The verification variable is an array of 1's and 0's, where 1 indicates lightning and 0 indicates no lightning. I want the plot to show the probability of lightning in each bin, based on the verification data, so I need bin_mean*100 to get this percentage.

I tried using an approach similar to what is shown here (binning data in python with scipy/numpy), but I was having difficulty getting it to work for a 2D histogram.

mbreezy asked Jul 23 '14 18:07



2 Answers

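There is an elegant and fast way to do this! Use the weights parameter of np.histogram2d to sum the values in each bin:

denominator, xedges, yedges = np.histogram2d(x,y,bins=nbins)
numerator, _, _ = np.histogram2d(x,y,bins=[xedges, yedges], weights=verification)

Then all you need to do is divide the sum of the values in each bin by the number of events in that bin:

result = numerator / denominator.clip(1)

Here clip(1) raises the count of empty bins from 0 to 1, so those bins come out as 0 instead of causing a division by zero. Multiply result by 100 to get a percentage.

Voila!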
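As a self-contained illustration of the weighted-histogram idea above, here is a minimal sketch. The data is synthetic (the random arrays only stand in for the question's min85, min37 and five_min_1 columns), and the names counts, sums, mean and percent are chosen for this example. Unlike the clip(1) variant, it marks empty bins as NaN so they can be masked in the plot; that is a design choice, not something the answer requires.

```python
import numpy as np

# Synthetic stand-ins for the question's columns (assumed data, not the real CSV)
rng = np.random.default_rng(0)
x = rng.uniform(150.0, 300.0, size=1000)               # plays the role of min85
y = rng.uniform(200.0, 300.0, size=1000)               # plays the role of min37
verification = (rng.random(1000) < 0.3).astype(float)  # 1 = lightning, 0 = none

nbins = 4

# Number of samples per bin
counts, xedges, yedges = np.histogram2d(x, y, bins=nbins)
# Sum of the 0/1 flags per bin, reusing the same edges
sums, _, _ = np.histogram2d(x, y, bins=[xedges, yedges], weights=verification)

# Mean per bin; bins without samples stay NaN instead of dividing by zero
mean = np.full_like(sums, np.nan)
np.divide(sums, counts, out=mean, where=counts > 0)
percent = 100.0 * mean

# histogram2d indexes its result as [x_bin, y_bin], while pcolormesh expects
# [y, x], so transpose (and mask the NaNs) before plotting
percent_for_plot = np.ma.masked_invalid(percent.T)
```

Passing percent_for_plot to plt.pcolormesh(xedges, yedges, ...) in place of Hmasked should then give the percentage plot the question asks for.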

Alleo answered Sep 16 '22 13:09



This can be done with np.digitize and an explicit loop over the samples:

# xedges, yedges as returned by 'histogram2d'; H is the count array from the
# question after the rot90/flipud step, so it is indexed as H[ybin, xbin]

# create an array for the output quantities
avgarr = np.zeros((nbins, nbins))

# determine the X and Y bins each sample coordinate belongs to
xbins = np.digitize(x, xedges[1:-1])
ybins = np.digitize(y, yedges[1:-1])

# calculate the bin sums (note: with very many samples, 'bincount' is more
# efficient, but it requires some index arithmetic)
for xb, yb, v in zip(xbins, ybins, verification):
    avgarr[yb, xb] += v

# replace 0s in H by NaNs (avoids divide-by-zero complaints)
# if you do not have any further use for H after plotting, the
# copy operation is unnecessary, and the NaNs will also take care
# of the masking (NaNs are plotted transparent)
divisor = H.copy()
divisor[divisor==0.0] = np.nan

# calculate the average
avgarr /= divisor

# now 'avgarr' contains the averages (NaNs for no-sample bins)

If you know the bin edges beforehand, you can compute the histogram counts in the same loop just by adding one line.
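The 'bincount' variant mentioned in the code comments could look roughly like the sketch below. It is one possible way to do the index arithmetic, not necessarily what the answer had in mind: each (ybin, xbin) pair is flattened into a single index, and np.bincount accumulates both the sums and the counts. The data is synthetic, and the names flat, sums and n are chosen for this example.

```python
import numpy as np

# Synthetic stand-ins for the question's data (assumed, not the real CSV)
rng = np.random.default_rng(1)
x = rng.uniform(150.0, 300.0, size=500)
y = rng.uniform(200.0, 300.0, size=500)
verification = (rng.random(500) < 0.3).astype(float)

nbins = 4
counts, xedges, yedges = np.histogram2d(x, y, bins=nbins)

# Bin index (0 .. nbins-1) of every sample along each axis
xbins = np.digitize(x, xedges[1:-1])
ybins = np.digitize(y, yedges[1:-1])

# Index arithmetic: flatten (row = y bin, column = x bin) into one index
flat = ybins * nbins + xbins

# Per-bin sums of the 0/1 flags and per-bin sample counts, without a Python loop
sums = np.bincount(flat, weights=verification, minlength=nbins * nbins)
sums = sums.reshape(nbins, nbins)
n = np.bincount(flat, minlength=nbins * nbins).reshape(nbins, nbins)

# Averages, with NaN for bins that received no samples
avgarr = sums / np.where(n == 0, np.nan, n)
```

As in the loop version, avgarr is indexed as [ybin, xbin], so it can be passed to pcolormesh without a further transpose.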

DrV answered Sep 16 '22 13:09
