
Making a weighted hexagonal bin plot in Python

Using matplotlib in Python, it is possible to make a simple histogram by providing the list of items to be plotted together with a list of weights, so that each item's contribution to its bin is scaled by its weight, e.g.

import matplotlib.pyplot as plt

...

plt.hist(items, weights = weightsOfItems)

I am trying to plot a hexagonal bin histogram of two values against each other, which can be done using

plt.hexbin(xValues, yValues)

As before, I would like the contributions of each pair to the bin to which it belongs to be adjusted according to a list of weights. From the hexbin documentation it seems like I should be able to do this by giving an input for the parameter C, i.e.

plt.hexbin(xValues, yValues, C = weightsOfValues)

Doing this, however, yields completely incorrect plots. For the time being I have resorted to first sampling my xValues and yValues according to the weights to give xSamples and ySamples. This process, however, is very time consuming and also means that I do not use all of the data available since I get rid of xValues and yValues not included in the samples.

So, does anyone know of a way to produce a hexagonal bin histogram where the contribution of values to their respective bins is adjusted for according to given weights?

Reza Rohani asked Oct 29 '22 23:10


1 Answer

According to the documentation:

If C is specified, it specifies values at the coordinate (x[i],y[i]). These values are accumulated for each hexagonal bin and then reduced according to reduce_C_function, which defaults to numpy’s mean function (np.mean). (If C is specified, it must also be a 1-D sequence of the same length as x and y.)

This means that, for each bin, the C values of the points falling into it are collected and reduce_C_function is then applied to them. Because the default is np.mean, each bin shows the mean of its weights rather than their sum, which is not what a weighted histogram needs. To get the sum of the weights in each bin, pass reduce_C_function=np.sum.
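The accumulate-then-reduce step can be pictured with a toy 1-D sketch. This is only an illustration, not matplotlib's actual implementation: the bin_index and c arrays and the reduce_per_bin helper are made up for the example.

```python
import numpy as np

# Toy 1-D stand-in for hexbin's "accumulate, then reduce" step.
# bin_index and c are made-up illustration data, not matplotlib internals.
bin_index = np.array([0, 0, 0, 1, 1, 2])      # bin each point falls into
c = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # per-point weights (the C argument)


def reduce_per_bin(bin_index, c, reduce_c_function):
    """Collect the C values of each bin, then reduce them to one number."""
    return np.array([reduce_c_function(c[bin_index == b])
                     for b in np.unique(bin_index)])


mean_per_bin = reduce_per_bin(bin_index, c, np.mean)  # analogue of the default
sum_per_bin = reduce_per_bin(bin_index, c, np.sum)    # weighted count

print(mean_per_bin)  # [1. 1. 1.]
print(sum_per_bin)   # [3. 2. 1.]
```

With unit weights the mean is 1 in every occupied bin regardless of how many points it holds, while the sum recovers the counts.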

This example shows the difference with simple data:

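import numpy as np
import matplotlib.pyplot as plt

N = 10**5
x = np.random.normal(size=N)
y = np.random.normal(size=N)
plt.figure(figsize=(12, 4))
plt.subplot(131)  # left: unweighted counts per bin
plt.hexbin(x, y)
plt.colorbar()
plt.subplot(132)  # middle: C with the default reduce_C_function (np.mean)
plt.hexbin(x, y, C=np.ones(N))
plt.colorbar()
plt.subplot(133)  # right: C summed per bin
plt.hexbin(x, y, C=np.ones(N), reduce_C_function=np.sum)
plt.colorbar()
plt.tight_layout()
plt.show()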

Here the weights are set to 1 for all the values (which are drawn from a Gaussian distribution), so a correct weighted histogram should reproduce the unweighted histogram. The output is the following set of plots:

[Figure: hexbin output for the different reduce_C_function settings]

The left panel is the unweighted plot, showing the 2D Gaussian. The middle panel is the default behaviour with C: all C values in a bin are averaged, so every bin has the value 1. The right panel is the behaviour with np.sum, where the 2D Gaussian is recovered.
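The same call should carry over to weights that are not all 1. The following is a minimal sketch under stated assumptions: the weights array is randomly generated purely for illustration, the seed is arbitrary, and the Agg backend is selected only so the script runs without a display. It checks that the per-bin sums add up to the total weight, which is what one would expect if every point contributes its weight to exactly one bin.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the figure interactively
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, for reproducibility only
n = 10**4
x = rng.normal(size=n)
y = rng.normal(size=n)
weights = rng.uniform(0.5, 2.0, size=n)  # hypothetical per-point weights

fig, ax = plt.subplots()
# Sum the weights in each hexagon instead of averaging them.
hb = ax.hexbin(x, y, C=weights, reduce_C_function=np.sum)
fig.colorbar(hb, ax=ax, label="sum of weights per bin")

# hexbin returns a PolyCollection; get_array() holds one value per drawn hexagon.
binned_total = float(np.nansum(np.asarray(hb.get_array())))
total_weight = float(weights.sum())
print(binned_total, total_weight)  # the two totals should agree up to rounding
```

If the two totals differ noticeably, some points were probably excluded, for example by a custom extent or a mincnt threshold.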

OriolAbril answered Nov 15 '22 06:11