I have a sorted vector points
with 100 points. I now want to create two histograms: the first histogram should have 10 bins having equal width. The second should also have 10 histograms, but not necessarily of equal width. In the second, I just want the histogram to have the same number of points in each bin. So for example, the first bar might be very short and wide, while the second bar in the histogram might be very tall and narrow. I have code that creates the first histogram using matplotlib
, but now I'm not sure how to go about creating the second one.
import matplotlib.pyplot as plt
points = [1,2,3,4,5,6, ..., 99]
n, bins, patches = plt.hist(points, 10)
Edit:
Trying the solution below, I'm a bit puzzled as to why the heights of all of the bars in my histogram are the same.
However, bins need not be of equal width; in that case, the erected rectangle is defined to have its area proportional to the frequency of cases in the bin. The vertical axis is then not the frequency but frequency density—the number of cases per unit of the variable on the horizontal axis.
If you have too many bins, then the data distribution will look rough, and it will be difficult to discern the signal from the noise. On the other hand, with too few bins, the histogram will lack the details needed to discern any useful pattern from the data.
The bin-width is set to h=2×IQR×n−1/3. So the number of bins is (max−min)/h, where n is the number of observations, max is the maximum value and min is the minimum value.
This question is similar to one that I wrote an answer to a while back, but sufficiently different to warrant it's own question. The solution, it turns out, uses basically the same code from my other answer.
def histedges_equalN(x, nbin):
npt = len(x)
return np.interp(np.linspace(0, npt, nbin + 1),
np.arange(npt),
np.sort(x))
x = np.random.randn(100)
n, bins, patches = plt.hist(x, histedges_equalN(x, 10))
This solution gives a histogram with equal height bins, because---by definition---a histogram is a count of the number of points in each bin.
To get a pdf (i.e. density function) use the normed=True
kwarg to plt.hist. As described in my other answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With