How are the pyplot histogram bins interpreted?

I am confused about the matplotlib hist function.

The documentation explains:

If a sequence of values, the values of the lower bound of the bins to be used.

But when I pass a sequence of two values, i.e. [0,1], I only get one bin. And when I pass three, like so:

plt.hist(votes, bins=[0,1,2], density=True)

I only get two bins. My guess is that the last value is just an upper bound for the last bin.

Is there a way to have "the rest" of the values in the last bin, other than to put a very big value there? (Or in other words, without making that bin much bigger than the others.)

It seems like the last bin value is included in the last bin:

votes = [0,0,1,2]
plt.hist(votes, bins=[0,1])

This gives me one bin of height 3, i.e. 0, 0, 1. While:

votes = [0,0,1,2]
plt.hist(votes, bins=[0,1,2])

gives me two bins with two in each. I find it counterintuitive that adding a new bin changes the limits of the others.

votes = [0,0,1]
plt.hist(votes, bins=2)

yields two bins of size 2 and 1. The data seems to have been split at 0.5, since the x-axis goes from 0 to 1.

How should the bins array be interpreted? How is the data split?

Christopher Käck asked Mar 02 '13 17:03


1 Answer

votes = [0, 0, 1, 2]
plt.hist(votes, bins=[0,1])

This gives you one bin of height 3, because the data is placed in a single bin with the closed interval [0, 1]. That bin receives the values 0, 0, and 1; the value 2 lies outside the bin range and is not counted.

votes = [0, 0, 1, 2]
plt.hist(votes, bins=[0, 1, 2])

This gives you a histogram with two bins, with intervals [0, 1[ and [1, 2]; so you have 2 items in the 1st bin (the 0 and 0), and 2 items in the 2nd bin (the 1 and 2).

If you try to plot:

plt.hist(votes, bins=[0, 1, 2, 3])

the idea behind splitting the data into bins is the same: you get three intervals, [0, 1[, [1, 2[ and [2, 3], and the value 2 changes its bin, moving to the bin with interval [2, 3] (instead of staying in the bin [1, 2] as in the previous example).
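To check these counts without drawing anything, here is a minimal sketch using np.histogram, which plt.hist calls internally to do the binning. The names counts_1, counts_2 and counts_3 are only for illustration.

```python
import numpy as np

votes = [0, 0, 1, 2]

# Same binning rule as plt.hist, but returns the counts instead of plotting.
counts_1, _ = np.histogram(votes, bins=[0, 1])        # one bin: [0, 1]
counts_2, _ = np.histogram(votes, bins=[0, 1, 2])     # [0, 1[ and [1, 2]
counts_3, _ = np.histogram(votes, bins=[0, 1, 2, 3])  # [0, 1[, [1, 2[ and [2, 3]

print(counts_1)  # the value 2 falls outside [0, 1] and is dropped
print(counts_2)
print(counts_3)  # the value 2 now sits in the last bin
```

The first call counts only three of the four votes, which shows that values outside the outermost edges are silently ignored.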

In conclusion, an ordered array in the bins argument such as [i_0, i_1, i_2, i_3, i_4, ..., i_n] creates the bins:
[i_0, i_1[
[i_1, i_2[
[i_2, i_3[
[i_3, i_4[
...
[i_(n-1), i_n]

with each boundary open or closed according to the brackets: every bin includes its lower edge and excludes its upper edge, except the last bin, which includes both.
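Two leftover points from the question can be sketched the same way. With an integer such as bins=2, the edges are spread evenly between the smallest and largest value, which is why [0, 0, 1] is split at 0.5. For a last bin that collects "the rest", one possible workaround (an assumption on my part, not a built-in option of hist) is to cap the data at the lower edge of the last bin before binning. The sample data many_votes is made up for the illustration.

```python
import numpy as np

# Integer bins: edges are evenly spaced between min and max of the data.
votes = [0, 0, 1]
int_edges = np.histogram_bin_edges(votes, bins=2)
int_counts, _ = np.histogram(votes, bins=2)

# Catch-all last bin: cap every value at the last bin's lower edge (2 here),
# so that 5 and 9 are counted in [2, 3] without widening that bin.
many_votes = [0, 0, 1, 2, 5, 9]   # hypothetical data
edges = [0, 1, 2, 3]
capped = np.minimum(many_votes, edges[-2])
rest_counts, _ = np.histogram(capped, bins=edges)

print(int_edges, int_counts)
print(rest_counts)
```

Passing capped and edges to plt.hist should draw the same counts with equally wide bars; the last bar then has to be read as "2 or more".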

sissi_luaty answered Oct 31 '22 18:10