Matplotlib xticks not lining up with histogram

Tags: python, matplotlib

I'm generating some histograms with matplotlib and I'm having some trouble figuring out how to get the xticks of a histogram to align with the bars.

Here's a sample of the code I use to generate the histogram:

    from matplotlib import pyplot as py

    py.hist(histogram_data, 49, alpha=0.75)
    py.title(column_name)
    py.xticks(range(49))
    py.show()

I know that all of the values in the histogram_data array are in [0,1,...,48], which, assuming I did the math right, means there are 49 unique values. I'd like to show a histogram of each of those values. Here's a picture of what's generated.

[Image: the generated histogram, with xticks that do not line up with the bars]

How can I set up the graph such that all of the xticks are aligned to the left, middle or right of each of the bars?

asked Nov 22 '14 22:11

Paymahn Moghadasian

1 Answer

Short answer: Use plt.hist(data, bins=range(50)) instead to get left-aligned bins, plt.hist(data, bins=np.arange(50)-0.5) to get center-aligned bins, etc.

Also, because you want counts of unique integers, there is a slightly more efficient method (np.bincount) if performance matters. I'll show it at the end.
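Applied to the question's setup, the center-aligned variant could look roughly like the sketch below. The data is a synthetic stand-in for histogram_data (an assumption, since the real array isn't shown), and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line when plotting interactively
import matplotlib.pyplot as plt
import numpy as np

# Assumption: synthetic stand-in for the question's histogram_data (integers 0..48).
rng = np.random.default_rng(0)
histogram_data = rng.integers(0, 49, size=1000)

# 50 edges at -0.5, 0.5, ..., 48.5 give 49 unit-wide bins, each centred on an integer.
edges = np.arange(50) - 0.5
counts, bin_edges, _ = plt.hist(histogram_data, bins=edges, alpha=0.75)
plt.xticks(range(49))
plt.close()  # call plt.show() instead when working interactively
```

Using bins=range(50) in place of the shifted edges would give the left-aligned version.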

Problem Statement

As a stand-alone example of what you're seeing, consider the following:

    import matplotlib.pyplot as plt
    import numpy as np

    # Generate a random array of integers between 0-9
    # data.min() will be 0 and data.max() will be 9 (not 10)
    data = np.random.randint(0, 10, 1000)

    plt.hist(data, bins=10)
    plt.xticks(range(10))
    plt.show()

[Image: the histogram produced by the code above]

As you've noticed, the bins aren't aligned with integer intervals. This is basically because you asked for 10 bins between 0 and 9, which isn't quite the same as asking for bins for the 10 unique values.

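An integer bins argument splits the range from data.min() to data.max() into that many equal-width bins, so here each of the 10 bins is 0.9 wide and the edges drift away from the integers. What you actually should do in this case is manually specify the bin edges.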

To explain what's going on, let's skip matplotlib.pyplot.hist and just use the underlying numpy.histogram function.

For example, let's say you have the values [0, 1, 2, 3]. Your first instinct would be to do:

    In [1]: import numpy as np

    In [2]: np.histogram([0, 1, 2, 3], bins=4)
    Out[2]: (array([1, 1, 1, 1]), array([ 0.  ,  0.75,  1.5 ,  2.25,  3.  ]))

The first array returned is the counts and the second is the bin edges (in other words, where bar edges would be in your plot).

Notice that we get the counts we'd expect, but because we asked for 4 bins between the min and max of the data, the bin edges aren't on integer values.
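The same effect can be checked for the 0-9 case from the first example. This is a small sketch that assumes a deterministic sample covering every integer from 0 to 9, and uses np.histogram_bin_edges to look at the edges without computing counts.

```python
import numpy as np

# Assumption: a small deterministic sample covering every integer from 0 to 9.
data = np.tile(np.arange(10), 3)

# An integer `bins` splits [data.min(), data.max()] into equal-width bins.
edges = np.histogram_bin_edges(data, bins=10)

# Equivalent construction: 11 evenly spaced edges from the minimum to the maximum.
expected = np.linspace(data.min(), data.max(), 10 + 1)

print(edges)  # multiples of 0.9 rather than integers
```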

Next, you might try:

    In [3]: np.histogram([0, 1, 2, 3], bins=3)
    Out[3]: (array([1, 1, 2]), array([ 0.,  1.,  2.,  3.]))

Note that the bin edges (the second array) are what you were expecting, but the counts aren't. That's because the last bin behaves differently than the others, as noted in the documentation for numpy.histogram:

    Notes
    -----

    All but the last (righthand-most) bin is half-open.  In other words, if
    `bins` is::

      [1, 2, 3, 4]

    then the first bin is ``[1, 2)`` (including 1, but excluding 2) and the
    second ``[2, 3)``.  The last bin, however, is ``[3, 4]``, which *includes*
    4.
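The docstring's own example is easy to try. A short sketch, using the same edges [1, 2, 3, 4], that shows where a value sitting on an interior edge lands versus one sitting on the last edge:

```python
import numpy as np

edges = [1, 2, 3, 4]

# 2 sits on an interior edge: it is excluded from [1, 2) and counted in [2, 3).
interior_counts, _ = np.histogram([2], bins=edges)
print(interior_counts)  # [0 1 0]

# 4 sits on the last edge: the final bin [3, 4] is closed, so it is still counted.
last_counts, _ = np.histogram([4], bins=edges)
print(last_counts)  # [0 0 1]

# With all four values, 3 and 4 therefore share the last bin.
all_counts, _ = np.histogram([1, 2, 3, 4], bins=edges)
print(all_counts)  # [1 1 2]
```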

Therefore, what you actually should do is specify exactly what bin edges you want, and either include one beyond your last data point or shift the bin edges to the 0.5 intervals. For example:

    In [4]: np.histogram([0, 1, 2, 3], bins=range(5))
    Out[4]: (array([1, 1, 1, 1]), array([0, 1, 2, 3, 4]))
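Both options mentioned above (an extra edge past the last value, or edges on the 0.5 intervals) should give the same counts for integer data. A quick side-by-side sketch on the same four values:

```python
import numpy as np

values = [0, 1, 2, 3]

# Option 1: integer edges with one extra edge past the largest value.
left_counts, left_edges = np.histogram(values, bins=np.arange(5))

# Option 2: edges on the half-integers, so each bin is centred on an integer.
center_counts, center_edges = np.histogram(values, bins=np.arange(5) - 0.5)

print(left_counts, left_edges)
print(center_counts, center_edges)
```

Only the edges differ, which is what controls where the bars sit relative to the ticks.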

Bin Alignment

Now let's apply this to the first example and see what it looks like:

    import matplotlib.pyplot as plt
    import numpy as np

    # Generate a random array of integers between 0-9
    # data.min() will be 0 and data.max() will be 9 (not 10)
    data = np.random.randint(0, 10, 1000)

    plt.hist(data, bins=range(11)) # <- The only difference
    plt.xticks(range(10))
    plt.show()

[Image: the histogram produced by the code above]

Okay, great! However, we now effectively have left-aligned bins. What if we wanted center-aligned bins to better reflect the fact that these are unique values?

The quick way is to just shift the bin edges:

    import matplotlib.pyplot as plt
    import numpy as np

    # Generate a random array of integers between 0-9
    # data.min() will be 0 and data.max() will be 9 (not 10)
    data = np.random.randint(0, 10, 1000)

    bins = np.arange(11) - 0.5
    plt.hist(data, bins)
    plt.xticks(range(10))
    plt.xlim([-1, 10])

    plt.show()

[Image: the histogram produced by the code above]

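Right-aligned bins need a little more care. Because every bin includes its left edge, shifting the edges by exactly -1 leaves each integer in the bin that starts at it, and the two largest values end up sharing the closed last bin. For integer data, shift by slightly less than 1 instead (for example, np.arange(11) - 0.999).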
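One caveat for right alignment is worth checking with numpy.histogram directly: bins include their left edge, so a shift of exactly -1 does not move integer values into the bin to their left. A minimal sketch on four values; the 0.999 offset is an arbitrary choice that happens to work for integer data, not a library convention.

```python
import numpy as np

values = [0, 1, 2, 3]

# Shifting integer edges by exactly -1: edges are [-1, 0, 1, 2, 3].
shifted_counts, _ = np.histogram(values, bins=np.arange(5) - 1)
print(shifted_counts)  # [0 1 1 2]: first bin empty, 2 and 3 share the last bin

# Shifting by slightly less than 1 puts each integer just inside the right
# end of its own bin, so the tick at that integer marks the bar's right side.
right_edges = np.arange(5) - 0.999
right_counts, _ = np.histogram(values, bins=right_edges)
print(right_counts)  # [1 1 1 1]
```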

Another approach

For the particular case of unique integer values, there's another, more efficient approach we can take.

If you're counting non-negative integer values starting at 0, you're better off using numpy.bincount than using numpy.histogram.

For example:

    import matplotlib.pyplot as plt
    import numpy as np

    data = np.random.randint(0, 10, 1000)
    # minlength keeps one count per value even if the largest value never occurs
    counts = np.bincount(data, minlength=10)

    # Switching to the OO-interface. You can do all of this with "plt" as well.
    fig, ax = plt.subplots()
    ax.bar(range(10), counts, width=1, align='center')
    ax.set(xticks=range(10), xlim=[-1, 10])

    plt.show()

[Image: the bar chart produced by the code above]

There are two big advantages to this approach. One is speed. numpy.histogram (and therefore plt.hist) basically runs the data through numpy.digitize and then numpy.bincount. Because you're dealing with unique integer values, there's no need to take the numpy.digitize step.
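As a sanity check that the two routes agree, the counts from numpy.bincount can be compared with those from numpy.histogram on unit-wide integer bins. A small sketch with seeded random data (the seed and sample size are arbitrary assumptions):

```python
import numpy as np

# Assumption: seeded random integers in 0..9, standing in for real data.
rng = np.random.default_rng(42)
data = rng.integers(0, 10, size=1000)

# Histogram with edges 0, 1, ..., 10 (one unit-wide bin per integer value).
hist_counts, _ = np.histogram(data, bins=np.arange(11))

# Direct count of each integer; minlength guarantees ten entries.
bin_counts = np.bincount(data, minlength=10)

print(np.array_equal(hist_counts, bin_counts))
```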

However, the bigger advantage is more control over display. If you'd prefer thinner rectangles, just use a smaller width:

    import matplotlib.pyplot as plt
    import numpy as np

    data = np.random.randint(0, 10, 1000)
    # minlength keeps one count per value even if the largest value never occurs
    counts = np.bincount(data, minlength=10)

    # Switching to the OO-interface. You can do all of this with "plt" as well.
    fig, ax = plt.subplots()
    ax.bar(range(10), counts, width=0.8, align='center')
    ax.set(xticks=range(10), xlim=[-1, 10])

    plt.show()

[Image: the bar chart produced by the code above]
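The width and align arguments of ax.bar also cover the left- and right-aligned cases. As a rough sketch, right-aligned bars can be drawn with align='edge' and a negative width. The seeded data and the Agg backend are assumptions made only so the snippet runs unattended.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line when plotting interactively
import matplotlib.pyplot as plt
import numpy as np

# Assumption: seeded random integers in 0..9, standing in for real data.
rng = np.random.default_rng(1)
data = rng.integers(0, 10, size=1000)
counts = np.bincount(data, minlength=10)

fig, ax = plt.subplots()
# align='edge' with a negative width draws each bar to the left of its x value,
# so the tick at that value sits on the bar's right edge.
bars = ax.bar(np.arange(10), counts, width=-1, align='edge')
ax.set(xticks=range(10), xlim=[-1.5, 10])
plt.close(fig)  # call plt.show() before closing when working interactively
```

With align='edge' and a positive width, the same call gives left-aligned bars.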

answered Sep 30 '22 20:09

Joe Kington
