Bin size in Matplotlib (Histogram)

People also ask

What is bin size in a histogram?

A histogram displays numerical data by grouping it into "bins", usually of equal width. Each bin is plotted as a bar whose height corresponds to how many data points fall in that bin. Bins are also sometimes called "intervals", "classes", or "buckets".

How does bin size affect a histogram?

The bin width (and thus the number of bins) affects the ability of a histogram to reveal local regions of higher incidence. If the bins are too wide, distinct features are merged and you will not get enough differentiation. If they are too narrow, most bins hold only a few points and the data is not grouped in any useful way.
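To see this effect numerically, here is a minimal sketch that bins the same sample three ways with np.histogram, the function plt.hist relies on for its binning. The sample, the fixed range of -4 to 4, and the bin counts 2, 16 and 800 are arbitrary choices for illustration only.

```python
import numpy as np

# Hypothetical sample: 1000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

results = {}
for n_bins in (2, 16, 800):
    # n_bins equal-width bins over the fixed range [-4, 4]
    edges = np.linspace(-4.0, 4.0, n_bins + 1)
    counts, _ = np.histogram(data, bins=edges)
    results[n_bins] = counts
    print(f"{n_bins:3d} bins: width {8 / n_bins:.2f}, "
          f"tallest bar {counts.max()}, empty bins {np.sum(counts == 0)}")
```

With very few bins every bar is tall and the shape is hidden; with very many, most bars hold only a handful of points or none at all.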

Actually, it's quite easy: instead of the number of bins you can give a list with the bin boundaries. They can be unequally distributed, too:

plt.hist(data, bins=[0, 10, 20, 30, 40, 50, 100])

If you just want them equally distributed, you can simply use range:

plt.hist(data, bins=range(min(data), max(data) + binwidth, binwidth))

Added to original answer

The above line works for data filled with integers only. As macrocosme points out, for floats you can use:

import numpy as np
plt.hist(data, bins=np.arange(min(data), max(data) + binwidth, binwidth))
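Because np.arange with a non-integer step can be sensitive to floating-point rounding (the NumPy documentation suggests np.linspace for such cases), it may be worth checking that the edges really cover the data. The following is only a sketch with a made-up sample; it uses np.histogram, which applies the same binning rule as plt.hist.

```python
import numpy as np

data = [0.25, 0.5, 1.75, 2.0, 3.1]   # hypothetical float sample
binwidth = 0.5

# Same expression as above: edges start at min(data) and step by binwidth.
edges = np.arange(min(data), max(data) + binwidth, binwidth)

# Count the points per bin without drawing anything.
counts, _ = np.histogram(data, bins=edges)

print(edges)
print(counts)

# If the last edge reaches the maximum, no point is dropped.
assert edges[-1] >= max(data)
assert counts.sum() == len(data)
```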

For N bins, the bin edges are specified by a list of N+1 values: the first N give the lower edge of each bin, and the last one gives the upper edge of the final bin.

Code:

import numpy as np

bin_size = 0.1; min_edge = 0; max_edge = 2.5
# linspace needs an integer count; round() guards against float error in the division
N = int(round((max_edge - min_edge) / bin_size)); Nplus1 = N + 1
bin_list = np.linspace(min_edge, max_edge, Nplus1)

Note that linspace produces an array from min_edge to max_edge split into N+1 values, that is, N bins.

I use quantiles so that the bins are fitted to the sample, with roughly the same share of the data in each bin:

bins=df['Generosity'].quantile([0,.05,0.1,0.15,0.20,0.25,0.3,0.35,0.40,0.45,0.5,0.55,0.6,0.65,0.70,0.75,0.80,0.85,0.90,0.95,1]).to_list()

plt.hist(df['Generosity'], bins=bins, density=True, alpha=0.5, histtype='stepfilled', color='steelblue', edgecolor='none')
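As a rough illustration of what quantile edges do, the sketch below builds the same 5% steps with np.quantile on a synthetic skewed sample (a hypothetical stand-in for df['Generosity'], which is not available here) and counts the points per bin.

```python
import numpy as np

# Hypothetical skewed sample standing in for df['Generosity'].
rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=1000)

# 21 quantile levels (0%, 5%, ..., 100%) give 20 bins.
levels = np.linspace(0, 1, 21)
edges = np.quantile(sample, levels)

counts, _ = np.histogram(sample, bins=edges)
widths = np.diff(edges)

print(counts)                       # roughly 50 points in every bin
print(widths.min(), widths.max())   # the bin widths differ a lot
```

The bins hold about the same number of points but have very different widths, so the bar heights only carry information once they are scaled by the bin width.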


I guess the easy way would be to calculate the minimum and maximum of the data you have, then calculate L = max - min. Then you divide L by the desired bin width (I'm assuming this is what you mean by bin size) and use the ceiling of this value as the number of bins.
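A minimal sketch of that calculation, using a made-up sample and bin width:

```python
import math

data = [2.0, 3.5, 7.25, 9.0]   # hypothetical sample
bin_width = 2.0                # desired bin width

L = max(data) - min(data)            # range of the data: 7.0
n_bins = math.ceil(L / bin_width)    # 3.5 rounds up to 4 bins
actual_width = L / n_bins            # width of the bins if n_bins is used

print(n_bins, actual_width)
```

Note that when an integer is passed as bins, plt.hist splits the range from min to max into that many equal parts, so the resulting width (1.75 here) is slightly narrower than the requested one unless L is an exact multiple of it.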

I like things to happen automatically and for bins to fall on "nice" values. The following seems to work quite well.

import numpy as np
import matplotlib.pyplot as plt

def compute_histogram_bins(data, desired_bin_size):
    min_val = np.min(data)
    max_val = np.max(data)
    # Round the lower edge down to a multiple of the bin size
    min_boundary = min_val - min_val % desired_bin_size
    # Round the upper edge up to the next multiple of the bin size
    max_boundary = max_val - max_val % desired_bin_size + desired_bin_size
    # Number of edges, which is one more than the number of bins
    n_bins = int(round((max_boundary - min_boundary) / desired_bin_size)) + 1
    bins = np.linspace(min_boundary, max_boundary, n_bins)
    return bins

if __name__ == '__main__':
    data = np.random.random_sample(100) * 123.34 - 67.23
    bins = compute_histogram_bins(data, 10.0)
    print(bins)
    plt.hist(data, bins=bins)
    plt.xlabel('Value')
    plt.ylabel('Counts')
    plt.title('Compute Bins Example')
    plt.grid(True)
    plt.show()

The resulting bin edges fall on "nice" multiples of the bin size:

[-70. -60. -50. -40. -30. -20. -10.   0.  10.  20.  30.  40.  50.  60.]

[Figure: histogram drawn with the computed bins]

I had the same issue as OP (I think!), but I couldn't get it to work in the way that Lastalda specified. I don't know if I have interpreted the question properly, but I have found another solution (it probably is a really bad way of doing it though).

This was the way that I did it:

plt.hist([1,11,21,31,41], bins=[0,10,20,30,40,50], weights=[10,1,40,33,6]);

Which creates this:

[Figure: histogram graph created in Matplotlib]

So the first parameter basically 'initialises' the bins: each value in it is a number that falls inside one of the ranges I set in the bins parameter.

To demonstrate this, look at the array in the first parameter ([1,11,21,31,41]) and the 'bins' array in the second parameter ([0,10,20,30,40,50]):

The number 1 (from the first array) falls between 0 and 10 (in the 'bins' array)
The number 11 (from the first array) falls between 10 and 20 (in the 'bins' array)
The number 21 (from the first array) falls between 20 and 30 (in the 'bins' array), etc.

Then I'm using the 'weights' parameter to define the height of each bar. This is the array used for the weights parameter: [10,1,40,33,6].

So the 0 to 10 bin is given the value 10, the 10 to 20 bin is given the value of 1, the 20 to 30 bin is given the value of 40, etc.
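The same idea can be checked without plotting. This sketch uses the bin midpoints as the representative values (my own choice; any value inside each bin would do) and np.histogram to confirm that each bar ends up with exactly its weight.

```python
import numpy as np

edges = np.array([0, 10, 20, 30, 40, 50])
heights = [10, 1, 40, 33, 6]          # desired height of each bar

# One representative value per bin: the midpoint of each pair of edges.
centers = (edges[:-1] + edges[1:]) / 2

# Each center lands in its own bin and contributes its weight.
counts, _ = np.histogram(centers, bins=edges, weights=heights)
print(counts)
```

The same three arrays can then be passed to the plot call as plt.hist(centers, bins=edges, weights=heights).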

Tags:

python

matplotlib

histogram
