
Get bin width used for seaborn plot

Tags: python, seaborn

How do I find out what bin width was used when drawing a distplot in Seaborn? I have two datasets that I would like to share bin widths, but I don't know how to retrieve the default value used for the first dataset. For something like the simple example below, how would I find out the bin width used?

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

f, axs = plt.subplots(1, 1)
distribution = np.random.rand(1000)
sns.distplot(distribution, hist=True, kde_kws={"shade": True}, ax=axs)
Novice asked Jan 26 '23 23:01


1 Answer

Seaborn uses the Freedman-Diaconis rule to calculate the bin width if the bins parameter is not specified in the call to seaborn.distplot().

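The equation is as follows (from Wikipedia):

Freedman-Diaconis rule:

$$\text{Bin width} = 2\,\frac{\operatorname{IQR}(x)}{\sqrt[3]{n}}$$

where IQR(x) is the interquartile range of the data and n is the number of observations.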

We can calculate the IQR and the cube root of n with the following code.

Q1 = np.quantile(distribution, 0.25)
Q3 = np.quantile(distribution, 0.75)
IQR = Q3 - Q1

cube = np.cbrt(len(distribution))

The bin width is:

In[] : 2*IQR/cube 
Out[]: 0.10163947994817446

Finally, we can now calculate the number of bins.

In[] : 1/(2*IQR/cube) # '1' is the range of the array for this example
Out[]: 9.838696543015526

Rounding the result up gives 10, which is our number of bins. We can now pass the bins parameter to get the same number of bins (or the same bin width, for data with the same range).
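The steps above can be collected into two small helpers. This is a minimal sketch, not Seaborn's own code: the names `fd_bin_width` and `fd_bin_count` are hypothetical, the actual data range (max minus min) is used instead of the rounded value 1, and the cap of 50 bins and the fallback for a zero IQR are assumptions about how a default might be guarded.

```python
import numpy as np


def fd_bin_width(a):
    """Freedman-Diaconis bin width: 2 * IQR / n**(1/3)."""
    a = np.asarray(a, dtype=float)
    q1, q3 = np.percentile(a, [25, 75])
    return 2 * (q3 - q1) / np.cbrt(a.size)


def fd_bin_count(a, max_bins=50):
    """Number of bins needed to cover the data range at the FD width.

    Hypothetical helper; max_bins is an assumed safety cap.
    """
    a = np.asarray(a, dtype=float)
    h = fd_bin_width(a)
    if h == 0:
        # Degenerate case (IQR of zero): fall back to sqrt(n) bins (assumption).
        return int(np.sqrt(a.size))
    return min(int(np.ceil((a.max() - a.min()) / h)), max_bins)


rng = np.random.default_rng(0)
distribution = rng.random(1000)
print(fd_bin_width(distribution), fd_bin_count(distribution))
```

The printed values depend on the random sample, so they will be close to, but not exactly, the 0.1016 and 10 shown above.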

Graph without specifying bins:

f, axs = plt.subplots(1,1)
distribution=np.random.rand(1000)
sns.distplot(distribution, hist=True , kde_kws={"shade": True},ax=axs)

[Figure: Bin width for seaborn distplot]
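The bin width of a histogram that has already been drawn can also be read back from the Axes, because matplotlib stores each bar as a Rectangle in `ax.patches`. The sketch below uses `ax.hist` directly with NumPy's `"fd"` estimator and evenly spaced stand-in data. It assumes, without this answer confirming it, that distplot draws its bars through matplotlib's hist, in which case the same inspection should work on the `axs` object from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.linspace(0, 1, 1000)  # deterministic stand-in for the random sample

fig, ax = plt.subplots()
ax.hist(data, bins="fd")  # NumPy's Freedman-Diaconis estimator picks the bins

# Each histogram bar is a Rectangle in ax.patches; its width is the bin width.
widths = [patch.get_width() for patch in ax.patches]
print(len(widths), widths[0])
```

If a KDE curve is drawn on the same Axes, it is a line or filled area rather than a Rectangle, so it should not appear in `ax.patches`.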

Graph with the parameter bins=10 specified:

f, axs = plt.subplots(1,1)
sns.distplot(distribution, bins=10, hist=True , kde_kws={"shade": True},ax=axs)

[Figure: Bin width]
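Passing the same number of bins only gives the same bin width when both datasets span the same range. For two datasets with different ranges, one option is to build explicit bin edges from the first dataset's width and pass them to both plots. The following is a sketch under stated assumptions: `second` is a hypothetical second dataset, and the width comes from NumPy's `np.histogram_bin_edges` with the `"fd"` estimator rather than from Seaborn itself.

```python
import numpy as np

rng = np.random.default_rng(0)
first = rng.random(1000)             # stands in for the first dataset
second = rng.normal(0.5, 0.2, 500)   # hypothetical second dataset

# Bin width chosen for the first dataset by NumPy's Freedman-Diaconis estimator.
edges_first = np.histogram_bin_edges(first, bins="fd")
width = edges_first[1] - edges_first[0]

# Build edges with that same width that span both datasets.
lo = min(first.min(), second.min())
hi = max(first.max(), second.max())
n_bins = int(np.ceil((hi - lo) / width))
shared_edges = lo + width * np.arange(n_bins + 1)

print(width, n_bins)
```

The resulting array can then be passed as `bins=shared_edges` to both plotting calls, since the bins argument accepts a sequence of edges as well as a count.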

Update:

The Seaborn 0.9 documentation mentioned the Freedman-Diaconis rule as the way the bin size is calculated:

Specification of hist bins, or None to use Freedman-Diaconis rule.

The description changed in version 0.10 as follows:

Specification of hist bins. If unspecified, as reference rule is used that tries to find a useful default.

Kaymal answered Jan 28 '23 14:01