What is y axis in seaborn distplot?

Tags: python, matplotlib, seaborn

I have some geometrically distributed data. When I want to take a look at it, I use

sns.distplot(data, kde=False, norm_hist=True, bins=100)

which results in this picture:

Plot 1a

However, the bin heights don't add up to 1, which means the y axis doesn't show probability but something else. If instead we use

weights = np.ones_like(np.array(data))/float(len(np.array(data)))
plt.hist(data, weights=weights, bins = 100)

the y axis shows probability, as the bin heights sum to 1:

Plot 1b

It can be seen more clearly here: suppose we have a list

l = [1, 3, 2, 1, 3]

We have two 1s, two 3s and one 2, so their respective probabilities are 2/5, 2/5 and 1/5. When we use seaborn distplot with 3 bins:

sns.distplot(l, kde=False, norm_hist=True, bins=3)

we get:

Plot 2a

As you can see, the heights of the 1st and the 3rd bins sum to 0.6 + 0.6 = 1.2, which is already greater than 1, so the y axis is not a probability. When we use

weights = np.ones_like(np.array(l))/float(len(np.array(l)))
plt.hist(l, weights=weights, bins = 3)

we get:

Plot 2b

and the y axis is a probability, since 0.4 + 0.4 + 0.2 = 1, as expected.
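The two sets of bar heights can also be checked numerically without plotting. This is a minimal sketch that assumes `norm_hist=True` applies the same normalisation as NumPy's `density=True`, which is consistent with the 0.6 bar heights reported above; the variable names are illustrative only.

```python
import numpy as np

l = [1, 3, 2, 1, 3]

# Per-sample weights of 1/N, mirroring the plt.hist call above
weights = np.ones(len(l)) / len(l)

# Bar heights when every sample contributes 1/N: fraction of samples per bin
prob_heights, edges = np.histogram(l, bins=3, weights=weights)

# Bar heights under density normalisation (assumed equivalent to norm_hist=True)
density_heights, _ = np.histogram(l, bins=3, density=True)

print("bin edges:       ", edges)
print("weighted heights:", prob_heights, "sum =", prob_heights.sum())
print("density heights: ", density_heights, "sum =", density_heights.sum())
```

The weighted heights sum to 1, while the density heights sum to more than 1 because each bin here is narrower than one unit on the x axis.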

The number of bins is the same for both methods in each case: 100 bins for the geometrically distributed data and 3 bins for the small list l with 3 possible values. So the number of bins is not the issue.

My question is: in seaborn distplot called with norm_hist=True, what is the meaning of the y axis?

587

asked Aug 03 '18 06:08

Mister Twister

2 Answers

From the documentation:

norm_hist : bool, optional

If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted.

So you need to take the bin width into account as well, i.e. compute the area of the bars (height times bin width) and not just the sum of the bin heights.
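As a rough illustration of that point, the sketch below multiplies each density height by its bin width for the small list from the question. It uses `np.histogram(..., density=True)` as a stand-in for what `norm_hist=True` draws, which is an assumption rather than something taken from the seaborn source.

```python
import numpy as np

l = [1, 3, 2, 1, 3]

# Density heights and the bin edges they refer to
density, edges = np.histogram(l, bins=3, density=True)

# Width of each bin (all equal to 2/3 here, since the data spans 1 to 3)
widths = np.diff(edges)

# Area of each bar = height * width = probability of landing in that bin
bin_probs = density * widths
total_area = bin_probs.sum()

print("bin widths:       ", widths)
print("bin probabilities:", bin_probs)
print("total area:       ", total_area)
```

The bar areas recover the 2/5, 1/5, 2/5 probabilities from the question, and the total area is 1. In newer seaborn releases, where `distplot` is deprecated, `sns.histplot(l, bins=3, stat="probability")` should put these probabilities on the y axis directly.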

139

answered Oct 12 '22 01:10

IonicSolutions

The x-axis is the value of the variable just like in a histogram, but what exactly does the y-axis represent?

Answer: The y-axis in a density plot is the probability density function for the kernel density estimation. However, we need to be careful to specify that this is a probability density and not a probability. The difference is that the probability density is the probability per unit on the x-axis. To convert to an actual probability, we need to find the area under the curve for a specific interval on the x-axis. Somewhat confusingly, because this is a probability density and not a probability, the y-axis can take values greater than one. The only requirement of the density plot is that the total area under the curve integrates to one. I generally tend to think of the y-axis on a density plot as a value only for relative comparisons between different categories.

Source: https://towardsdatascience.com/histograms-and-density-plots-in-python-f6bda88f5ac0
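To see a density exceed one in practice, here is a small sketch using synthetic data (not the data from the question): samples drawn uniformly from the interval [0, 0.5]. The seed, sample size and bin count are arbitrary choices made for illustration.

```python
import numpy as np

# Hypothetical data: 1000 samples spread uniformly over an interval of length 0.5
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 0.5, size=1000)

# Density-normalised histogram heights
density, edges = np.histogram(data, bins=10, density=True)

# All probability mass sits in an interval of length 0.5, so the heights
# must average about 2 for the total area to come out as 1
area = (density * np.diff(edges)).sum()

print("max height:", density.max())
print("total area:", area)
```

The heights sit around 2 because the data occupies only half a unit on the x axis, yet the area under the bars is still 1.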

answered Oct 12 '22 01:10

Prasann
