
How to plot a mean line on a distplot between 0 and the y value of the mean?

I have a distplot and would like to plot a vertical line at the mean that runs from 0 up to the y-value of the distplot at the mean. In other words, I want the line to stop where the distplot does. Why isn't there a simple parameter that does this? It would be very useful.

I have some code that gets me almost there:

plt.plot([x.mean(),x.mean()], [0, *what here?*])

This code plots a line just as I'd like except for my desired y-value. What would the correct math be to get the y max to stop at the frequency of the mean in the distplot? An example of one of my distplots is below using 0.6 as the y-max. It would be awesome if there was some math to make it stop at the y-value of the mean. I have tried dividing the mean by the count etc.

[image: example distplot with a mean line drawn up to y = 0.6]

asked Aug 07 '20 18:08 by bismo


2 Answers

Update for matplotlib 3.3.4 and seaborn 0.11.1: kdeplot with shade=True no longer creates a line object. With shade=False the line object is still created, and the curve can then be filled with ax.fill_between(). The code below has been changed accordingly. (Use the revision history to see the older version.)

ax.lines[0] is the KDE curve, from which you can extract the x and y data. np.interp can then find the height of the curve at a given x-value:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Sample data: 250 points drawn around 5 random centres
x = np.random.normal(np.tile(np.random.uniform(10, 30, 5), 50), 3)
ax = sns.kdeplot(x, shade=False, color='crimson')
kdeline = ax.lines[0]  # the KDE curve drawn by kdeplot
mean = x.mean()
xs = kdeline.get_xdata()
ys = kdeline.get_ydata()
height = np.interp(mean, xs, ys)  # height of the curve at the mean
ax.vlines(mean, 0, height, color='crimson', ls=':')  # line from 0 up to the curve
ax.fill_between(xs, 0, ys, facecolor='crimson', alpha=0.2)  # shade under the curve
plt.show()
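
If you would rather not depend on ax.lines[0], the height can also be computed from the data. The sketch below is only an illustration: gaussian_kde_height is a hypothetical helper, and it assumes a Gaussian kernel with Scott's rule bandwidth, which I believe matches seaborn's default but have not verified against every version. It also shows that interpolating on a sampled curve, as np.interp does above, lands very close to the direct evaluation.

```python
import numpy as np


def gaussian_kde_height(samples, at, bw_adjust=1.0):
    """Evaluate a Gaussian KDE of `samples` at the point(s) `at`.

    Assumption: bandwidth from Scott's rule, std(ddof=1) * n**(-1/5).
    """
    samples = np.asarray(samples, dtype=float)
    at = np.atleast_1d(np.asarray(at, dtype=float))
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1 / 5) * bw_adjust
    # One row per evaluation point, one column per sample
    z = (at[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
x = rng.normal(20, 3, 500)  # made-up sample data
mean = x.mean()

# Direct evaluation of the density at the mean
direct = gaussian_kde_height(x, mean)[0]

# The same curve sampled on a grid and interpolated, as with the plotted line
xs = np.linspace(x.min() - 3, x.max() + 3, 200)
ys = gaussian_kde_height(x, xs)
interpolated = np.interp(mean, xs, ys)

print(direct, interpolated)
```

Either value can be passed as the upper end of ax.vlines(mean, 0, height). If the plot uses a non-default bandwidth, the helper would need the same adjustment to match the drawn curve.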

[image: example plot of the KDE curve with a dotted line at the mean]

The same approach can be extended to show the mean together with the standard deviation, or the median and the quartiles:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

x = np.random.normal(np.tile(np.random.uniform(10, 30, 5), 50), 3)
fig, axes = plt.subplots(ncols=2, figsize=(12, 4))
for ax in axes:
    sns.kdeplot(x, shade=False, color='crimson', ax=ax)
    kdeline = ax.lines[0]
    xs = kdeline.get_xdata()
    ys = kdeline.get_ydata()
    if ax is axes[0]:
        middle = x.mean()
        sdev = x.std()
        left = middle - sdev
        right = middle + sdev
        ax.set_title('Showing mean and sdev')
    else:
        left, middle, right = np.percentile(x, [25, 50, 75])
        ax.set_title('Showing median and quartiles')
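    # Dotted line from 0 up to the curve at the central value
    ax.vlines(middle, 0, np.interp(middle, xs, ys), color='crimson', ls=':')
    # Light fill under the whole curve
    ax.fill_between(xs, 0, ys, facecolor='crimson', alpha=0.2)
    # Second fill between left and right, which darkens that band
    ax.fill_between(xs, 0, ys, where=(left <= xs) & (xs <= right), interpolate=True, facecolor='crimson', alpha=0.2)
    # ax.set_ylim(ymin=0)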
plt.show()

[image: two panels, 'Showing mean and sdev' and 'Showing median and quartiles']

PS: for the mode of the kde:

    mode_idx = np.argmax(ys)
    ax.vlines(xs[mode_idx], 0, ys[mode_idx], color='lime', ls='--')
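
If the line should stop at the histogram bar rather than at the KDE curve, a similar lookup works on the bin heights. This is a minimal sketch with made-up data; bins=20 is an arbitrary assumption, and the same bins would have to be passed to the plotting call for the heights to match what is drawn.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(20, 3, 500)  # made-up sample data

# density=True scales the bars so that their total area is 1
heights, edges = np.histogram(x, bins=20, density=True)

mean = x.mean()
# Index of the bin that contains the mean; the clip keeps a value
# equal to the last edge inside the last bin
idx = np.clip(np.searchsorted(edges, mean, side='right') - 1, 0, len(heights) - 1)
bar_height = heights[idx]

print(bar_height)
```

bar_height can then be used as the upper end of the line, for example ax.vlines(mean, 0, bar_height).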
answered Sep 28 '22 20:09 by JohanC


With ax.get_ylim() you can get the limits of the current plot as (bottom, top).
So, in your case, you can read the current limits, save them in ylim, and then draw the line:

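fig, ax = plt.subplots()
# ... draw the distplot on ax first, so the limits reflect the plot ...

ylim = ax.get_ylim()  # save the current (bottom, top) limits
ax.plot([x.mean(), x.mean()], ylim)  # vertical line spanning the saved limits
ax.set_ylim(ylim)  # restore the limits changed by ax.plot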

Because ax.plot changes the y-limits, you have to reset them with ax.set_ylim, as above.
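
The saved limits can also be used to make a line stop partway up the axes. ax.axvline takes ymin and ymax as fractions of the axis (0 to 1) rather than data values, so a data height has to be converted first. The sketch below assumes a linear y-axis and that the height at the mean is already known; height_to_axes_fraction is a hypothetical helper and the numbers are made up.

```python
def height_to_axes_fraction(height, ylim):
    """Map a data-coordinate height to a 0..1 fraction of the y-axis span.

    Assumption: the y-axis is linear.
    """
    bottom, top = ylim
    if top == bottom:
        raise ValueError("y-limits must span a non-zero range")
    return (height - bottom) / (top - bottom)


# Made-up numbers: y-axis from 0 to 0.6, curve height 0.15 at the mean
frac = height_to_axes_fraction(0.15, (0.0, 0.6))
print(frac)
```

With a real plot this would be used as ax.axvline(x.mean(), ymin=0, ymax=frac). The fraction is only valid for the limits it was computed from, so it should be recomputed if the y-limits change afterwards.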

answered Sep 28 '22 18:09 by Zephyr