
How to plot the difference of two distributions in seaborn?

I have the following code to compare two distributions:

sns.kdeplot(df['term'][df['outcome'] == 0], shade=True, color='red')
sns.kdeplot(df['term'][df['outcome'] == 1], shade=True, color='green')

It looks like this:

[Image: the two overlaid KDE plots, one red and one green]

How do I plot just the difference of the two distributions (disA - disB)? Of course, it could contain negative values.

mllamazares asked Mar 26 '18 09:03


1 Answer

Since the difference between two KDE curves is not itself a KDE curve, you cannot use kdeplot to plot that difference.

A KDE is easily calculated using scipy.stats.gaussian_kde. Evaluate both KDEs on a common grid, subtract them, and plot the result with pyplot.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

np.random.seed(0)  # make the example reproducible

# Two synthetic samples standing in for the two groups
a = np.random.gumbel(80, 25, 1000)
b = np.random.gumbel(90, 46, 4000)

# Fit a Gaussian KDE to each sample
kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)

# Evaluate both KDEs on the same grid so they can be subtracted pointwise
grid = np.linspace(0, 500, 501)

plt.plot(grid, kdea(grid), label="kde A")
plt.plot(grid, kdeb(grid), label="kde B")
plt.plot(grid, kdea(grid) - kdeb(grid), label="difference")

plt.legend()
plt.show()

[Image: plot of kde A, kde B, and their difference]
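To apply the same idea to the data from the question, the two samples would be df['term'][df['outcome'] == 0] and df['term'][df['outcome'] == 1]. The following is only a sketch: the helper name kde_difference and its num_points parameter are made up for illustration, and the synthetic arrays stand in for the real columns, which are not shown in the question.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_difference(sample_a, sample_b, num_points=500):
    """Hypothetical helper: evaluate KDE(sample_a) - KDE(sample_b) on a shared grid."""
    sample_a = np.asarray(sample_a, dtype=float)
    sample_b = np.asarray(sample_b, dtype=float)
    # The grid spans the combined range of both samples
    lo = min(sample_a.min(), sample_b.min())
    hi = max(sample_a.max(), sample_b.max())
    grid = np.linspace(lo, hi, num_points)
    diff = gaussian_kde(sample_a)(grid) - gaussian_kde(sample_b)(grid)
    return grid, diff


# Synthetic stand-ins (assumption) for the two groups of df['term']
rng = np.random.default_rng(0)
terms_0 = rng.normal(36, 5, 300)
terms_1 = rng.normal(48, 8, 500)

grid, diff = kde_difference(terms_0, terms_1)
```

The returned arrays can be passed to plt.plot(grid, diff), and plt.axhline(0, color="gray") makes it easier to see where the curve changes sign. With the real data, replace terms_0 and terms_1 with the two filtered columns.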

Note that the result is really just the difference between the two curves (as asked for); it has no statistical relevance at all.
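One way to see that the difference is not a density: each KDE integrates to one, so the difference integrates to zero. A quick check, as a sketch using the integrate_box_1d method of gaussian_kde (the variable names area_a, area_b and net_area are illustrative only):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
a = rng.gumbel(80, 25, 1000)
b = rng.gumbel(90, 46, 4000)

kde_a = gaussian_kde(a)
kde_b = gaussian_kde(b)

# Integrate each one-dimensional KDE over the whole real line
area_a = kde_a.integrate_box_1d(-np.inf, np.inf)
area_b = kde_b.integrate_box_1d(-np.inf, np.inf)

# Integration is linear, so this is the integral of the difference curve
net_area = area_a - area_b
print(area_a, area_b, net_area)
```

The positive and negative parts of the difference curve therefore cancel, so it can only show where one estimate lies above the other.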

ImportanceOfBeingErnest answered Oct 06 '22 06:10