Add KDE on to a histogram

I would like to add a density plot to my histogram. I know a little about the pdf function, but I got confused, and other similar questions were not helpful.

import random

import matplotlib.pyplot as plt

# Draw N random integers from 0 to 9 (random.randint includes both endpoints)
N = 100
nums = [random.randint(0, 9) for _ in range(N)]

bars = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
alpha, loc, beta = 5, 100, 22  # not used yet

# density=True replaces normed=True, which Matplotlib 3.x no longer accepts
plt.hist(nums, density=True, bins=bars)

plt.show()

I'm looking for something like this:

[image]

aaa, asked Oct 24 '15 21:10



2 Answers

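from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(41)

N = 100
# N random integers from 0 to 8 (the upper bound of np.random.randint is exclusive)
x = np.random.randint(0, 9, N)
bins = np.arange(10)

# Fit a Gaussian kernel density estimate and evaluate it on a fine grid
kde = stats.gaussian_kde(x)
xx = np.linspace(0, 9, 1000)
fig, ax = plt.subplots(figsize=(8,6))
# density=True normalizes the bars so they share a scale with the KDE curve
ax.hist(x, density=True, bins=bins, alpha=0.3)
ax.plot(xx, kde(xx))
plt.show()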
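
Two optional tweaks may be worth trying with integer data like this. With bin edges at 0, 1, 2, ... each bar spans [k, k+1), so the bars sit half a unit to the right of where the KDE peaks; putting the edges at half-integers centers each bar on its value. The smoothness of the curve can also be adjusted through the bw_method argument of gaussian_kde. The following is only a sketch under those assumptions: hist_with_kde is a hypothetical helper name, and the value 0.3 is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def hist_with_kde(values, bw_method=None):
    """Draw an integer-centered histogram with a Gaussian KDE on top.

    hist_with_kde is a hypothetical helper name, not a library function.
    """
    values = np.asarray(values)
    # Edges at k - 0.5 and k + 0.5 so that each bar is centered on the integer k
    edges = np.arange(values.min() - 0.5, values.max() + 1.5, 1.0)
    # bw_method=None keeps SciPy's default (Scott's rule); a scalar is used
    # directly as the bandwidth factor, and smaller values give a wigglier curve
    kde = stats.gaussian_kde(values.astype(float), bw_method=bw_method)
    grid = np.linspace(edges[0], edges[-1], 500)

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.hist(values, bins=edges, density=True, alpha=0.3)
    ax.plot(grid, kde(grid))
    ax.set_xticks(np.arange(values.min(), values.max() + 1))
    return ax, edges, kde


rng = np.random.default_rng(41)
x = rng.integers(0, 9, size=100)  # integers 0..8; the upper bound is exclusive
ax, edges, kde = hist_with_kde(x, bw_method=0.3)
```

Call plt.show() afterwards to display the figure. Passing bw_method=None instead falls back to SciPy's default bandwidth.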


cel, answered Oct 16 '22 17:10


Here's a solution using seaborn 0.11.1 and pandas 1.1.5:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np

N = 100
# N random integers from 0 to 8 (the upper bound of np.random.randint is exclusive)
nums = np.random.randint(0, 9, size=N)
df = pd.DataFrame(nums, columns=["value"])

fig, ax1 = plt.subplots()
# Density curve on the left y-axis
sns.kdeplot(data=df, x="value", ax=ax1)
ax1.set_xlim((df["value"].min(), df["value"].max()))
# Histogram counts on a second y-axis that shares the x-axis
ax2 = ax1.twinx()
sns.histplot(data=df, x="value", discrete=True, ax=ax2)


Note that I use numpy to generate the random values because I need actual values, not generators. The discrete=True argument in the last line ensures that the ticks are centered under the bars.

MERose, answered Oct 16 '22 17:10