
How to plot the probability density function (PDF) of inter-arrival times of events?

Tags: python, plot, numpy

I have an array of data values as follows:

0.000000000000000000e+00
3.617000000000000171e+01
1.426779999999999973e+02
2.526699999999999946e+01
4.483190000000000168e+02
7.413999999999999702e+00
1.132390000000000043e+02
8.797000000000000597e+00
1.362599999999999945e+01
2.080880900000000111e+04
5.580000000000000071e+00
3.947999999999999954e+00
2.615000000000000213e+00
2.458000000000000185e+00
8.204600000000000648e+01
1.641999999999999904e+00
5.108999999999999986e+00
2.388999999999999790e+00
2.105999999999999872e+00
5.783000000000000362e+00
4.309999999999999609e+00
3.685999999999999943e+00
6.339999999999999858e+00
2.198999999999999844e+00
3.568999999999999950e+00
2.883999999999999897e+00
7.307999999999999829e+00
2.515000000000000124e+00
3.810000000000000053e+00
2.829000000000000181e+00
2.593999999999999861e+00
3.963999999999999968e+00
7.258000000000000007e+00
3.543000000000000149e+00
2.874000000000000110e+00
................... and so on. 

I want to plot the probability density function of these data values. I looked at (Wiki) and at scipy.stats.gaussian_kde, but I cannot tell whether that is the correct approach. I am using Python. My code for a simple plot of the data is as follows:

from matplotlib import pyplot as plt
plt.plot(Data)

But now I want to plot the PDF (probability density function), and I cannot find a Python library to do so.

KrunalParmar asked Dec 04 '22 00:12


2 Answers

The dataset you provide is too small to allow for a reliable kernel density estimation. Therefore, I will demonstrate the procedure (if I understood correctly what you are trying to do) using another data set:

import numpy as np
import scipy.stats

# generate data samples
data = scipy.stats.expon.rvs(loc=0, scale=1, size=1000, random_state=123)

A kernel density estimation can then be obtained by simply calling

scipy.stats.gaussian_kde(data,bw_method=bw)

where bw is an (optional) parameter for the estimation procedure. For this data set, and considering three values for bw, the fit is as shown below:

# test values for the bw_method option (None selects the default, Scott's rule)
bw_values = [None, 0.1, 0.01]

# build one KDE estimator per bw value
kde = [scipy.stats.gaussian_kde(data, bw_method=bw) for bw in bw_values]

# plot the normalized histogram of the data
# (density=True replaces the old normed argument, which current Matplotlib no longer accepts)
import matplotlib.pyplot as plt
plt.hist(data, 50, density=True, facecolor='green', alpha=0.5)

# plot the density estimates
t_range = np.linspace(-2, 8, 200)
for i, bw in enumerate(bw_values):
    plt.plot(t_range, kde[i](t_range), lw=2, label='bw = ' + str(bw))
plt.xlim(-1, 6)
plt.legend(loc='best')
plt.show()


Note that larger bw values give a smoother PDF estimate, but at the cost (in this example) of suggesting that negative values are possible, which is not the case here.
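To put a number on that trade-off, here is a minimal sketch (not part of the plot code above) that regenerates the same kind of synthetic exponential sample and, for each bw value, reports the bandwidth factor and the probability mass the estimate places below zero. The names leak and mass_below_zero are only illustrative; kde.factor and integrate_box_1d are attributes of scipy.stats.gaussian_kde.

```python
import numpy as np
import scipy.stats

# same kind of synthetic sample as above (assumption: exponential inter-arrival times)
data = scipy.stats.expon.rvs(loc=0, scale=1, size=1000, random_state=123)

bw_values = [None, 0.1, 0.01]
leak = []  # illustrative name: mass assigned to t < 0 for each bw
for bw in bw_values:
    kde = scipy.stats.gaussian_kde(data, bw_method=bw)
    # integrate the estimated density over (-inf, 0]
    mass_below_zero = kde.integrate_box_1d(-np.inf, 0.0)
    leak.append(mass_below_zero)
    # kde.factor is the multiplier applied to the sample std to get the kernel width
    print(f"bw = {bw}: factor = {kde.factor:.3f}, mass below zero = {mass_below_zero:.4f}")
```

For data that cannot be negative, a smaller mass below zero means less leakage across the boundary, at the price of a noisier curve.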

Stelios answered Dec 06 '22 14:12


Use numpy.histogram

Example:

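import numpy as np
from matplotlib import pyplot as plt

# a is your data array
# density=True replaces the normed argument, which newer NumPy releases no longer accept
hist, bins = np.histogram(a, bins=100, density=True)
bin_centers = (bins[1:] + bins[:-1]) * 0.5
plt.plot(bin_centers, hist)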
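As a quick sanity check that the normalized histogram really is a density, the sketch below verifies that the bar areas sum to one. The array a here is a synthetic stand-in (an assumption, since the full data set is not shown); substitute your own data.

```python
import numpy as np

# synthetic stand-in for the data array `a` (assumption: exponential inter-arrival times)
rng = np.random.default_rng(0)
a = rng.exponential(scale=1.0, size=1000)

hist, bins = np.histogram(a, bins=100, density=True)

# with density=True, the bar areas (height * bin width) sum to 1
area = np.sum(hist * np.diff(bins))
print(area)
```

The heights themselves can exceed 1 when bins are narrow; only the total area is constrained.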
Han-Kwang Nienhuys answered Dec 06 '22 15:12