
Fitting a Poisson distribution to data in Python

I have data that I want to fit a Poisson distribution to. My data looks like this:

[image: data]

I tried to fit it with:

mu = herd_size["COW_NUM"].mean()
ax=sns.displot(data=herd_size["COW_NUM"], kde=True)
ax.set(xlabel='Size',title='Herd size distribution & poisson distribution')
plt.plot(np.arange(0, 2000, 80), [st.poisson.pmf(np.arange(i, i+80), mu).sum()*len(herd_size["COW_NUM"])
                                  for i in np.arange(0, 2000, 80)], color='red')
# every bin contains approximately 80 observations
plt.show()

but the fitted curve is not on the same scale as the data:

[image: data with Poisson fit]

UPDATE: I tried to fit a negative binomial distribution with this code:

n=len(herd_size["COW_NUM"])
p =herd_size["COW_NUM"].mean()/(herd_size["COW_NUM"].mean()+2) 
ax=sns.displot(data=herd_size["COW_NUM"], kde=True)
ax.set(xlabel='Size',title='Herd size distribution & geometry distribution')
plt.plot(np.arange(0, 2000, 80), [st.nbinom.pmf(np.arange(i, i+80), n,p).sum()*len(herd_size["COW_NUM"])
                                  for i in np.arange(0, 2000, 80)], color='red')
# every bin contains approximately 80 observations
plt.show()

but I got this:

[image: nbinom]

myh asked Oct 22 '25 13:10


1 Answer

For the plot you want, it is easier to pass explicit bins when drawing the histogram:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import poisson

herd_size = pd.DataFrame({'COW_NUM':np.random.poisson(200,2000)})
binwidth = 10
xstart = 150
xend = 280
bins = np.arange(xstart,xend,binwidth)

o = sns.histplot(data=herd_size["COW_NUM"], kde=True,bins = bins)

Then calculate the mean and the total number of observations:

mu = herd_size["COW_NUM"].mean() 
n = len(herd_size)

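The expected frequency in each bin is n times the probability of that bin, which is the difference of the cdf between the bin's two edges. Because the data are integers and each histogram bin covers [left, right), the cdf is evaluated at right - 1 and left - 1:

left, right = bins[:-1], bins[1:]
plt.plot(left + binwidth/2, n*(poisson.cdf(right - 1, mu) - poisson.cdf(left - 1, mu)), color='red')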
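The bin-probability step can be sketched as a small helper. This is a minimal illustration, not part of the original answer: the name `expected_bin_counts` and the seeded sample are assumptions made for the example. It treats every bin as half-open, [left, right), and uses the fact that for an integer-valued variable P(left <= X < right) = F(right - 1) - F(left - 1).

```python
import numpy as np
from scipy.stats import poisson

def expected_bin_counts(dist, edges, n_obs):
    """Expected counts of integer-valued data in bins [edges[i], edges[i+1])."""
    edges = np.asarray(edges)
    # For integer X: P(a <= X < b) = F(b - 1) - F(a - 1)
    probs = dist.cdf(edges[1:] - 1) - dist.cdf(edges[:-1] - 1)
    return n_obs * probs

# Simulated data standing in for herd_size["COW_NUM"] (assumption for the demo)
rng = np.random.default_rng(0)
sample = rng.poisson(200, 2000)

edges = np.arange(150, 280, 10)
expected = expected_bin_counts(poisson(sample.mean()), edges, len(sample))

# Observed counts on the same edges; note that np.histogram closes the
# last bin on the right, so only that bin also includes its upper edge.
observed, _ = np.histogram(sample, bins=edges)
```

Plotting `expected` against the bin centres `edges[:-1] + 5` gives one point per histogram bar, so the curve and the bars share the same scale.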


Your data is overdispersed: for a Poisson distribution the variance equals the mean, so you would not expect the data to be this spread out. You need a distribution with a separate dispersion parameter, such as a gamma or a negative binomial, for example:

from scipy.stats import nbinom
binwidth = 50
xstart = 0
xend = 2000
bins = np.arange(xstart,xend,binwidth)

# simulated overdispersed data (mean about 249)
herd_size = pd.DataFrame({'COW_NUM':nbinom.rvs(n=1,p=0.004,size=2000)})

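# method-of-moments estimates in scipy's nbinom(n, p) parameterisation,
# where mean = r*(1-p)/p and variance = r*(1-p)/p**2
Var = herd_size["COW_NUM"].var()
mu = herd_size["COW_NUM"].mean()
p = mu/Var
r = mu**2 / (Var-mu)
n = len(herd_size)  # number of observations, not the nbinom shape parameter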

o = sns.histplot(data=herd_size["COW_NUM"], kde=True,bins=bins)

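# integer data in [left, right) bins: P(bin) = cdf(right - 1) - cdf(left - 1)
left, right = bins[:-1], bins[1:]
plt.plot(left + binwidth/2,
         n*(nbinom.cdf(right - 1, r, p) - nbinom.cdf(left - 1, r, p)),
         color='red')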
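The moment-matching step can be checked on its own. The sketch below is an illustration under assumptions, not the answerer's code: the helper name `fit_nbinom_moments` and the seeded sample are made up for the example. It returns the variance-to-mean ratio alongside the estimates, since a ratio well above 1 is the sign of overdispersion that rules out a Poisson fit.

```python
import numpy as np
from scipy.stats import nbinom

def fit_nbinom_moments(counts):
    """Method-of-moments estimates (r, p) for scipy.stats.nbinom.

    Also returns the variance-to-mean ratio of the sample.
    """
    counts = np.asarray(counts, dtype=float)
    mu = counts.mean()
    var = counts.var(ddof=1)  # same default as pandas Series.var()
    if var <= mu:
        raise ValueError("variance <= mean: no negative binomial moment fit exists")
    p = mu / var               # from variance = mean / p
    r = mu**2 / (var - mu)     # from mean = r * (1 - p) / p
    return r, p, var / mu

# Simulated data standing in for herd_size["COW_NUM"] (assumption for the demo)
rng = np.random.default_rng(0)
sample = rng.negative_binomial(1, 0.004, size=2000)

r, p, dispersion = fit_nbinom_moments(sample)
fitted = nbinom(r, p)  # frozen distribution; r does not need to be an integer
```

By construction `fitted.mean()` and `fitted.var()` reproduce the sample mean and variance, which is a quick way to confirm that `r` and `p` were passed in the order `nbinom` expects.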


StupidWolf answered Oct 25 '25 04:10