Likelihood ratio test in Python

I am having trouble computing a likelihood ratio test in Python 2.7.

I have two models and the corresponding likelihood values. I believe the rule for comparing whether model L2 is better than model L1 (if the models are closely related) is to look at -2 * log(L2/L1).

I then want to find the p-value corresponding to -2 * log(L2/L1) and relate it to the significance with which L2 is preferred over L1. Here is what I have so far:

import numpy as np
import scipy.special
from scipy.stats import chisqprob

L1 = 467400. # log(likelihood) of my 1st fit
L2 = 467414. # log(likelihood) of my 2nd fit

LR = -2. * np.log(L2 / L1) # LR = -5.9905e-05

p = chisqprob(LR, 1) # L2 has 1 DoF more than L1

print 'p: %.30f' % p # p = 1.000000000000000000000000000000

five_sigma = 1 - scipy.special.erf(5 / np.sqrt(2.))                  :-)
print '5 sigma: %.30f' % five_sigma

five_sigma_check = 1 - 0.999999426696856                             :-(
print 'Check  : %.30f' % five_sigma_check

However, I run into two issues:

  • My p-value is coming out to be 1 when I'd have expected it to be close to 0.
  • When I use the formula on the line marked with the :-) to find five sigma, for example, it differs from the value quoted in the literature - that line is highlighted with a :-(. My value for five_sigma_check is taken from here.

Can anyone offer any advice, please? I'm relatively new to the world of Python and statistics.

Thanks.

asked Jul 07 '16 by Sean Mooney



2 Answers

To calculate the likelihood-ratio test statistic from the two log-likelihoods, use this formula:

from scipy.stats.distributions import chi2

def likelihood_ratio(llmin, llmax):
    # llmin: log-likelihood of the simpler model, llmax: of the richer model
    return 2 * (llmax - llmin)


LR = likelihood_ratio(L1,L2)


p = chi2.sf(LR, 1) # L2 has 1 DoF more than L1

print 'p: %.30f' % p 

# p: 0.000000121315450836607258011741
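To see concretely why the question's code returned p = 1: the values stored in L1 and L2 are already log-likelihoods, so taking -2 * np.log(L2 / L1) produces a tiny negative number rather than the intended test statistic. A minimal sketch (not part of the original answer) contrasting the two computations:

```python
import numpy as np
from scipy.stats import chi2

# Log-likelihoods from the question (these are log L, not L)
ll1, ll2 = 467400.0, 467414.0

# Wrong: treating the log-likelihoods as likelihoods gives a tiny
# negative statistic, so the chi-squared tail probability is ~1
wrong_stat = -2.0 * np.log(ll2 / ll1)

# Right: the LR test statistic is 2 * (log L2 - log L1)
stat = 2.0 * (ll2 - ll1)
p = chi2.sf(stat, 1)  # survival function = upper-tail probability

print(wrong_stat)  # ~ -5.99e-05
print(stat)        # 28.0
print(p)           # ~ 1.21e-07
```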
answered Oct 11 '22 by C_Z_


Adding a little bit of theory, which may help with understanding:

[figure: likelihood-ratio test for two nested models]

Here we have two models (assumed nested) and we want to compare how well each explains the data. Following the figure above, let's now implement the LR test with Python 3:

from scipy.stats import chi2 
ll_0, ll_1 = 467400, 467414 # given, the log-likelihoods of the nested models m_0, m_1
# log likelihood for m_0 (H_0) must be <= log likelihood of m_1 (H_1)
Λ = -2 * (ll_0 - ll_1)
print(Λ)
# 28.0
df = 1 # given the difference in dof
# compute the p-value
pvalue = 1 - chi2(df).cdf(Λ) # since Λ follows χ2
print(pvalue)
# 1.2131545083660726e-07
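As an aside (not part of the original answer): `scipy.stats.chi2` also exposes a survival function `sf`, which computes the same tail probability directly and avoids the floating-point cancellation that `1 - cdf` can suffer when p is extremely small:

```python
from scipy.stats import chi2

stat, df = 28.0, 1
p_sf = chi2(df).sf(stat)        # tail probability computed directly
p_cdf = 1 - chi2(df).cdf(stat)  # same value here, but can lose precision

print(p_sf)   # ~ 1.21e-07
```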

We can plot the distribution and clearly see that we can reject the null hypothesis in favor of model m_1, at α = 0.05.

import numpy as np
import matplotlib.pyplot as plt

α, df = 0.05, 1
x = np.linspace(0, 30, 1000)
plt.plot(x, chi2(df).pdf(x), label='χ2')
plt.axvline(chi2(df).ppf(1-α), color='red', label='α=0.05')
plt.scatter(Λ, 0, color='green', s=50, label='Λ')
plt.legend()
plt.title('χ2({}) distribution'.format(df), size=20)
plt.show()

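The same conclusion can also be read off numerically without the plot; a small check (assuming Λ = 28.0 from the snippet above):

```python
from scipy.stats import chi2

alpha, df = 0.05, 1
stat = 28.0  # LR statistic Λ computed above
critical = chi2(df).ppf(1 - alpha)  # ~ 3.84 for df = 1

print(stat > critical)  # True -> reject H0 at alpha = 0.05
```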

answered Oct 11 '22 by Sandipan Dey