I am having trouble computing a likelihood ratio test in Python 2.7.
I have two models and the corresponding likelihood values. I believe the rule for testing whether model L2 is better than model L1 (if the models are closely related) is to look at -2 * log(L2/L1).
I then want to find the p-value corresponding to -2 * log(L2/L1) and relate it to the significance with which L2 is preferred over L1. Here is what I have so far:
import numpy as np
import scipy.special
from scipy.stats import chisqprob

L1 = 467400. # log(likelihood) of my 1st fit
L2 = 467414. # log(likelihood) of my 2nd fit

LR = -2. * np.log(L2 / L1) # LR = -5.9905e-05

p = chisqprob(LR, 1) # L2 has 1 DoF more than L1

print 'p: %.30f' % p # p = 1.000000000000000000000000000000

five_sigma = 1 - scipy.special.erf(5 / np.sqrt(2.)) # :-)
print '5 sigma: %.30f' % five_sigma

five_sigma_check = 1 - 0.999999426696856 # :-(
print 'Check  : %.30f' % five_sigma_check
However, I run into two issues: the p-value comes out as exactly 1, which seems implausible, and my five_sigma value doesn't seem to line up with five_sigma_check (which is taken from here). Can anyone offer any advice, please? I'm relatively new to the world of Python and statistics.
Thanks.
The Likelihood Ratio Test Procedure: assume k parameters were lost (i.e., L_0 has k fewer parameters than L_1). Form the ratio \lambda = L_0 / L_1. This ratio is always between 0 and 1, and the less plausible the restricted model is, the smaller \lambda will be. (A classical example: the t-test for a mean \mu is a likelihood ratio test.)
The likelihood-ratio test (sometimes called the likelihood-ratio chi-squared test) is a hypothesis test that helps you choose the "best" model between two nested models.
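The link to the chi-squared distribution is Wilks' theorem: under the null hypothesis (the model with fewer parameters is adequate), the statistic

-2 \ln \lambda = -2 (\ln L_0 - \ln L_1) = 2 (\ell_1 - \ell_0) \sim \chi^2_k

asymptotically, where \ell_0 and \ell_1 are the log-likelihoods and k is the number of extra parameters. Note that the statistic is a difference of log-likelihoods; the question's code went wrong by taking the log of the ratio of two values that were already log-likelihoods.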
To calculate the likelihood-ratio test statistic given the two log-likelihoods, use:

from scipy.stats.distributions import chi2

def likelihood_ratio(llmin, llmax):
    return 2 * (llmax - llmin)

L1 = 467400. # log-likelihood of the 1st (restricted) fit
L2 = 467414. # log-likelihood of the 2nd fit

LR = likelihood_ratio(L1, L2)

p = chi2.sf(LR, 1) # L2 has 1 DoF more than L1

print 'p: %.30f' % p
# p: 0.000000121315450836607258011741
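A side note on the imports: chisqprob from the question was deprecated and later removed from SciPy; chi2.sf(x, df) (the survival function) is the documented drop-in replacement, and it is also numerically safer in the far tail than computing 1 - chi2.cdf(x, df):

from scipy.stats import chi2

# equivalent to the removed chisqprob(28., 1)
print 'p: %.30f' % chi2.sf(28., 1)
# p: 0.000000121315450836607258011741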
Adding a little bit of theory, so that it will be helpful for someone trying to understand this:
Here we have two models (assumed to be nested) and we want to compare how effectively they explain the data. Under the null hypothesis, the statistic Λ = -2 ln(L(m_0) / L(m_1)) = -2 (ll_0 - ll_1) asymptotically follows a χ2 distribution with degrees of freedom equal to the number of extra parameters in m_1. Let's now implement the LR test with Python 3:
from scipy.stats import chi2

ll_0, ll_1 = 467400., 467414. # given: the log-likelihoods of the nested models m_0, m_1
# the log-likelihood of m_0 (H_0) must be <= the log-likelihood of m_1 (H_1)
Λ = -2 * (ll_0 - ll_1)
print(Λ)
# 28.0
df = 1 # given: the difference in DoF
# compute the p-value
pvalue = 1 - chi2(df).cdf(Λ) # since Λ asymptotically follows χ2 under H_0
print(pvalue)
# 1.2131545083660726e-07
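To connect this back to the five-sigma check in the question, the p-value can be translated into an equivalent number of Gaussian sigmas. This is a small sketch of our own (not part of the answer above), using the two-sided convention that matches 1 - erf(5/√2) from the question:

from scipy.stats import norm

# find z such that 2 * norm.sf(z) == pvalue (two-sided convention)
n_sigma = norm.isf(pvalue / 2)
print(n_sigma)
# ≈ 5.3, i.e. m_1 is preferred at just above the 5-sigma level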
We can plot the χ2 distribution and clearly see that we can reject the null hypothesis in favor of model m_1, at α=0.05.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

α, df = 0.05, 1
x = np.linspace(0, 30, 1000)
plt.plot(x, chi2(df).pdf(x), label='χ2')
plt.axvline(chi2(df).ppf(1 - α), color='red', label='α=0.05')
plt.scatter(Λ, 0, color='green', s=50, label='Λ')
plt.legend()
plt.title('χ2({}) distribution'.format(df), size=20)
plt.show()
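Finally, the whole procedure can be wrapped in a small reusable helper; the name lr_test is our own, for illustration, not from any library:

from scipy.stats import chi2

def lr_test(ll_0, ll_1, df):
    # likelihood ratio test for nested models; expects ll_0 <= ll_1
    stat = -2 * (ll_0 - ll_1)
    return stat, chi2(df).sf(stat)

print(lr_test(467400., 467414., 1))
# ≈ (28.0, 1.21e-07)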