I have two arrays: x, an array of corrected values, and y, an array of the original values (before the correction was applied). I know that if I want the two-tailed p-value from a two-tailed t-test I need to do this:
t_statistic, pvalue = scipy.stats.ttest_ind(x, y, nan_policy='omit')
However, this only tells me whether the two arrays are significantly different from each other. I want to show that the corrected values, x, are significantly less than y. To do this it seems I need the one-tailed p-value, but I can't find a function that computes it. Any ideas?
Consider these two arrays:
import scipy.stats as ss
import numpy as np
prng = np.random.RandomState(0)
x, y = prng.normal([1, 2], 1, size=(10, 2)).T
An independent sample t-test returns:
t_stat, p_val = ss.ttest_ind(x, y, nan_policy='omit')
print('t stat: {:.4f}, p value: {:.6f}'.format(t_stat, p_val))
# t stat: -1.1052, p value: 0.283617
This p-value is actually calculated from the cumulative distribution function (CDF) of the t distribution:
ss.t.cdf(-abs(t_stat), len(x) + len(y) - 2) * 2
# 0.28361693716176473
Here, len(x) + len(y) - 2 is the number of degrees of freedom.
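To see that this really is where the two-tailed p-value comes from, here is a small self-contained sketch that reproduces the example data above and compares the manual CDF computation against what ttest_ind reports:

```python
import numpy as np
import scipy.stats as ss

# Reproduce the example arrays from above (seed 0).
prng = np.random.RandomState(0)
x, y = prng.normal([1, 2], 1, size=(10, 2)).T

t_stat, p_val = ss.ttest_ind(x, y)

# Two-tailed p-value: twice the CDF of the t distribution at -|t|,
# with len(x) + len(y) - 2 degrees of freedom.
df = len(x) + len(y) - 2
p_manual = ss.t.cdf(-abs(t_stat), df) * 2

print(np.isclose(p_val, p_manual))
```

The two values agree to floating-point precision.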
Notice the multiplication by 2. If the test is one-tailed, you simply don't multiply. That's all. So your p-value for a left-tailed test is
ss.t.cdf(t_stat, len(x) + len(y) - 2)
# 0.14180846858088236
If the test were right-tailed, you would use the survival function
ss.t.sf(t_stat, len(x) + len(y) - 2)
# 0.85819153141911764
which is the same as 1 - ss.t.cdf(...).
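Since the survival function is the complement of the CDF, the left- and right-tailed p-values always sum to 1, which is a quick way to sanity-check the two computations:

```python
import numpy as np
import scipy.stats as ss

# Same example data as above (seed 0).
prng = np.random.RandomState(0)
x, y = prng.normal([1, 2], 1, size=(10, 2)).T

t_stat, _ = ss.ttest_ind(x, y)
df = len(x) + len(y) - 2

p_left = ss.t.cdf(t_stat, df)   # left-tailed p-value
p_right = ss.t.sf(t_stat, df)   # right-tailed p-value

# sf(t) == 1 - cdf(t), so the two one-tailed p-values sum to 1.
print(np.isclose(p_left + p_right, 1.0))
```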
I assumed that the arrays have the same length. If not, you need to modify the degrees of freedom.
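As a side note, newer versions of SciPy (1.6.0 and later, if I recall correctly) let ttest_ind compute the one-tailed p-value directly via the alternative parameter, so the manual CDF step can be skipped entirely. A minimal sketch:

```python
import numpy as np
import scipy.stats as ss

# Same example data as above (seed 0).
prng = np.random.RandomState(0)
x, y = prng.normal([1, 2], 1, size=(10, 2)).T

# alternative='less' tests whether the mean of x is less than the mean of y.
t_stat, p_left = ss.ttest_ind(x, y, alternative='less')

# This matches the manual computation via the CDF.
df = len(x) + len(y) - 2
print(np.isclose(p_left, ss.t.cdf(t_stat, df)))
```

Use alternative='greater' for the right-tailed test.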