
How to calculate a one-tailed p-value in Python?

I have two arrays: x, an array of corrected values, and y, an array of the original values (before the correction was applied). I know that for a two-tailed t-test, I can get the two-tailed p-value like this:

t_statistic, pvalue = scipy.stats.ttest_ind(x, y, nan_policy='omit')

However, this only tells me whether the two arrays are significantly different from each other. I want to show that the corrected values, x, are significantly less than y. For that it seems I need the one-tailed p-value, but I can't find a function that computes it. Any ideas?

HM14 asked Apr 25 '26 12:04



1 Answer

Consider these two arrays:

import scipy.stats as ss
import numpy as np
prng = np.random.RandomState(0)
x, y = prng.normal([1, 2], 1, size=(10, 2)).T

An independent-samples t-test returns:

t_stat, p_val = ss.ttest_ind(x, y, nan_policy='omit')
print('t stat: {:.4f}, p value: {:.6f}'.format(t_stat, p_val))

# t stat: -1.1052, p value: 0.283617

This p-value is actually calculated from the cumulative distribution function (CDF) of the t distribution:

ss.t.cdf(-abs(t_stat), len(x) + len(y) - 2) * 2
# 0.28361693716176473

Here, len(x) + len(y) - 2 is the number of degrees of freedom.

Notice the multiplication by 2. For a one-tailed test, you simply skip that multiplication. So the p-value for a left-tailed test (x less than y) is:

ss.t.cdf(t_stat, len(x) + len(y) - 2)
# 0.14180846858088236

If the test were right-tailed, you would use the survival function:

ss.t.sf(t_stat, len(x) + len(y) - 2)
# 0.85819153141911764

which is the same as 1 - ss.t.cdf(...).
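The two cases above can be wrapped in a small helper. This is only a sketch: the function name `one_tailed_pvalue` and its `tail` argument are made up for illustration, and it assumes the default pooled-variance test. It drops NaNs by hand rather than via nan_policy='omit', so that the degrees of freedom are counted from the values actually used.

```python
import numpy as np
import scipy.stats as ss

def one_tailed_pvalue(x, y, tail="less"):
    """One-tailed p-value for a pooled-variance independent t-test.

    Hypothetical helper: tail="less" tests mean(x) < mean(y),
    tail="greater" tests mean(x) > mean(y).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Remove NaNs up front so len() reflects the observations actually used.
    x = x[~np.isnan(x)]
    y = y[~np.isnan(y)]
    t_stat, _ = ss.ttest_ind(x, y)
    df = len(x) + len(y) - 2
    if tail == "less":
        return ss.t.cdf(t_stat, df)   # area to the left of t
    if tail == "greater":
        return ss.t.sf(t_stat, df)    # area to the right of t
    raise ValueError("tail must be 'less' or 'greater'")

prng = np.random.RandomState(0)
x, y = prng.normal([1, 2], 1, size=(10, 2)).T
p_less = one_tailed_pvalue(x, y, "less")
p_greater = one_tailed_pvalue(x, y, "greater")
```

The two tails always sum to 1, so p_less + p_greater is a quick sanity check that the right tail was picked.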

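Note that len(x) + len(y) - 2 is the degrees of freedom for the default pooled-variance test, whatever the two sample sizes are. It needs adjusting if NaNs are omitted (count only the non-NaN values) or if you use Welch's test (equal_var=False).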
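If you would rather not track the degrees of freedom yourself, and assuming SciPy 1.6.0 or later is available, ttest_ind accepts an alternative argument ('two-sided', 'less' or 'greater') and returns the one-tailed p-value directly. The sketch below uses made-up samples of unequal length and checks the result against the manual CDF calculation for the pooled-variance case.

```python
import numpy as np
import scipy.stats as ss

prng = np.random.RandomState(0)
x = prng.normal(1, 1, size=8)    # illustrative shorter sample
y = prng.normal(2, 1, size=12)   # illustrative longer sample

# Pooled-variance test, H1: mean(x) < mean(y).
t_pooled, p_pooled = ss.ttest_ind(x, y, alternative='less')

# Manual equivalent with len(x) + len(y) - 2 degrees of freedom.
p_manual = ss.t.cdf(t_pooled, len(x) + len(y) - 2)

# Welch's test: SciPy computes the adjusted degrees of freedom itself.
t_welch, p_welch = ss.ttest_ind(x, y, equal_var=False, alternative='less')
```

Here p_pooled and p_manual should agree to floating-point precision, while p_welch can differ slightly because Welch's test does not assume equal variances.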