How to calculate p-value for two lists of floats?

So I have lists of floats, like [1.33, 2.555, 3.2134, 4.123123]. Those lists are mean frequencies of something. How do I show that two lists are different? I thought about calculating a p-value. Is there a function to do that? I looked through the scipy documentation, but couldn't figure out what to use.

Can anyone please advise?

YKY asked Apr 10 '15 12:04



1 Answer

Let's say you have several lists of floats like this:

>>> data = {
...     'a': [0.9, 1.0, 1.1, 1.2],
...     'b': [0.8, 0.9, 1.0, 1.1],
...     'c': [4.9, 5.0, 5.1, 5.2],
... }

Clearly, a is very similar to b, but both are different from c.

There are two kinds of comparisons you may want to do.

  1. Pairwise: Is a similar to b? Is a similar to c? Is b similar to c?
  2. Combined: Are a, b and c drawn from the same group? (This is generally a better question)

The former can be achieved using independent t-tests as follows:

>>> from itertools import combinations
>>> from scipy.stats import ttest_ind
>>> # Compare every pair of lists; sorting the keys keeps the order stable
>>> for list1, list2 in combinations(sorted(data), 2):
...     t, p = ttest_ind(data[list1], data[list2])
...     print(list1, list2, '%.3g' % p)
...
a b 0.315
a c 9.46e-09
b c 8.16e-09

This provides the pairwise p-values and suggests that a and c are different, b and c are different, but a and b may be similar.
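For intuition about what the test statistic is, here is a minimal sketch of the equal-variance (pooled) two-sample t statistic, which is the variant ttest_ind uses by default. It relies only on the standard library; the function name pooled_t_statistic is made up for this illustration and is not part of scipy.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Return (t, degrees of freedom) for an equal-variance two-sample t-test.

    Hypothetical helper for illustration only, not a scipy function.
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    # Standard error of the difference between the two sample means
    std_err = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / std_err, df


a = [0.9, 1.0, 1.1, 1.2]
b = [0.8, 0.9, 1.0, 1.1]
t_stat, df = pooled_t_statistic(a, b)
print(round(t_stat, 3), df)
```

The two-sided p-value is then the probability that a Student's t variable with df degrees of freedom is at least as extreme as t_stat, which with scipy could be obtained as 2 * scipy.stats.t.sf(abs(t_stat), df).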

The latter can be achieved using the one-way ANOVA as follows:

>>> from scipy.stats import f_oneway
>>> # f_oneway returns the F statistic and its p-value
>>> f, p = f_oneway(*data.values())
>>> p
7.959305946160327e-12

The p-value indicates that a, b, and c are unlikely to be from the same population.
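As a rough illustration of where that p-value comes from, the F statistic of a one-way ANOVA is the ratio of between-group to within-group variability. The sketch below computes it with the standard library only; one_way_f_statistic is a hypothetical name chosen for this example, not a scipy function.

```python
from statistics import mean


def one_way_f_statistic(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Hypothetical helper for illustration only, not a scipy function.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # Variability of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of the values around their own group mean
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


data = {
    'a': [0.9, 1.0, 1.1, 1.2],
    'b': [0.8, 0.9, 1.0, 1.1],
    'c': [4.9, 5.0, 5.1, 5.2],
}
f_stat, df_b, df_w = one_way_f_statistic(*data.values())
print(round(f_stat, 1), df_b, df_w)
```

A large F means the group means are spread far apart relative to the scatter inside each group; the p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom.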

S Anand answered Sep 21 '22 20:09