Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are equivalents to R's "phyper" function in Python?

In R, I use the phyper function to do a hypergeometric test for bioinformatics analysis. However I use a lot of Python code and using rpy2 here is quite slow. So, I started looking for alternatives. It seemed that scipy.stats.hypergeom had something similar.

Currently, I call phyper like this:

pvalue <- 1-phyper(45, 92, 7518, 1329)

where 45 is the number of selected items having the property of interest, 92 the number of total items having the property, 7518 the number of non selected items not having the property, and 1329 the total number of selected items.

In R, this yields 6.92113e-13.

Attempting to do the same with scipy.stats.hypergeom however yields a completely different result (notice, the numbers are swapped because the function accepts numbers in a different way):

import scipy.stats as stats   
pvalue = 1-stats.hypergeom.cdf(45, 7518, 92. 1329)
print pvalue

However this returns -7.3450134863151106e-12, which makes little sense. Notice that I've tested this on other data and I had little issues (same precision up to the 4th decimal, which is enough for me).

So it boils down to these possibilities:

  1. I'm using the wrong function for the job (or wrong parameters)
  2. There's a bug in scipy

In case of "1", are there other alternatives to phyper that can be used in Python?

EDIT: As the comments have noted, this is a bug in scipy, fixed in git master.

like image 665
Einar Avatar asked Jul 06 '11 10:07

Einar


1 Answers

From the docs, you could try:

hypergeom.sf(x,M,n,N,loc=0) : survival function (1-cdf — sometimes more accurate)

Also, I think you might have the values mixed up.

Models drawing objects from a bin. M is total number of objects, n is total number of Type I objects. RV counts number of Type I objects in N drawn without replacement from population.

Therefore, by my reading: x=q, M=n+m, n=m, N=k.

So I would try:

stats.hypergeom.sf(45,(92+7518),92,1329)
like image 179
James Avatar answered Oct 27 '22 13:10

James