
How to vectorize Fisher's exact test?

Is it possible to speed up this calculation by vectorizing Fisher's exact test, and if so, how? The runtime becomes prohibitive when num_cases exceeds roughly 1,000,000.

import numpy as np
from scipy.stats import fisher_exact

num_cases = 100
# random_integers is deprecated; randint(1, 101) draws from the same range [1, 100]
randCounts = np.random.randint(1, 101, size=(num_cases, 4))

def testFisher(randCounts):
    return [fisher_exact([[r[0],r[1]],[r[2], r[3]]])[0] for r in randCounts]

In [6]: %timeit testFisher(randCounts)
        1 loops, best of 3: 524 ms per loop
Kevin asked Oct 18 '22 17:10


1 Answer

Here is an answer using Fisher's exact test as implemented in the fisher package. I compute the odds ratio (OR) by hand in NumPy.

Install:

# pip install fisher
# or 
# conda install -c bioconda fisher

Setup:

import numpy as np
np.random.seed(0)
num_cases = 100
c = np.random.randint(100,size=(num_cases,4), dtype=np.uint)

# first five rows
c[:5]
# array([[44, 47, 64, 67],
#   [67,  9, 83, 21],
#   [36, 87, 70, 88],
#   [88, 12, 58, 65],
#   [39, 87, 46, 88]], dtype=uint64)

Execute:

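from fisher import pvalue_npy
from scipy.stats import fisher_exact  # only needed for the slow cross-check below

# keep the two-sided p-value; the first two return values (one-sided) are discarded
_, _, twosided = pvalue_npy(c[:, 0], c[:, 1], c[:, 2], c[:, 3])
# odds ratio (a*d)/(b*c); rows with a zero in the denominator give inf or nan
odds = (c[:, 0] * c[:, 3]) / (c[:, 1] * c[:, 2])

print("result fast odds and p", odds[0], twosided[0])
# result fast odds and p 0.9800531914893617 1.0
print("result slow", fisher_exact([[c[0][0], c[0][1]], [c[0][2], c[0][3]]]))
# result slow (0.9800531914893617, 1.0)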

Note that for one million rows this takes only about two seconds :)

Also, to compute an approximate OR you might want to add a pseudocount to every cell of the table before computing the odds ratio. This is often more useful than inf, since the approximations can still be compared with one another :) :

c2 = c + 1
odds = (c2[:, 0] * c2[:, 3]) / (c2[:, 1] * c2[:, 2])
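
To make the effect of the pseudocount concrete, here is a minimal NumPy-only sketch. The helper name odds_ratio and its pseudocount parameter are made up for illustration; they are not part of fisher, scipy, or pyranges. It casts to float64 first, which is an assumption on my part to keep the products out of unsigned-integer arithmetic, and it silences the divide-by-zero warnings so the raw inf/nan values are easy to see next to the smoothed ones.

```python
import numpy as np

def odds_ratio(counts, pseudocount=0):
    """Row-wise odds ratio (a*d)/(b*c) for an (n, 4) array of 2x2 tables.

    Hypothetical helper for illustration only.
    """
    t = np.asarray(counts, dtype=np.float64) + pseudocount
    # Zero cells in the denominator yield inf or nan; suppress the warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (t[:, 0] * t[:, 3]) / (t[:, 1] * t[:, 2])

tables = np.array([
    [44, 47, 64, 67],   # no zero cells (first row of c above)
    [10,  0,  5,  8],   # b == 0: raw odds ratio is inf
    [ 0,  0,  3,  4],   # a == 0 and b == 0: raw odds ratio is nan
])

raw = odds_ratio(tables)
smoothed = odds_ratio(tables, pseudocount=1)

print(raw)       # first entry matches 0.9800531914893617 from above; then inf, nan
print(smoothed)  # all finite: the second and third entries are 16.5 and 1.25
```

The pseudocount only changes the odds ratio; the p-values from pvalue_npy should still be computed on the original counts.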

Edit:

As of version 0.0.61, this method is included in pyranges as pr.stats.fisher_exact.

The Unfun Cat answered Nov 01 '22 16:11