 

numpy correlation coefficient: np.dot(A, A.T) on large arrays causing seg fault

NOTE:

Speed is not as important as getting a final result.
However, some speedup over the worst case is required as well.

I have a large array A:

A.shape=(20000,265) # or possibly larger like 50,000 x 265

I need to compute the correlation coefficients.

np.corrcoef # internally casts the results to doubles

I just borrowed their code and wrote my own cov/corr, not casting into doubles, since I really only need 32-bit floats. And I ditched the conj(), since my data are always real.

cov = A.dot(A.T)/n  # where A is an array of 32-bit floats
d = np.diag(cov)
corr = cov / np.sqrt(np.multiply.outer(d, d))
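Assembled into a self-contained sketch (corr32 is my own name, not NumPy's; note that np.corrcoef also subtracts the row means, which the snippet above omits), the float32 version might look like:

```python
import numpy as np

def corr32(A):
    """Float32 correlation of the rows of A, assuming real data (no conj)."""
    n = A.shape[1]
    Am = A - A.mean(axis=1, keepdims=True)   # corrcoef centers each row
    cov = Am.dot(Am.T) / n                   # stays float32, no cast to double
    d = np.sqrt(np.diag(cov))                # per-row standard deviations
    return cov / np.multiply.outer(d, d)

A = np.random.rand(100, 265).astype(np.float32)
C = corr32(A)
print(C.dtype, np.allclose(C, np.corrcoef(A), atol=1e-4))
```

On small arrays this agrees with np.corrcoef up to float32 precision (dividing by n rather than n-1 cancels in the ratio).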

I still run out of memory, and I'm using a large-memory machine with 264 GB.

I've been told that the fast C libraries probably use a routine which breaks the dot product up into pieces, and that to optimize this, the number of elements is padded to a power of 2.

I don't really need to compute the symmetric half of the correlation coefficient matrix. However, I don't see a way to do this in a reasonable amount of time "manually" with Python loops.
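For what it's worth, one way to avoid the redundant half without per-element Python loops is to block the product and only form tiles on or above the diagonal, so the Python overhead is per-tile while each tile is still a single BLAS call. A rough sketch under those assumptions (my own, not from the question):

```python
import numpy as np

def upper_blocks(A, bsize):
    """Yield (i, j, tile) for tiles of A.dot(A.T) on or above the diagonal."""
    m = A.shape[0]
    for i in range(0, m, bsize):
        for j in range(i, m, bsize):   # j >= i: skip the symmetric half
            yield i, j, A[i:i+bsize].dot(A[j:j+bsize].T)

A = np.random.rand(3000, 265).astype(np.float32)
full = A.dot(A.T)
for i, j, tile in upper_blocks(A, 1024):
    assert np.allclose(tile, full[i:i+1024, j:j+1024], rtol=1e-4)
```

Each tile can be normalized and written out (or discarded) before the next is computed, so peak memory is one tile rather than the whole matrix.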

Does anybody know of a way to ask numpy for a decent dot product routine that balances memory usage with speed?

Cheers

UPDATE:

Funny how writing these questions tends to help me find the language for a better google query.

Found this:

http://wiki.scipy.org/PerformanceTips

Not sure that I follow it....so, please comment or provide answers about this solution, your own ideas, or just general commentary on this type of problem.

TIA

EDIT: I apologize, because my array is really much bigger than I thought. The array size is actually 151,000 x 265. I'm running out of memory on a machine with 264 GB, with at least 230 GB free.

I'm surprised that the numpy call to BLAS dgemm, even being careful with C-ordered arrays, didn't do squat.

asked Mar 04 '14 by wbg

2 Answers

Python compiled with Intel's MKL will run this with 12 GB of memory in about 30 seconds:

>>> A = np.random.rand(50000,265).astype(np.float32)
>>> A.dot(A.T)
array([[ 86.54410553,  64.25226593,  67.24698639, ...,  68.5118103 ,
         64.57299805,  66.69223785],
         ...,
       [ 66.69223785,  62.01016235,  67.35866547, ...,  66.66306305,
         65.75863647,  86.3017807 ]], dtype=float32)

If you do not have access to Intel's MKL, download Anaconda Python and install the accelerate package, which contains an MKL build (30-day trial version, or free for academics). Various other BLAS libraries should also work; even if one copies the array from C to Fortran order, it should not take more than ~30 GB of memory.
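One quick diagnostic (my own suggestion, not from the answer) is to check which BLAS your NumPy is actually linked against, since the reference BLAS is far slower than MKL or OpenBLAS:

```python
import numpy as np

# Prints the BLAS/LAPACK libraries this NumPy build uses;
# look for mkl, openblas, or accelerate rather than the reference blas.
np.show_config()
```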

The only thing I can think of is that your installation is trying to hold the entire 50,000 x 50,000 x 265 array in memory, which is quite frankly terrible. For reference, a float32 50,000 x 50,000 array is only 10 GB, while the aforementioned array is 2.6 TB...
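The arithmetic behind those figures, with float32 at 4 bytes per element:

```python
# float32 = 4 bytes per element
out_gb = 50_000 * 50_000 * 4 / 1e9           # the 50k x 50k result matrix
bad_tb = 50_000 * 50_000 * 265 * 4 / 1e12    # the hypothetical 3-D intermediate
print(out_gb, bad_tb)  # 10.0 GB and 2.65 TB
```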

If it's a gemm issue, you can try a chunked gemm formula:

def chunk_gemm(A, B, csize):
    out = np.empty((A.shape[0],B.shape[1]), dtype=A.dtype)
    for i in range(0, A.shape[0], csize):
        iend = i+csize
        for j in range(0, B.shape[1], csize):
            jend = j+csize
            out[i:iend, j:jend] = np.dot(A[i:iend], B[:,j:jend])
    return out

This will be slower, but will hopefully get over your memory issues.
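A quick sanity check of the chunked idea on a small array (chunked_dot below is my own restatement, using Python 3's range), verifying it matches a direct dot:

```python
import numpy as np

def chunked_dot(A, B, csize):
    # compute the product one csize x csize tile at a time;
    # slicing clamps automatically at the edges
    out = np.empty((A.shape[0], B.shape[1]), dtype=A.dtype)
    for i in range(0, A.shape[0], csize):
        for j in range(0, B.shape[1], csize):
            out[i:i+csize, j:j+csize] = np.dot(A[i:i+csize], B[:, j:j+csize])
    return out

A = np.random.rand(500, 265).astype(np.float32)
res = chunked_dot(A, A.T, csize=128)
print(np.allclose(res, A.dot(A.T), rtol=1e-4))
```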

answered Oct 07 '22 by Daniel


You can try and see if np.einsum works better than dot for your case:

cov = np.einsum('ij,kj->ik', A, A) / n

The internal workings of dot are a little obscure: it tries to use BLAS-optimized routines, which sometimes require copies of arrays to be in Fortran order, though I'm not sure if that's the case here. einsum will buffer its inputs and use vectorized SIMD operations where possible, but outside of that it is basically going to run the naive three nested loops to compute the matrix product.
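A small check (my own, at tiny sizes) that the einsum form matches the dot-based covariance:

```python
import numpy as np

A = np.random.rand(200, 50).astype(np.float32)
n = A.shape[1]

cov_einsum = np.einsum('ij,kj->ik', A, A) / n
cov_dot = A.dot(A.T) / n
print(np.allclose(cov_einsum, cov_dot, rtol=1e-4))
```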

answered Oct 07 '22 by Jaime