Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy cross-correlation - vectorizing

I have a large number of cross-correlations to calculate and I'm looking for the fastest way to do it. I'm assuming vectorizing the problem would help rather than doing it with loops

I have a 3D array labelled as electrode x timepoint x trial (shape: 64x256x913). I want to calculate the max cross-correlation of the timepoints for every pair of electrodes, for every trial.

Specifically: for every trial, I want to take each of the pair combination of electrodes and calculate the max cross-correlation value for every pair. That will result in 4096 (64*64) max cross-correlation values in a single row/vector. That will be done for every trial, stacking each of the rows/vectors on top of each other resulting in a final 2D array of shape 913*4096 containing max cross-correlation values

Thats a lot of computation but I want to try to find the fastest method to do it. I mocked up some proto-code using lists as containers that may help explain the problem a bit better. There may be some logic errors in there, but either way the code doesn't run on my computer because theres so much to calculate that python just freezes. Here is it:

#allData is a 64x256x913 array

all_experiment_trials = []
for trial in range(allData.shape[2]):
    all_trial_electrodes = []
    for electrode in range(allData.shape[0]):
        for other_electrode in range(allData.shape[0]):
            if electrode == other_electrode:
                pass
            else:
                single_xcorr = max(np.correlate(allData[electrode,:,trial], allData[other_electrode,:,trial], "full"))
                all_trial_electrodes.append(single_xcorr)
    all_experiment_trials.append(all_trial_electrodes)

Obviously loops are really slow for this type of thing. Is there a vectorized solution to this using numpy arrays?

I've checked out things like correlate2d() and the like, but I don't think they really work in my case as I'm not multiplying 2 matrices together

like image 982
Simon Avatar asked Jul 31 '15 07:07

Simon


People also ask

What does vectorize do in NumPy?

The vectorized function evaluates pyfunc over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy. The data type of the output of vectorized is determined by calling the function with the first element of the input.

How do you find the NumPy correlation coefficient?

The Pearson Correlation coefficient can be computed in Python using corrcoef() method from Numpy. The input for this function is typically a matrix, say of size mxn , where: Each column represents the values of a random variable. Each row represents a single sample of n random variables.

What does NumPy correlate return?

numpy. correlate simply returns the cross-correlation of two vectors.

How do you interpret cross-correlation?

Understanding Cross-Correlation Cross-correlation is generally used when measuring information between two different time series. The possible range for the correlation coefficient of the time series data is from -1.0 to +1.0. The closer the cross-correlation value is to 1, the more closely the sets are identical.


1 Answers

Here's one vectorized approach based on np.einsum -

def vectorized_approach(allData):
    # Get shape
    M,N,R = allData.shape

    # Valid mask based on condition: "if electrode == other_electrode"
    valid_mask = np.mod(np.arange(M*M),M+1)!=0

    # Elementwise multiplications across all elements in axis=0, 
    # and then summation along axis=1
    out = np.einsum('ijkl,ijkl->lij',allData[:,None,:,:],allData[None,:,:,:])

    # Use valid mask to skip columns and have the final output
    return out.reshape(R,-1)[:,valid_mask]

Runtime test and verify results -

In [10]: allData = np.random.rand(20,80,200)

In [11]: def org_approach(allData):
    ...:     all_experiment_trials = []
    ...:     for trial in range(allData.shape[2]):
    ...:         all_trial_electrodes = []
    ...:         for electrode in range(allData.shape[0]):
    ...:             for other_electrode in range(allData.shape[0]):
    ...:                 if electrode == other_electrode:
    ...:                     pass
    ...:                 else:
    ...:                     single_xcorr = max(np.correlate(allData[electrode,:,trial], allData[other_electrode,:,trial]))
    ...:                     all_trial_electrodes.append(single_xcorr)
    ...:         all_experiment_trials.append(all_trial_electrodes)
    ...:     return all_experiment_trials
    ...: 

In [12]: %timeit org_approach(allData)
1 loops, best of 3: 1.04 s per loop

In [13]: %timeit vectorized_approach(allData)
100 loops, best of 3: 15.1 ms per loop

In [14]: np.allclose(vectorized_approach(allData),np.asarray(org_approach(allData)))
Out[14]: True
like image 50
Divakar Avatar answered Oct 03 '22 03:10

Divakar