I have a large number of cross-correlations to calculate and I'm looking for the fastest way to do it. I'm assuming vectorizing the problem would help rather than doing it with loops.
I have a 3D array laid out as electrode x timepoint x trial (shape: 64x256x913). I want to calculate the max cross-correlation of the timepoint series for every pair of electrodes, for every trial.
Specifically: for every trial, I want to take each pair combination of electrodes and calculate the max cross-correlation value for every pair. That gives 4096 (64*64) values per trial, or 4032 (64*63) once each electrode's pairing with itself is excluded as in my code below, all in a single row/vector. Doing that for every trial and stacking the rows/vectors on top of each other results in a final 2D array of shape 913x4032 containing max cross-correlation values.
That's a lot of computation, but I want to find the fastest method to do it. I mocked up some proto-code using lists as containers that may explain the problem a bit better. There may be some logic errors in there, but either way the code doesn't run on my machine because there's so much to calculate that Python just freezes. Here it is:
import numpy as np

# allData is a 64x256x913 array: electrode x timepoint x trial
all_experiment_trials = []
for trial in range(allData.shape[2]):
    all_trial_electrodes = []
    for electrode in range(allData.shape[0]):
        for other_electrode in range(allData.shape[0]):
            # skip each electrode paired with itself
            if electrode == other_electrode:
                continue
            # max of the full (all-lags) cross-correlation for this pair
            single_xcorr = max(np.correlate(allData[electrode, :, trial],
                                            allData[other_electrode, :, trial],
                                            "full"))
            all_trial_electrodes.append(single_xcorr)
    all_experiment_trials.append(all_trial_electrodes)
Obviously loops are really slow for this kind of thing. Is there a vectorized solution to this using numpy arrays?
I've checked out things like correlate2d() and the like, but I don't think they really apply in my case, since I'm not correlating two 2D matrices against each other.
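For reference, here's what the single-pair computation above produces on toy data (illustrative made-up vectors only; "full" mode returns 2N-1 lag values for length-N inputs):

    import numpy as np

    a = np.array([0.0, 1.0, 2.0, 1.0])
    b = np.array([1.0, 2.0, 1.0, 0.0])   # the same signal, shifted by one sample
    cc = np.correlate(a, b, "full")      # 2*4 - 1 = 7 lag values
    print(cc)                            # [0. 0. 1. 4. 6. 4. 1.] - peak at the aligning lag
    print(cc.max())                      # 6.0 - the "max cross-correlation" for this pair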
Here's one vectorized approach based on np.einsum -
def vectorized_approach(allData):
    # Get shape
    M, N, R = allData.shape
    # Valid mask based on the condition "if electrode == other_electrode":
    # False at the M diagonal positions of the flattened MxM pair grid
    valid_mask = np.mod(np.arange(M * M), M + 1) != 0
    # Elementwise multiplication across all electrode pairs via broadcasting,
    # then summation along the timepoint axis k, giving shape (R, M, M)
    out = np.einsum('ijkl,ijkl->lij', allData[:, None, :, :], allData[None, :, :, :])
    # Use valid mask to skip the self-pair columns and get the final output
    return out.reshape(R, -1)[:, valid_mask]
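For the shapes in the question, a quick sanity check might look like this (random data just to exercise the function; the output has 64*63 = 4032 columns since the self-pairs are masked out):

    import numpy as np

    allData = np.random.rand(64, 256, 913)   # electrode x timepoint x trial
    result = vectorized_approach(allData)
    print(result.shape)                      # (913, 4032): trial x electrode-pair

Equivalently, the broadcasting can be folded into the subscripts: np.einsum('ikl,jkl->lij', allData, allData) computes the same (R, M, M) array without the explicit None-axes.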
Runtime test and verification -
In [10]: allData = np.random.rand(20,80,200)

In [11]: def org_approach(allData):
    ...:     all_experiment_trials = []
    ...:     for trial in range(allData.shape[2]):
    ...:         all_trial_electrodes = []
    ...:         for electrode in range(allData.shape[0]):
    ...:             for other_electrode in range(allData.shape[0]):
    ...:                 if electrode == other_electrode:
    ...:                     pass
    ...:                 else:
    ...:                     single_xcorr = max(np.correlate(allData[electrode,:,trial], allData[other_electrode,:,trial]))
    ...:                     all_trial_electrodes.append(single_xcorr)
    ...:         all_experiment_trials.append(all_trial_electrodes)
    ...:     return all_experiment_trials
    ...:

In [12]: %timeit org_approach(allData)
1 loops, best of 3: 1.04 s per loop

In [13]: %timeit vectorized_approach(allData)
100 loops, best of 3: 15.1 ms per loop

In [14]: np.allclose(vectorized_approach(allData),np.asarray(org_approach(allData)))
Out[14]: True
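One caveat: both functions in the benchmark call np.correlate without the "full" argument, so they compare only the zero-lag product (numpy's default 'valid' mode for equal-length inputs), which is exactly what the einsum reduction computes. If you need the max over all lags, as in the question's "full" mode, one option is an FFT-based cross-correlation that stays vectorized over electrode pairs. A minimal sketch, assuming the electrode x timepoint x trial layout and looping over trials to keep the (M, M, nfft) intermediate in memory (max_xcorr_full is a hypothetical helper, not part of numpy):

    import numpy as np

    def max_xcorr_full(allData):
        # Hypothetical helper: allData is (M electrodes, N timepoints, R trials)
        M, N, R = allData.shape
        L = 2 * N - 1                        # number of lags in a full cross-correlation
        nfft = 1 << (L - 1).bit_length()     # next power of two >= L, for FFT speed
        mask = ~np.eye(M, dtype=bool)        # drops self-pairs, row-major order
        out = np.empty((R, M * M - M))
        for t in range(R):                   # one trial at a time to bound memory
            F = np.fft.rfft(allData[:, :, t], nfft, axis=1)   # (M, nfft//2 + 1)
            S = F[:, None, :] * np.conj(F[None, :, :])        # cross-spectra for all pairs
            cc = np.fft.irfft(S, nfft, axis=2)                # circular xcorr, (M, M, nfft)
            # keep only the 2N-1 linear lags (negative lags are wrapped to the end)
            cc = np.concatenate([cc[:, :, -(N - 1):], cc[:, :, :N]], axis=2)
            out[t] = cc.max(axis=2)[mask]
        return out                                            # (R, M*(M-1))

Zero-padding the FFT to at least 2N-1 samples makes the circular correlation equal the linear one at every lag, so each pair's max matches max(np.correlate(a, b, "full")) from the question's proto-code.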