I have two 1D arrays and I want to see their inter-relationships. What procedure should I use in numpy? I am using numpy.corrcoef(arrayA, arrayB)
and numpy.correlate(arrayA, arrayB)
and both are giving some results that I am not able to comprehend or understand.
Can somebody please shed light on how to understand and interpret those numerical results (preferably, using an example)?
numpy. correlate simply returns the cross-correlation of two vectors.
To calculate the correlation between two variables in Python, we can use the Numpy corrcoef() function. import numpy as np np. random. seed(100) #create array of 50 random integers between 0 and 10 var1 = np.
Cross-Correlation It is calculated simply by multiplying and summing two-time series together. In the following example, graphs A and B are cross-correlated but graph C is not correlated to either.
numpy.correlate
simply returns the cross-correlation of two vectors.
if you need to understand cross-correlation, then start with http://en.wikipedia.org/wiki/Cross-correlation.
A good example might be seen by looking at the autocorrelation function (a vector cross-correlated with itself):
import numpy as np # create a vector vector = np.random.normal(0,1,size=1000) # insert a signal into vector vector[::50]+=10 # perform cross-correlation for all data points output = np.correlate(vector,vector,mode='full')
This will return a comb/shah function with a maximum when both data sets are overlapping. As this is an autocorrelation there will be no "lag" between the two input signals. The maximum of the correlation is therefore vector.size-1.
if you only want the value of the correlation for overlapping data, you can use mode='valid'
.
I can only comment on numpy.correlate
at the moment. It's a powerful tool. I have used it for two purposes. The first is to find a pattern inside another pattern:
import numpy as np import matplotlib.pyplot as plt some_data = np.random.uniform(0,1,size=100) subset = some_data[42:50] mean = np.mean(some_data) some_data_normalised = some_data - mean subset_normalised = subset - mean correlated = np.correlate(some_data_normalised, subset_normalised) max_index = np.argmax(correlated) # 42 !
The second use I have used it for (and how to interpret the result) is for frequency detection:
hz_a = np.cos(np.linspace(0,np.pi*6,100)) hz_b = np.cos(np.linspace(0,np.pi*4,100)) f, axarr = plt.subplots(2, sharex=True) axarr[0].plot(hz_a) axarr[0].plot(hz_b) axarr[0].grid(True) hz_a_autocorrelation = np.correlate(hz_a,hz_a,'same')[round(len(hz_a)/2):] hz_b_autocorrelation = np.correlate(hz_b,hz_b,'same')[round(len(hz_b)/2):] axarr[1].plot(hz_a_autocorrelation) axarr[1].plot(hz_b_autocorrelation) axarr[1].grid(True) plt.show()
Find the index of the second peaks. From this you can work back to find the frequency.
first_min_index = np.argmin(hz_a_autocorrelation) second_max_index = np.argmax(hz_a_autocorrelation[first_min_index:]) frequency = 1/second_max_index
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With