
How to interpret the values returned by numpy.correlate and numpy.corrcoef?

I have two 1D arrays and I want to examine the relationship between them. What procedure should I use in numpy? I am using numpy.corrcoef(arrayA, arrayB) and numpy.correlate(arrayA, arrayB), and both give results that I am not able to interpret.

Can somebody please shed light on how to understand and interpret those numerical results (preferably, using an example)?

asked Nov 18 '12 11:11 by khan



2 Answers

numpy.correlate simply returns the cross-correlation of two vectors.

If you need to understand cross-correlation, start with http://en.wikipedia.org/wiki/Cross-correlation.
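As a small illustration of what that computes, here is a minimal sketch on two three-element arrays. The arrays are arbitrary, and `sliding_dot` is a hypothetical helper written only for this example (it is not part of numpy); it spells the cross-correlation out as a dot product at every shift so it can be compared with np.correlate.

```python
import numpy as np

a = np.array([1, 2, 3])
v = np.array([0, 1, 0.5])

# Default mode is 'valid': only positions where the arrays overlap completely.
# For equal-length inputs that is a single dot product.
valid = np.correlate(a, v)

# mode='full': every shift at which the arrays overlap by at least one element.
full = np.correlate(a, v, mode='full')


def sliding_dot(a, v):
    """Hypothetical helper: cross-correlation written as explicit dot products."""
    n = len(v)
    # Zero-pad `a` so that `v` can slide past both ends.
    padded = np.concatenate([np.zeros(n - 1), a, np.zeros(n - 1)])
    return np.array([np.dot(padded[i:i + n], v)
                     for i in range(len(a) + n - 1)])


print(valid)              # [3.5]
print(full)               # [0.5 2.  3.5 3.  0. ]
print(sliding_dot(a, v))  # same values as `full`
```

Each output element is just "shift, multiply element-wise, sum", so a large value means the two arrays line up well at that shift.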

A good example might be seen by looking at the autocorrelation function (a vector cross-correlated with itself):

    import numpy as np

    # create a vector
    vector = np.random.normal(0, 1, size=1000)

    # insert a signal into vector
    vector[::50] += 10

    # perform cross-correlation for all data points
    output = np.correlate(vector, vector, mode='full')

This returns a comb (Shah) function with its maximum where the two copies of the data overlap completely. Because this is an autocorrelation, there is no "lag" between the two input signals, so the maximum sits at index vector.size-1 of the output, which corresponds to zero lag.
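A quick way to check where the peak falls is sketched below. It repeats the same construction, but with a seeded generator (the seed is an arbitrary choice so the run is repeatable) and a `lags` array added here purely for illustration, mapping each output index to its lag.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
vector = rng.normal(0, 1, size=1000)
vector[::50] += 10  # same periodic spikes as in the example above

output = np.correlate(vector, vector, mode='full')

# In 'full' mode the output has 2*N - 1 elements; index N - 1 is zero lag.
lags = np.arange(-(vector.size - 1), vector.size)

peak_index = int(np.argmax(output))
peak_lag = int(lags[peak_index])

print(output.size)  # 1999
print(peak_index)   # 999, i.e. vector.size - 1
print(peak_lag)     # 0
# The zero-lag value is the sum of squares of the signal.
print(np.isclose(output[peak_index], np.dot(vector, vector)))  # True
```

The smaller side peaks sit at lags that are multiples of 50, the spacing of the inserted spikes.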

If you only want the value of the correlation where the two inputs overlap completely, you can use mode='valid'. For two equal-length inputs this is a single number.
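To make the difference between the modes concrete, here is a minimal sketch comparing output lengths. The two arrays are arbitrary and chosen only so the lengths differ.

```python
import numpy as np

a = np.arange(1.0, 11.0)         # length M = 10
v = np.array([1.0, 0.0, -1.0])   # length N = 3

full = np.correlate(a, v, mode='full')    # every partial overlap
same = np.correlate(a, v, mode='same')    # centred, same length as the longer input
valid = np.correlate(a, v, mode='valid')  # complete overlap only

lengths = {name: arr.size
           for name, arr in [('full', full), ('same', same), ('valid', valid)]}
print(lengths)  # {'full': 12, 'same': 10, 'valid': 8}

# 'valid' is the middle part of 'full' where no zero-padding is involved.
print(np.allclose(full[len(v) - 1:len(a)], valid))  # True
```

In general the lengths are M + N - 1 for 'full', max(M, N) for 'same', and max(M, N) - min(M, N) + 1 for 'valid'.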

answered Sep 30 '22 05:09 by ebarr


I can only comment on numpy.correlate at the moment. It's a powerful tool. I have used it for two purposes. The first is to find a pattern inside another pattern:

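    import numpy as np
    import matplotlib.pyplot as plt

    some_data = np.random.uniform(0, 1, size=100)
    subset = some_data[42:50]

    # subtract the mean of the full data set from both arrays
    mean = np.mean(some_data)
    some_data_normalised = some_data - mean
    subset_normalised = subset - mean

    # default mode is 'valid': one value per position where the subset fits entirely
    correlated = np.correlate(some_data_normalised, subset_normalised)
    max_index = np.argmax(correlated)  # usually 42, the offset the subset was taken from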
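The raw correlation peak is very likely, but not strictly guaranteed, to land on the true offset, because a window with unusually large values can also score highly. Since the question also asks about numpy.corrcoef, here is a hedged variant of the same search that scores each window with the Pearson coefficient instead. This is a sketch under my own assumptions (seeded generator, plain Python loop over windows), not something from the answer above. np.corrcoef returns a 2x2 matrix, and the off-diagonal entry `[0, 1]` is the coefficient between the two inputs, always between -1 and 1.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, only for repeatability
some_data = rng.uniform(0, 1, size=100)
subset = some_data[42:50]

n = len(subset)
# Pearson coefficient between the subset and every window of the same length.
scores = np.array([
    np.corrcoef(some_data[i:i + n], subset)[0, 1]
    for i in range(len(some_data) - n + 1)
])

best = int(np.argmax(scores))
print(best)          # 42
print(scores[best])  # 1.0 up to floating-point rounding: an exact match
```

A coefficient near 1 means the window and the pattern rise and fall together, near -1 means they move in opposite directions, and near 0 means no linear relationship.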

The second use (and how to interpret the result) is frequency detection:

    hz_a = np.cos(np.linspace(0, np.pi*6, 100))
    hz_b = np.cos(np.linspace(0, np.pi*4, 100))

    f, axarr = plt.subplots(2, sharex=True)

    axarr[0].plot(hz_a)
    axarr[0].plot(hz_b)
    axarr[0].grid(True)

    # keep only the second half of the 'same' output (zero lag onwards)
    hz_a_autocorrelation = np.correlate(hz_a, hz_a, 'same')[round(len(hz_a)/2):]
    hz_b_autocorrelation = np.correlate(hz_b, hz_b, 'same')[round(len(hz_b)/2):]

    axarr[1].plot(hz_a_autocorrelation)
    axarr[1].plot(hz_b_autocorrelation)
    axarr[1].grid(True)

    plt.show()

[Figure: the three hz and two hz signals, with their autocorrelations shown beneath]

Find the index of the second peak. From this you can work back to find the frequency.

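    first_min_index = np.argmin(hz_a_autocorrelation)
    # argmax on the slice is relative to first_min_index, so add the offset back
    second_max_index = first_min_index + np.argmax(hz_a_autocorrelation[first_min_index:])
    frequency = 1 / second_max_index  # cycles per sample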
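As a self-contained check of this peak-picking idea, the sketch below rebuilds the three-cycle signal and compares the estimate with the known period. The names `acf`, `period` and `true_period` are mine, and mode='full' is used here only so that zero lag is easy to locate. Note that the argmax is taken on a slice, so the slice offset has to be added back to get an index into the autocorrelation.

```python
import numpy as np

n_samples = 100
hz_a = np.cos(np.linspace(0, np.pi * 6, n_samples))  # 3 full cycles

# Autocorrelation for non-negative lags only: index 0 is zero lag.
acf = np.correlate(hz_a, hz_a, mode='full')[n_samples - 1:]

first_min = int(np.argmin(acf))
# Search for the next peak after the first trough, then undo the slice offset.
period = first_min + int(np.argmax(acf[first_min:]))
frequency = 1 / period  # cycles per sample

# linspace spreads 3 cycles over 99 steps, so one cycle is 33 samples.
true_period = (n_samples - 1) / 3

print(period, true_period)
print(frequency, 1 / true_period)
```

The estimate should land close to, but not necessarily exactly on, 33 samples: the unnormalised autocorrelation of a finite signal tapers off with lag, which pulls the peak slightly toward zero. To convert to Hz, multiply the cycles-per-sample figure by the sampling rate.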
answered Sep 30 '22 07:09 by AJP