How to interpret the values returned by numpy.correlate and numpy.corrcoef?

I have two 1D arrays and I want to see how they relate to each other. What procedure should I use in numpy? I am using numpy.corrcoef(arrayA, arrayB) and numpy.correlate(arrayA, arrayB), and both give results that I am not able to interpret.

Can somebody please shed light on how to understand and interpret those numerical results (preferably, using an example)?

asked Nov 18 '12 11:11

khan

2 Answers

numpy.correlate simply returns the cross-correlation of two vectors.

If you need to understand cross-correlation, start with http://en.wikipedia.org/wiki/Cross-correlation.

A good example might be seen by looking at the autocorrelation function (a vector cross-correlated with itself):

    import numpy as np

    # create a vector
    vector = np.random.normal(0, 1, size=1000)

    # insert a signal into vector
    vector[::50] += 10

    # perform cross-correlation for all data points
    output = np.correlate(vector, vector, mode='full')

[Figure: plot of the autocorrelation output]

This will return a comb/shah function with a maximum where the two copies of the data fully overlap. As this is an autocorrelation there will be no "lag" between the two input signals, so the maximum of the correlation is at zero lag, which is index vector.size-1 of the output.
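To make the index bookkeeping concrete, here is a small sketch of the same autocorrelation with an explicit lag axis. The `lags` array and the fixed seed are additions for illustration only and are not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the run is repeatable
vector = rng.normal(0, 1, size=1000)
vector[::50] += 10

output = np.correlate(vector, vector, mode='full')

# mode='full' gives 2*N - 1 values, one per lag from -(N-1) to +(N-1)
lags = np.arange(-(vector.size - 1), vector.size)

peak_index = int(np.argmax(output))
peak_lag = int(lags[peak_index])

print(output.size)  # 2 * vector.size - 1
print(peak_index)   # vector.size - 1
print(peak_lag)     # 0
```

Plotting `output` against `lags` instead of the raw index puts the main peak at 0 and the smaller comb peaks at multiples of 50, which is usually easier to read.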

If you only want the value of the correlation for fully overlapping data, you can use mode='valid'. For two inputs of the same length this returns a single value, the zero-lag correlation.
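Since the question also asks about numpy.corrcoef, the sketch below shows how the two functions relate for equal-length inputs. The arrays `a` and `b` are made-up illustration data. The single value from mode='valid' is just the sum of element-wise products; subtracting the means and dividing by the standard deviations turns it into the Pearson coefficient that numpy.corrcoef reports.

```python
import numpy as np

# made-up example data, not from the question
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# output lengths of the three modes for equal-length inputs
n_full = np.correlate(a, b, mode='full').size   # 2*N - 1
n_same = np.correlate(a, b, mode='same').size   # N
zero_lag = np.correlate(a, b, mode='valid')     # one value: sum(a * b)

# Pearson correlation built by hand from the zero-lag value
a_centred = a - a.mean()
b_centred = b - b.mean()
r_manual = (np.correlate(a_centred, b_centred, mode='valid')[0]
            / (a.size * a.std() * b.std()))

# corrcoef returns a 2x2 matrix; the off-diagonal entry is r between a and b
r_numpy = np.corrcoef(a, b)[0, 1]

print(zero_lag, r_manual, r_numpy)
```

So numpy.correlate gives raw, un-normalised values whose size depends on the scale of the data, while numpy.corrcoef gives a number between -1 and 1 that can be read directly as the strength of the linear relationship.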

answered Sep 30 '22 05:09

ebarr

I can only comment on numpy.correlate at the moment. It's a powerful tool. I have used it for two purposes. The first is to find a pattern inside another pattern:

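    import numpy as np
    import matplotlib.pyplot as plt

    some_data = np.random.uniform(0, 1, size=100)
    subset = some_data[42:50]

    # subtract the mean of the full data from both arrays
    mean = np.mean(some_data)
    some_data_normalised = some_data - mean
    subset_normalised = subset - mean

    # default mode is 'valid': one value per position where subset fits entirely
    correlated = np.correlate(some_data_normalised, subset_normalised)
    # 42, the start of the subset (very likely, though not guaranteed for every random draw)
    max_index = np.argmax(correlated)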
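The same idea on a tiny hand-checkable array, as a sketch. The helper name `find_offset` and the sample data are hypothetical and only for illustration. Subtracting the mean matters because without it a window containing large values can score highly even when its shape does not match the pattern.

```python
import numpy as np

def find_offset(data, pattern):
    """Return the index in data where pattern matches best (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    pattern = np.asarray(pattern, dtype=float)
    mean = data.mean()
    # mode='valid' slides the pattern only over positions where it fits entirely,
    # so index k of the result corresponds to data[k:k + len(pattern)]
    scores = np.correlate(data - mean, pattern - mean, mode='valid')
    return int(np.argmax(scores))

# made-up data: the pattern [1, 3, 1] starts at index 2
data = [0, 0, 1, 3, 1, 0, 0, 2, 0, 0]
pattern = [1, 3, 1]

offset = find_offset(data, pattern)
print(offset)  # 2
```

This score is still not normalised by the local energy of each window, so on noisy data with large amplitude swings the highest score is not guaranteed to be the true match.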

The second use (and how to interpret the result) is frequency detection:

    import numpy as np
    import matplotlib.pyplot as plt

    # 3 and 2 full cycles over 100 samples
    hz_a = np.cos(np.linspace(0, np.pi*6, 100))
    hz_b = np.cos(np.linspace(0, np.pi*4, 100))

    f, axarr = plt.subplots(2, sharex=True)

    axarr[0].plot(hz_a)
    axarr[0].plot(hz_b)
    axarr[0].grid(True)

    # keep only the non-negative lags (second half of the 'same' output)
    hz_a_autocorrelation = np.correlate(hz_a, hz_a, 'same')[round(len(hz_a)/2):]
    hz_b_autocorrelation = np.correlate(hz_b, hz_b, 'same')[round(len(hz_b)/2):]

    axarr[1].plot(hz_a_autocorrelation)
    axarr[1].plot(hz_b_autocorrelation)
    axarr[1].grid(True)

    plt.show()

[Figure: the 3 Hz and 2 Hz signals (top) with their autocorrelations shown beneath]

Find the index of the second peak. From this you can work back to find the frequency.

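    first_min_index = np.argmin(hz_a_autocorrelation)
    # argmax over the slice is relative to first_min_index, so add the offset back
    second_max_index = first_min_index + np.argmax(hz_a_autocorrelation[first_min_index:])
    frequency = 1/second_max_index  # in cycles per sample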
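A self-contained sketch of this peak-picking approach, wrapped in a function. The name `estimate_period`, the test signal and the 100 Hz sample rate are assumptions made for illustration. Note that np.argmax on a slice returns an index relative to the start of the slice, so the position of the first minimum has to be added back to get the actual lag.

```python
import numpy as np

def estimate_period(signal):
    """Estimate the period, in samples, from the autocorrelation (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    # keep lags 0 .. N-1 of the full autocorrelation
    autocorr = np.correlate(signal, signal, mode='full')[signal.size - 1:]
    # skip the zero-lag peak by starting the search at the deepest minimum
    first_min = int(np.argmin(autocorr))
    # argmax is relative to the slice, so add first_min back
    return first_min + int(np.argmax(autocorr[first_min:]))

# assumed test signal: a cosine with a period of 25 samples
n = np.arange(200)
signal = np.cos(2 * np.pi * n / 25)

sample_rate = 100.0  # samples per second, assumed for the example
period = estimate_period(signal)
frequency_hz = sample_rate / period

print(period, frequency_hz)  # should be close to 25 samples and 4 Hz
```

The estimate is only as fine as one sample of lag, so for short periods the frequency resolution is coarse; this is a rough method rather than a precise one.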

answered Sep 30 '22 07:09

AJP

Tags: python, numpy, scipy, correlation
