In Python, how can I calculate correlation and statistical significance between two arrays of data?

I have sets of data with two equally long arrays of data, or I can make an array of two-item entries, and I would like to calculate the correlation and statistical significance represented by the data (which may be tightly correlated, or may have no statistically significant correlation).

I am programming in Python and have scipy and numpy installed. I found the question "Calculating Pearson correlation and significance in Python", but that approach seems to require the data to be manipulated so that it falls into a specified range.

What is the proper way to, I assume, ask scipy or numpy to give me the correlation and statistical significance of two arrays?

473

asked Jun 20 '12 14:06

Christos Hayward

2 Answers

If you want the Pearson correlation coefficient, then scipy.stats.pearsonr is the way to go. It returns both the coefficient and a two-sided p-value, although the p-value is only meaningful for larger data sets. This function does not require the data to be manipulated to fall into a specified range. It is the correlation value itself that falls in the interval [-1, 1]; perhaps that was the source of the confusion?
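As a minimal sketch of the pearsonr call described above: the sample data here is made up for illustration (a noisy linear relationship drawn from a seeded random generator), and the variable names are my own.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)

# pearsonr takes two equal-length 1-D arrays and returns the
# correlation coefficient together with a two-sided p-value.
r, p_value = stats.pearsonr(x, y)

print(f"r = {r:.3f}, p = {p_value:.3g}")
```

Note that the inputs are used as they are; no rescaling is needed. A small p-value suggests the observed correlation is unlikely under the hypothesis that the two arrays are uncorrelated.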

If the significance is not terribly important, you can use numpy.corrcoef(), which returns the matrix of correlation coefficients but no p-value.
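A short sketch of the numpy.corrcoef() route, again with made-up numbers. For two 1-D inputs the result is a 2x2 matrix, so the coefficient between the two arrays is the off-diagonal entry.

```python
import numpy as np

# Hypothetical data: y is roughly 2 * x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# corrcoef returns the full correlation matrix:
# [[corr(x, x), corr(x, y)],
#  [corr(y, x), corr(y, y)]]
corr_matrix = np.corrcoef(x, y)

# The x-y correlation is the off-diagonal element.
r = corr_matrix[0, 1]

print(corr_matrix.shape)
print(f"r = {r:.4f}")
```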

The Mahalanobis distance does take into account the correlation between two arrays, but it provides a distance measure, not a correlation. (It tells you how far apart two points are once the covariance of the data is accounted for; it does not tell you how strongly two variables are related, and it gives no significance value.)

196

answered Oct 13 '22 00:10

cjohnson318

You can use the Mahalanobis distance between these two arrays, which takes into account the correlation between them.

The function is in the scipy package: scipy.spatial.distance.mahalanobis

There's a nice example here
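For reference, a minimal sketch of how scipy.spatial.distance.mahalanobis is called. The data is made up, and estimating the inverse covariance from that same data is an assumption of this example. The function takes two points (1-D arrays) plus the inverse covariance matrix, and the result is a distance, not a correlation coefficient or a p-value.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical data: 500 observations of two correlated variables.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

# mahalanobis() expects the INVERSE of the covariance matrix.
# Here it is estimated from the sample (columns are variables).
inv_cov = np.linalg.inv(np.cov(data, rowvar=False))

# Distance from the first observation to the sample mean.
point = data[0]
center = data.mean(axis=0)
d = distance.mahalanobis(point, center, inv_cov)

print(f"Mahalanobis distance: {d:.3f}")
```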

answered Oct 12 '22 23:10

Oriol Nieto


Tags: python, numpy, statistics, scipy, correlation
