
Calculating Pearson correlation

I'm trying to calculate the Pearson correlation coefficient of two variables, to determine whether there is a relationship between the number of postal codes and a range of distances. In other words, I want to see whether the number of postal codes increases or decreases as the distance range changes.

I'll have one list that counts the number of postal codes within each distance range, and another list that holds the ranges themselves.

Is it OK to have a list that contains ranges of distances? Or would it be better to have a list like [50, 100, 500, 1000], where each element is the upper bound of a range? For example, that list would represent up to 50km, then 50km to 100km, and so on.

user94628 asked Nov 30 '12 15:11


People also ask

What is the formula for calculating correlation?

Use the formula (zx)i = (xi – x̄) / sx to calculate a standardized value for each xi, and the formula (zy)i = (yi – ȳ) / sy to calculate a standardized value for each yi. Multiply each pair of standardized values, then add the products together. Divide that sum by n – 1, where n is the total number of points in the set of paired data. The result is the correlation coefficient r.
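The standardized-value recipe can be sketched in plain Python as follows. This is only a minimal illustration, assuming sample standard deviations (the n – 1 form); the function name pearson_r is made up for this example and is not from any library.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # stdev() is the sample standard deviation (divides by n - 1)
    s_x, s_y = stdev(xs), stdev(ys)
    # Product of the standardized x and y value for each paired point
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (len(xs) - 1)
```

Note that this sketch fails with a division by zero if either list is constant, since its standard deviation is then 0 and the correlation is undefined.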

Why do we calculate Pearson correlation coefficient?

It is a widely used measure of the linear association between two variables of interest, and it is based on their covariance. It gives information about the magnitude of the association, or correlation, as well as the direction of the relationship.


1 Answer

You can also use numpy:

numpy.corrcoef(x, y)

which would give you a correlation matrix that looks like:

[[1          correlation(x, y)]
[correlation(y, x)          1]]
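
As a rough sketch of how this might look for the data in the question, the example below uses the upper bound of each distance range as x and the number of postal codes in that range as y. The counts are made up purely for illustration, and the variable names are assumptions, not anything from the question.

```python
import numpy as np

# Upper bound of each distance range in km, as in the question's example list
range_upper_km = np.array([50, 100, 500, 1000])
# Hypothetical counts of postal codes per range (invented for this sketch)
postal_code_counts = np.array([12, 30, 85, 140])

# corrcoef returns a 2x2 matrix; the off-diagonal entries are r
corr_matrix = np.corrcoef(range_upper_km, postal_code_counts)
r = corr_matrix[0, 1]

print(corr_matrix)
print(r)
```

Both arrays must have the same length, one count per range. Representing each range by a single number (here its upper bound) is one possible choice; the midpoint of the range would work the same way.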
Antimony answered Dec 10 '22 11:12