I'm trying to calculate the Pearson correlation coefficient of two variables. These variables are to determine if there is a relationship between number of postal codes to a range of distances. So I want to see if the number of postal codes increases/decreases as the distance ranges changes.
I'll have one list which will count the number of postal codes within a distance range and the other list will have the actual ranges.
Is it ok to have a list that contain a range of distances? Or would it be better to have a list like this [50, 100, 500, 1000] where each element would then contain ranges up that amount. So for example the list represents up to 50km, then from 50km to 100km and so on.
Use the formula (zy)i = (yi – ȳ) / s y and calculate a standardized value for each yi. Add the products from the last step together. Divide the sum from the previous step by n – 1, where n is the total number of points in our set of paired data. The result of all of this is the correlation coefficient r.
It is known as the best method of measuring the association between variables of interest because it is based on the method of covariance. It gives information about the magnitude of the association, or correlation, as well as the direction of the relationship.
You can also use numpy
:
numpy.corrcoef(x, y)
which would give you a correlation matrix that looks like:
[[1 correlation(x, y)]
[correlation(y, x) 1]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With