
How to use Julia to compute the Pearson correlation coefficient with p-value?

Tags:

julia

I am looking for help calculating the Pearson correlation coefficient together with its p-value in Julia. The analogous function in Python is scipy.stats.pearsonr.

The Julia function below only gives me the correlation. I would appreciate any help or hints on the p-value part.

using RDatasets, Statistics
iris = dataset("datasets", "iris");
Statistics.cor(iris.SepalLength, iris.SepalWidth)
Puriney asked Nov 16 '18 21:11


People also ask

How is p-value calculated in Pearson correlation?

The test statistics for Pearson's correlation coefficient and Spearman's correlation coefficient have the same formula: t = r × sqrt(n - 2) / sqrt(1 - r²). The two-sided p-value is 2 × P(T > |t|), where T follows a t distribution with n - 2 degrees of freedom.

Is Pearson correlation same as p-value?

No. The correlation coefficient measures the strength and direction of the linear relationship between two variables, whereas the p-value tells us whether the observed correlation is statistically significant.

How does p-value show correlation?

A p-value is not the probability that the null hypothesis is true. It is the probability, assuming the null hypothesis (no correlation between x and y) holds, of observing a sample correlation at least as extreme as the one in the data. A p-value of 0.05 means that a correlation this strong would appear in about 5% of samples if x and y were in fact uncorrelated.

Is Pearson correlation R or P?

The Pearson correlation of the sample is denoted r, while p denotes the p-value of the associated significance test, which checks whether the relationship between the two variables is statistically significant.


2 Answers

I do not know of an existing implementation, but here is a two-sided test of H0: ρ = 0 using the Fisher transformation:

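using Distributions, Statistics  # Statistics provides cor

# Two-sided p-value via the Fisher z-transformation
# (normal approximation with standard error 1/sqrt(n - 3))
cortest(x,y) =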
    if length(x) == length(y)
        2 * ccdf(Normal(), atanh(abs(cor(x, y))) * sqrt(length(x) - 3))
    else
        error("x and y have different lengths")
    end

or use the HypothesisTests.jl package, e.g.:

using HypothesisTests, Statistics  # iris is the DataFrame loaded in the question

OneSampleZTest(atanh(cor(iris.SepalLength, iris.SepalWidth)),
               1, nrow(iris)-3)
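
The Fisher transformation above is a large-sample normal approximation. The t-based test uses the statistic t = r × sqrt((n - 2) / (1 - r²)), which follows a t distribution with n - 2 degrees of freedom under H0. Below is a minimal sketch of that variant, assuming Distributions.jl is installed; the name `cor_pvalue_t` is hypothetical and not part of any package:

```julia
using Distributions, Statistics

# Hypothetical helper (not from any package): two-sided p-value for
# H0: ρ = 0 based on the t distribution with n - 2 degrees of freedom.
function cor_pvalue_t(x, y)
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y have different lengths"))
    n = length(x)
    n > 2 || throw(ArgumentError("at least 3 observations are required"))
    r = cor(x, y)
    # t statistic; it becomes ±Inf when |r| == 1, which yields a p-value of 0
    t = r * sqrt((n - 2) / (1 - r^2))
    return 2 * ccdf(TDist(n - 2), abs(t))
end

cor_pvalue_t([1, 2, 3], [2, 3, 5])
```

This should agree with scipy.stats.pearsonr, which is based on the same distribution of r under the null hypothesis.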
Bogumił Kamiński answered Jan 02 '23 20:01


Now you can also use the function pvalue from HypothesisTests, for example:

using HypothesisTests
x = [1,2,3]; y = [2,3,5];
pvalue(CorrelationTest(x,y))

This example returns 0.1210377, which is the same as the result from Python's scipy.stats.pearsonr and R's cor.test.
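
For very small samples, the Fisher approximation (as in `cortest` from the other answer) and the t-based `CorrelationTest` can disagree sharply. Here is a rough sketch comparing the two on the same three-point example; `fisher_pvalue` is a hypothetical name introduced only for this illustration:

```julia
using Distributions, HypothesisTests, Statistics

x = [1, 2, 3]; y = [2, 3, 5]

# Hypothetical helper: Fisher z-transformation p-value (normal approximation)
fisher_pvalue(x, y) =
    2 * ccdf(Normal(), atanh(abs(cor(x, y))) * sqrt(length(x) - 3))

p_fisher = fisher_pvalue(x, y)            # sqrt(n - 3) is 0 when n = 3
p_exact  = pvalue(CorrelationTest(x, y))  # t-based test

println("Fisher approximation: ", p_fisher)
println("CorrelationTest:      ", p_exact)
```

With n = 3 the factor sqrt(n - 3) is zero, so the approximation returns 1.0 regardless of r, while `CorrelationTest` gives the 0.1210377 quoted above. The approximation is only reasonable for larger samples.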

Alex338207 answered Jan 02 '23 22:01