
Creating correlation matrix p values [duplicate]

Tags:

r

I can get a correlation matrix using the following commands:

> df<-data.frame(x=c(5,6,5,9,4,2,1,3,5,7),y=c(3.1,2.5,3.8,5.4,6.5,2.5,1.5,8.1,7.1,6.1),z=c(5,6,4,9,2,4,1,6,2,4))
> cor(df)
           x         y        z
x  1.0000000 0.2923939 0.6566866
y  0.2923939 1.0000000 0.1167084
z  0.6566866 0.1167084 1.0000000

I can get individual p-values using the command:

> cor.test(df$x, df$y)$p.value
[1] 0.4123234

How can I get a matrix of p-values for all these correlation coefficients? Thanks for your help.

rnso asked Apr 14 '14 07:04


People also ask

What is p-value in correlation matrix?

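A p-value is the probability of seeing a result at least as extreme as the observed one if the null hypothesis were true; it is not the probability that the null hypothesis is true. In our case, it is the probability of obtaining a sample correlation between x and y at least this strong when the true correlation is zero. A p-value of 0.05 means that, if there were no real correlation, a result this extreme would occur in only 5% of samples.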
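
For a Pearson correlation this probability comes from a t distribution: under the null hypothesis, t = r * sqrt((n - 2) / (1 - r^2)) has n - 2 degrees of freedom. A minimal sketch in base R, assuming complete data (no missing values) and the two-sided Pearson test that cor.test runs by default; the data frame is the one from the question:

```r
df <- data.frame(x = c(5, 6, 5, 9, 4, 2, 1, 3, 5, 7),
                 y = c(3.1, 2.5, 3.8, 5.4, 6.5, 2.5, 1.5, 8.1, 7.1, 6.1),
                 z = c(5, 6, 4, 9, 2, 4, 1, 6, 2, 4))

r <- cor(df)                              # Pearson correlation matrix
n <- nrow(df)                             # number of observations (assumes no NAs)
t_stat <- r * sqrt((n - 2) / (1 - r^2))   # t statistic per pair (Inf on the diagonal)
p <- 2 * pt(-abs(t_stat), df = n - 2)     # two-sided p-values
diag(p) <- NA                             # a variable is not tested against itself
p
```

Because everything is vectorised over the matrix, no loop over column pairs is needed, but the shortcut only holds for the Pearson method.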

How do you find the p-value from a correlation matrix?

The correlation matrix with p-values for an R data frame can be found with the rcorr function from the Hmisc package, which takes a matrix as input. For example, for a data frame called df, use rcorr(as.matrix(df)).

Does correlation affect p-value?

It is shown that p-values are strongly related to correlation coefficients under a true null hypothesis; hence, can reveal the “importance of an association or effect.” Furthermore, it demonstrates why a cut point for statistical significance is still a viable, ancillary tool for assessing the substantive significance ...

How does p-value relate to correlation?

The correlation coefficient and the p-value are two quantities commonly reported when examining the relationship between variables. The correlation coefficient measures the strength and direction of the linear relationship between two variables, whereas the p-value tells us whether that relationship is statistically significant.


3 Answers

You can also use the package Hmisc.

An example of how it works:

library(Hmisc)  # provides rcorr()
mycor <- rcorr(as.matrix(data), type="pearson")

mycor$r shows the correlation matrix, mycor$P (capital P) the matrix with corresponding p-values.

erc answered Oct 26 '22 00:10


This example calculates the p value for each of the column combinations. It is not an optimal solution (x-y and y-x p values are both calculated for example), but should provide some inspiration for you. The main trick is to use expand.grid to generate the combinations of columns, and use mapply to call cor.test on each combination:

# stringsAsFactors = FALSE keeps the column names as character strings
col_combinations = expand.grid(names(df), names(df), stringsAsFactors = FALSE)
cor_test_wrapper = function(col_name1, col_name2, data_frame) {
    cor.test(data_frame[[col_name1]], data_frame[[col_name2]])$p.value
}
p_vals = mapply(cor_test_wrapper, 
                  col_name1 = col_combinations[[1]], 
                  col_name2 = col_combinations[[2]], 
                  MoreArgs = list(data_frame = df))
matrix(p_vals, 3, 3, dimnames = list(names(df), names(df)))
           x         y          z
x 0.00000000 0.4123234 0.03914453
y 0.41232343 0.0000000 0.74814951
z 0.03914453 0.7481495 0.00000000
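
The duplicated work mentioned above (each pair tested twice, plus the diagonal) can be avoided by looping over unordered pairs only. A minimal sketch, assuming base R and the data frame from the question; the helper name cor_pmat is made up for illustration and is not part of any package:

```r
df <- data.frame(x = c(5, 6, 5, 9, 4, 2, 1, 3, 5, 7),
                 y = c(3.1, 2.5, 3.8, 5.4, 6.5, 2.5, 1.5, 8.1, 7.1, 6.1),
                 z = c(5, 6, 4, 9, 2, 4, 1, 6, 2, 4))

# Hypothetical helper: run cor.test once per unordered pair of columns
cor_pmat <- function(data, ...) {
  vars <- names(data)
  p <- matrix(NA_real_, length(vars), length(vars),
              dimnames = list(vars, vars))
  pairs <- combn(vars, 2)                 # one column per unordered pair
  for (k in seq_len(ncol(pairs))) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    # fill both triangles from a single test; extra args go to cor.test
    p[i, j] <- p[j, i] <- cor.test(data[[i]], data[[j]], ...)$p.value
  }
  p                                       # diagonal stays NA
}

p_vals <- cor_pmat(df)
p_vals
```

Extra arguments are passed through to cor.test, so something like cor_pmat(df, method = "spearman") should work the same way.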

Paul Hiemstra answered Oct 25 '22 23:10


One way is to use corr.test (notice the double r) from the psych package.
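
A minimal sketch of the corr.test route, assuming the psych package is installed (the guard below skips the call otherwise) and using the data frame from the question. By default corr.test applies a Holm adjustment to the p-values above the diagonal, so adjust = "none" is passed here to get raw p-values comparable to cor.test:

```r
df <- data.frame(x = c(5, 6, 5, 9, 4, 2, 1, 3, 5, 7),
                 y = c(3.1, 2.5, 3.8, 5.4, 6.5, 2.5, 1.5, 8.1, 7.1, 6.1),
                 z = c(5, 6, 4, 9, 2, 4, 1, 6, 2, 4))

# Only run if psych is available; install.packages("psych") otherwise
if (requireNamespace("psych", quietly = TRUE)) {
  ct <- psych::corr.test(df, adjust = "none")
  print(ct$r)   # correlation matrix
  print(ct$p)   # matrix of unadjusted p-values
}
```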

Or, if you're a fan of mapply and sapply, you could write your own function to do this. Something like:

rrapply <- function(A, FUN, ...) mapply(function(a, B) lapply(B, 
         function(x) FUN(a, x, ...)), a = A, MoreArgs = list(B = A))
cor.tests <- rrapply(df, cor.test) # a matrix of cor.tests
apply(cor.tests, 1:2, function(x) x[[1]]$p.value) # and it's there

And now you can use the same logic to make a matrix of t-tests or, say, confidence intervals of correlations.
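
As a rough illustration of that last point, the same matrix of test objects can be mined for confidence limits instead of p-values. This is a sketch that reuses rrapply as defined above with the data frame from the question; the names ci_lower and ci_upper are made up here:

```r
df <- data.frame(x = c(5, 6, 5, 9, 4, 2, 1, 3, 5, 7),
                 y = c(3.1, 2.5, 3.8, 5.4, 6.5, 2.5, 1.5, 8.1, 7.1, 6.1),
                 z = c(5, 6, 4, 9, 2, 4, 1, 6, 2, 4))

# rrapply as in the answer: apply FUN to every pair of elements of A
rrapply <- function(A, FUN, ...) mapply(function(a, B) lapply(B,
         function(x) FUN(a, x, ...)), a = A, MoreArgs = list(B = A))

cor.tests <- rrapply(df, cor.test)   # 3 x 3 list-matrix of htest objects

# each cell arrives as a length-1 list, hence x[[1]]
ci_lower <- apply(cor.tests, 1:2, function(x) x[[1]]$conf.int[1])
ci_upper <- apply(cor.tests, 1:2, function(x) x[[1]]$conf.int[2])
ci_lower
ci_upper
```

The limits are the 95% intervals that cor.test reports by default; passing conf.level through rrapply, as in rrapply(df, cor.test, conf.level = 0.9), changes that.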

lebatsnok answered Oct 26 '22 00:10