I can get a correlation matrix using the following commands:
> df<-data.frame(x=c(5,6,5,9,4,2,1,3,5,7),y=c(3.1,2.5,3.8,5.4,6.5,2.5,1.5,8.1,7.1,6.1),z=c(5,6,4,9,2,4,1,6,2,4))
> cor(df)
          x         y         z
x 1.0000000 0.2923939 0.6566866
y 0.2923939 1.0000000 0.1167084
z 0.6566866 0.1167084 1.0000000
>
I can get individual p-values using the command:
> cor.test(df$x, df$y)$p.value
[1] 0.4123234
How can I get a matrix of p-values for all these correlation coefficients? Thanks for your help.
A p-value is the probability of observing a result at least as extreme as the one in your sample, assuming the null hypothesis is true. In our case, the null hypothesis is that the true correlation between x and y is zero, so the p-value is the probability of seeing a sample correlation at least this strong if the two variables were actually uncorrelated. A p-value of 0.05 means that, if there were no real correlation, a result this extreme would turn up in only about 5% of samples.
The correlation matrix with p-values for an R data frame can be obtained with the rcorr function from the Hmisc package, which takes a matrix as input. For example, for a data frame called df, use rcorr(as.matrix(df)); the result contains both the correlation matrix and the matrix of p-values.
It has been shown that p-values are strongly related to correlation coefficients under a true null hypothesis; hence, they can reveal the “importance of an association or effect.” Furthermore, it demonstrates why a cut point for statistical significance is still a viable, ancillary tool for assessing the substantive significance ...
Correlation and the p-value answer two different questions about the relationship between variables. The correlation coefficient measures the strength and direction of the linear relationship between two variables, whereas the p-value tells us whether that observed correlation is statistically significant.
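To make the link between a correlation coefficient and its p-value concrete, here is a minimal base-R sketch that turns the whole correlation matrix into a p-value matrix in one step. It assumes Pearson correlation, a two-sided test, and no missing values (so every pair shares the same n); under those assumptions it should reproduce what cor.test reports, using t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom.

```r
df <- data.frame(
  x = c(5, 6, 5, 9, 4, 2, 1, 3, 5, 7),
  y = c(3.1, 2.5, 3.8, 5.4, 6.5, 2.5, 1.5, 8.1, 7.1, 6.1),
  z = c(5, 6, 4, 9, 2, 4, 1, 6, 2, 4)
)

r <- cor(df)        # Pearson correlation matrix
n <- nrow(df)       # assumes complete data, same n for every pair

# Blank the diagonal first: r = 1 there, which would divide by zero below.
diag(r) <- NA

# t statistic for each coefficient, then the two-sided p-value.
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p <- 2 * pt(-abs(t_stat), df = n - 2)

p                   # diagonal is NA; off-diagonal entries are the p-values
```

The diagonal is left as NA on purpose, since testing a variable against itself is not meaningful.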
You can also use the package Hmisc.
An example of how it works:
library(Hmisc)
mycor <- rcorr(as.matrix(data), type="pearson")
mycor$r shows the correlation matrix, mycor$P (note the capital P) the matrix with corresponding p-values.
This example calculates the p-value for each of the column combinations. It is not an optimal solution (the x-y and y-x p-values are both calculated, for example), but should provide some inspiration for you. The main trick is to use expand.grid to generate the combinations of columns, and use mapply to call cor.test on each combination:
col_combinations = expand.grid(names(df), names(df))
cor_test_wrapper = function(col_name1, col_name2, data_frame) {
    cor.test(data_frame[[col_name1]], data_frame[[col_name2]])$p.value
}
p_vals = mapply(cor_test_wrapper,
                col_name1 = col_combinations[[1]],
                col_name2 = col_combinations[[2]],
                MoreArgs = list(data_frame = df))
matrix(p_vals, 3, 3, dimnames = list(names(df), names(df)))
           x         y          z
x 0.00000000 0.4123234 0.03914453
y 0.41232343 0.0000000 0.74814951
z 0.03914453 0.7481495 0.00000000
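If the duplicated x-y and y-x calls bother you, one possible variation is to test each unordered pair once and mirror the result. This is only a sketch: cor_pmat is a hypothetical helper name (not a base R function), and it leaves the diagonal as NA rather than 0.

```r
df <- data.frame(
  x = c(5, 6, 5, 9, 4, 2, 1, 3, 5, 7),
  y = c(3.1, 2.5, 3.8, 5.4, 6.5, 2.5, 1.5, 8.1, 7.1, 6.1),
  z = c(5, 6, 4, 9, 2, 4, 1, 6, 2, 4)
)

# cor_pmat is a hypothetical helper; extra arguments are passed to cor.test.
cor_pmat <- function(data, ...) {
  vars <- names(data)
  p <- matrix(NA_real_, length(vars), length(vars),
              dimnames = list(vars, vars))
  pairs <- combn(vars, 2)              # one column per unordered pair
  for (k in seq_len(ncol(pairs))) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    pv <- cor.test(data[[i]], data[[j]], ...)$p.value
    p[i, j] <- pv                      # fill both triangles from one test
    p[j, i] <- pv
  }
  p
}

p_mat <- cor_pmat(df)
p_mat
```

Because the extra arguments go straight to cor.test, something like cor_pmat(df, method = "spearman") should work the same way, although with tied values cor.test will warn that it cannot compute exact p-values.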
One way is to use corr.test (notice the double r) from package psych. Or, if you're a fan of mapply and sapply, you could write your own function doing this. Something like:
rrapply <- function(A, FUN, ...) {
    mapply(function(a, B) lapply(B, function(x) FUN(a, x, ...)),
           a = A, MoreArgs = list(B = A))
}
cor.tests <- rrapply(df, cor.test) # a matrix of cor.tests
apply(cor.tests, 1:2, function(x) x[[1]]$p.value) # and it's there
And now you can use the same logic to make a matrix of t-tests or, say, CIs of correlations.
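As a rough illustration of that last point, the same matrix of cor.test results can be reused to pull out confidence limits instead of p-values. The names ci_lower and ci_upper are just illustrative, and the sketch assumes the default Pearson test, which is the one that reports conf.int.

```r
df <- data.frame(
  x = c(5, 6, 5, 9, 4, 2, 1, 3, 5, 7),
  y = c(3.1, 2.5, 3.8, 5.4, 6.5, 2.5, 1.5, 8.1, 7.1, 6.1),
  z = c(5, 6, 4, 9, 2, 4, 1, 6, 2, 4)
)

# Apply FUN to every pair of columns; the result is a list-matrix.
rrapply <- function(A, FUN, ...) {
  mapply(function(a, B) lapply(B, function(x) FUN(a, x, ...)),
         a = A, MoreArgs = list(B = A))
}

cor.tests <- rrapply(df, cor.test)

# Each cell holds one cor.test result; extract the 95% interval bounds.
ci_lower <- apply(cor.tests, 1:2, function(x) x[[1]]$conf.int[1])
ci_upper <- apply(cor.tests, 1:2, function(x) x[[1]]$conf.int[2])

ci_lower
ci_upper
```

The diagonal cells compare a column with itself, so both bounds there are simply 1 and can be ignored.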