
Pairwise Correlation Table

I'm new to R, so I apologize if this is a straightforward question, but I've done quite a bit of searching this evening and can't figure it out. I've got a data frame with a whole slew of variables, and I'd like to create a table of the correlations among a subset of them, basically the equivalent of "pwcorr" in Stata or "correlations" in SPSS. The key requirement is that I want not only the r but also the significance associated with each value.

Any ideas? This seems like it should be very simple, but I can't seem to figure out a good way.

asked Nov 21 '12 03:11 by Cody



2 Answers

Bill Venables offers this solution in this answer from the R mailing list to which I've made some slight modifications:

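cor.prob <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X)
  above <- row(R) < col(R)            # upper triangle, excluding the diagonal
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)        # F statistic for H0: correlation = 0
  R[above] <- pf(Fstat, 1, dfr, lower.tail = FALSE)  # p-values replace r above the diagonal

  cor.mat <- t(R)                     # transpose so the p-values sit below the diagonal
  cor.mat[upper.tri(cor.mat)] <- NA   # blank the upper triangle (the correlations)
  cor.mat
}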

So let's test it out. The lower triangle of the result holds the p-values:

set.seed(123)
data <- matrix(rnorm(100), 20, 5)
cor.prob(data)

          [,1]      [,2]      [,3]      [,4] [,5]
[1,] 1.0000000        NA        NA        NA   NA
[2,] 0.7005361 1.0000000        NA        NA   NA
[3,] 0.5990483 0.6816955 1.0000000        NA   NA
[4,] 0.6098357 0.3287116 0.5325167 1.0000000   NA
[5,] 0.3364028 0.1121927 0.1329906 0.5962835    1

Does that line up with cor.test?

cor.test(data[,2], data[,3])

 Pearson's product-moment correlation
data:  data[, 2] and data[, 3] 
t = 0.4169, df = 18, p-value = 0.6817
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 -0.3603246  0.5178982 
sample estimates:
       cor 
0.09778865 

The p-value of 0.6817 matches the [3,2] entry of the table above, so it seems to work.
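
The table above holds only p-values, while the question asks for r as well. A minimal sketch of one way to show both in a single matrix, with correlations above the diagonal and p-values below it, follows. The name cor_with_p is hypothetical (it is not from any package), and the sketch assumes complete numeric data with no missing values.

```r
# Sketch: correlations above the diagonal, p-values below it.
# cor_with_p is a hypothetical name, not part of any package.
cor_with_p <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X)
  low <- lower.tri(R)                 # strictly below the diagonal
  r2 <- R[low]^2
  # F statistic for H0: correlation = 0 (the square of cor.test's t statistic)
  Fstat <- r2 * dfr / (1 - r2)
  out <- R
  out[low] <- pf(Fstat, 1, dfr, lower.tail = FALSE)
  diag(out) <- NA                     # the diagonal carries no information here
  out
}

set.seed(123)
data <- matrix(rnorm(100), 20, 5)
res <- cor_with_p(data)

# Spot check one pair against cor.test
ct <- cor.test(data[, 2], data[, 3])
stopifnot(isTRUE(all.equal(res[2, 3], unname(ct$estimate))),
          isTRUE(all.equal(res[3, 2], ct$p.value)))
round(res, 4)
```

Reading the result: for a pair (i, j) with i < j, res[i, j] is the correlation and res[j, i] is its p-value.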

answered Oct 14 '22 21:10 by sebastian-c


Here is something I just made. I stumbled on this post because I was looking for a way to take every pair of variables and get a tidy n x 4 data frame. Columns 1 and 2 are the two variable names, and columns 3 and 4 are the absolute value of their correlation and the signed correlation. Just pass the function a data frame of numeric and integer values.

pairwiseCor <- function(dataframe) {
  # All unordered pairs of column names
  pairs <- combn(names(dataframe), 2, simplify = FALSE)
  n <- length(pairs)
  df <- data.frame(Variable1 = character(n), Variable2 = character(n),
                   AbsCor = numeric(n), Cor = numeric(n),
                   stringsAsFactors = FALSE)
  for (i in seq_along(pairs)) {
    # Compute each correlation once and reuse it for both columns
    r <- cor(dataframe[[pairs[[i]][1]]], dataframe[[pairs[[i]][2]]])
    df[i, 1] <- pairs[[i]][1]
    df[i, 2] <- pairs[[i]][2]
    df[i, 3] <- round(abs(r), 4)
    df[i, 4] <- round(r, 4)
  }
  # Sort by absolute correlation, strongest first
  df <- df[order(df$AbsCor, decreasing = TRUE), ]
  row.names(df) <- seq_len(n)
  df
}

This is what the output looks like after assigning the result to pairwiseCorDF:

 > head(pairwiseCorDF)
             Variable1        Variable2 AbsCor     Cor
    1        roll_belt     accel_belt_z 0.9920 -0.9920
    2 gyros_dumbbell_x gyros_dumbbell_z 0.9839 -0.9839
    3        roll_belt total_accel_belt 0.9811  0.9811
    4 total_accel_belt     accel_belt_z 0.9752 -0.9752
    5       pitch_belt     accel_belt_x 0.9658 -0.9658
    6 gyros_dumbbell_z  gyros_forearm_z 0.9491  0.9491
answered Oct 14 '22 23:10 by user3728456
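
The pairwise table in this answer reports r but not its significance, which the question also asks for. As a rough sketch, the same pair-by-pair idea can call cor.test instead of cor and keep the p-value next to each correlation. The helper name pairwise_cor_test and the column name P are assumptions made for illustration, and the mtcars columns are only there to give a runnable example.

```r
# Sketch: one row per pair of columns, with r and its p-value from cor.test.
# pairwise_cor_test is a hypothetical helper name.
pairwise_cor_test <- function(dataframe) {
  pairs <- combn(names(dataframe), 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    ct <- cor.test(dataframe[[p[1]]], dataframe[[p[2]]])
    data.frame(Variable1 = p[1], Variable2 = p[2],
               Cor = unname(ct$estimate), P = ct$p.value,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  # Strongest correlations (by absolute value) first
  out <- out[order(abs(out$Cor), decreasing = TRUE), ]
  row.names(out) <- NULL
  out
}

# Usage on a built-in dataset (mtcars ships with R)
res <- pairwise_cor_test(mtcars[, c("mpg", "wt", "hp", "qsec")])
res
```

The p-values here are unadjusted. With many pairs, a multiple-comparison correction such as p.adjust(res$P) may be worth considering.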