Pairwise Correlation Table

Tags: r, statistics, stata, spss

I'm new to R, so I apologize if this is a straightforward question, but I've done quite a bit of searching this evening and can't figure it out. I've got a data frame with a whole slew of variables, and I'd like to create a table of the correlations among a subset of them, basically the equivalent of "pwcorr" in Stata or "correlations" in SPSS. The key requirement is that I want not only r but also the significance (p-value) associated with each value.

Any ideas? This seems like it should be very simple, but I can't seem to figure out a good way.

497

asked Nov 21 '12 03:11

Cody

2 Answers

Bill Venables offers this solution in an answer on the R mailing list, to which I've made some slight modifications:

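# p-values for the pairwise Pearson correlations of the columns of X
cor.prob <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X)
  above <- row(R) < col(R)           # TRUE for cells above the diagonal
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)       # F statistic on (1, dfr) degrees of freedom
  R[above] <- 1 - pf(Fstat, 1, dfr)  # overwrite the upper triangle with p-values

  cor.mat <- t(R)                    # move the p-values below the diagonal
  cor.mat[upper.tri(cor.mat)] <- NA  # blank out the correlations
  cor.mat
}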
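The F statistic used in cor.prob is the square of the usual t statistic for a Pearson correlation, t = r * sqrt(df) / sqrt(1 - r^2), which is why its p-values should agree with cor.test. A minimal sketch of that check on made-up data (x, y, and the seed are assumptions for illustration, not part of the original answer):

```r
# Illustrative data only; any two numeric vectors of equal length would do.
set.seed(1)
x <- rnorm(20)
y <- rnorm(20)

ct  <- cor.test(x, y)          # reference Pearson test
r   <- unname(ct$estimate)     # sample correlation
dfr <- unname(ct$parameter)    # degrees of freedom, n - 2

F_stat   <- r^2 * dfr / (1 - r^2)      # same formula as in cor.prob
t_sq     <- unname(ct$statistic)^2     # squared t statistic from cor.test
p_from_F <- 1 - pf(F_stat, 1, dfr)     # p-value via the F distribution

all.equal(t_sq, F_stat)          # F equals t squared
all.equal(p_from_F, ct$p.value)  # and the p-values agree
```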

So let's test it out (the returned matrix holds the p-values below the diagonal):

set.seed(123)
data <- matrix(rnorm(100), 20, 5)
cor.prob(data)

          [,1]      [,2]      [,3]      [,4] [,5]
[1,] 1.0000000        NA        NA        NA   NA
[2,] 0.7005361 1.0000000        NA        NA   NA
[3,] 0.5990483 0.6816955 1.0000000        NA   NA
[4,] 0.6098357 0.3287116 0.5325167 1.0000000   NA
[5,] 0.3364028 0.1121927 0.1329906 0.5962835    1

Does that line up with cor.test?

cor.test(data[,2], data[,3])

 Pearson's product-moment correlation
data:  data[, 2] and data[, 3] 
t = 0.4169, df = 18, p-value = 0.6817
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 -0.3603246  0.5178982 
sample estimates:
       cor 
0.09778865

The p-value of 0.6817 matches entry [3,2] of the table above, so it seems to work OK.
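If you want r and its p-value in one table, as the question asks, one option is to skip the final transpose so the correlations stay below the diagonal and the p-values sit above it. A sketch under that assumption; cor.with.p is a hypothetical name, not part of the original answer:

```r
# Hypothetical variant of cor.prob: correlations below the diagonal,
# p-values above it, 1s on the diagonal.
cor.with.p <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X)
  above <- row(R) < col(R)
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)
  R[above] <- 1 - pf(Fstat, 1, dfr)  # upper triangle now holds p-values
  R
}

set.seed(123)
data <- matrix(rnorm(100), 20, 5)
out <- cor.with.p(data)

out[3, 2]  # correlation between columns 2 and 3
out[2, 3]  # p-value for the same pair
```

Reading the table takes a little care, since cell [i, j] and cell [j, i] mean different things; that is the price of fitting both numbers into one matrix.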

195

answered Oct 14 '22 21:10

sebastian-c

Here is something I just made. I stumbled on this post while looking for a way to take every pair of variables and get a tidy n x 4 data frame, with one row per pair: columns 1 and 2 hold the two variable names, and columns 3 and 4 hold the absolute value of their correlation and the signed correlation. Just pass the function a data frame of numeric and integer columns.

# One row per pair of columns, sorted by absolute correlation (largest first)
pairwiseCor <- function(dataframe) {
  pairs <- combn(names(dataframe), 2, simplify = FALSE)
  n <- length(pairs)
  df <- data.frame(Variable1 = character(n), Variable2 = character(n),
                   AbsCor = numeric(n), Cor = numeric(n),
                   stringsAsFactors = FALSE)
  for (i in seq_along(pairs)) {
    # compute each correlation once and reuse it for both columns
    r <- cor(dataframe[[pairs[[i]][1]]], dataframe[[pairs[[i]][2]]])
    df[i, 1] <- pairs[[i]][1]
    df[i, 2] <- pairs[[i]][2]
    df[i, 3] <- round(abs(r), 4)
    df[i, 4] <- round(r, 4)
  }
  df <- df[order(df$AbsCor, decreasing = TRUE), ]
  row.names(df) <- seq_len(n)
  df  # returned to the caller; nothing is written to the global environment
}

This is what the output looks like, with the result stored as pairwiseCorDF:

 > head(pairwiseCorDF)
             Variable1        Variable2 AbsCor     Cor
    1        roll_belt     accel_belt_z 0.9920 -0.9920
    2 gyros_dumbbell_x gyros_dumbbell_z 0.9839 -0.9839
    3        roll_belt total_accel_belt 0.9811  0.9811
    4 total_accel_belt     accel_belt_z 0.9752 -0.9752
    5       pitch_belt     accel_belt_x 0.9658 -0.9658
    6 gyros_dumbbell_z  gyros_forearm_z 0.9491  0.9491
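The same long layout can also carry the significance the question asks for. A minimal sketch, assuming it is acceptable to call cor.test once per pair; pairwiseCorTest is a hypothetical name, and the built-in mtcars data is used only for illustration:

```r
# Hypothetical helper: one row per pair of columns, with the Pearson
# correlation and the p-value reported by cor.test().
pairwiseCorTest <- function(dataframe) {
  pairs <- combn(names(dataframe), 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    ct <- cor.test(dataframe[[p[1]]], dataframe[[p[2]]])
    data.frame(Variable1 = p[1], Variable2 = p[2],
               Cor = unname(ct$estimate), P = ct$p.value,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out <- out[order(abs(out$Cor), decreasing = TRUE), ]  # strongest pairs first
  row.names(out) <- NULL
  out
}

res <- pairwiseCorTest(mtcars[, c("mpg", "wt", "hp", "disp")])
head(res)
```

With many variables this runs a lot of tests, so the raw p-values may need a multiple-comparison adjustment (for example with p.adjust) before they are interpreted.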

answered Oct 14 '22 23:10

user3728456
