Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate correlation for more than two variables?

Tags:

r

People also ask

Can you correlate more than 2 variables?

Multiple Correlation for more than 3 variables Definition 1 defines the multiple correlation coefficient Rz,xy and the corresponding multiple coefficient of determination for three variables x, y, and z. We can extend these definitions to more than three variables as described in Advanced Multiple Correlation.

Can you have 3 variables in correlation?

A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.

Can Pearson correlation be used for 3 variables?

If You need to calculate "correlation" between three or more variables, you could not use Pearson, as in this case it will be different for different order of variables have a look here.

Can you do Pearson correlation for more than 2 variables?

AVariables: The variables to be used in the bivariate Pearson Correlation. You must select at least two continuous variables, but may select more than two. The test will produce correlation coefficients for each pair of variables in this list.


Use the same function (cor) on a data frame, e.g.:

> cor(VADeaths)
             Rural Male Rural Female Urban Male Urban Female
Rural Male    1.0000000    0.9979869  0.9841907    0.9934646
Rural Female  0.9979869    1.0000000  0.9739053    0.9867310
Urban Male    0.9841907    0.9739053  1.0000000    0.9918262
Urban Female  0.9934646    0.9867310  0.9918262    1.0000000

Or, on a data frame also holding discrete variables, (also sometimes referred to as factors), try something like the following:

> cor(mtcars[,unlist(lapply(mtcars, is.numeric))])
            mpg        cyl       disp         hp        drat         wt        qsec         vs          am       gear        carb
mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594  0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958 -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799 -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479 -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000 -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157  0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953 -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870 -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059 -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000

If you would like to combine the matrix with some visualisations I can recommend (I am using the built in iris dataset):

library(psych)
pairs.panels(iris[1:4])  # select columns 1-4

enter image description here

The Performance Analytics basically does the same but includes significance indicators by default.

library(PerformanceAnalytics)
chart.Correlation(iris[1:4])

Correlation Chart

Or this nice and simple visualisation:

library(corrplot)
x <- cor(iris[1:4])
corrplot(x, type="upper", order="hclust")

corrplot


See corr.test function in psych package:

> corr.test(mtcars[1:4])
Call:corr.test(x = mtcars[1:4])
Correlation matrix 
       mpg   cyl  disp    hp
mpg   1.00 -0.85 -0.85 -0.78
cyl  -0.85  1.00  0.90  0.83
disp -0.85  0.90  1.00  0.79
hp   -0.78  0.83  0.79  1.00
Sample Size 
     mpg cyl disp hp
mpg   32  32   32 32
cyl   32  32   32 32
disp  32  32   32 32
hp    32  32   32 32
Probability value 
     mpg cyl disp hp
mpg    0   0    0  0
cyl    0   0    0  0
disp   0   0    0  0
hp     0   0    0  0

And yet another shameless self-advert: https://gist.github.com/887249


You might want to look at Quick-R, which has a lot of nice little tutorials on how you can do basic statistics in R. For example on correlations:

http://www.statmethods.net/stats/correlations.html


You can also calculate correlations for all variables but exclude selected ones, for example:

mtcars <- data.frame(mtcars)
# here we exclude gear and carb variables
cors <- cor(subset(mtcars, select = c(-gear,-carb)))

Also, to calculate correlation between each variable and one column you can use sapply()

# sapply effectively calls the corelation function for each column of mtcars and mtcars$mpg
cors2 <- sapply(mtcars, cor, y=mtcars$mpg)