Multiple Correlation for more than 3 variables
Definition 1 defines the multiple correlation coefficient R_{z,xy} and the corresponding multiple coefficient of determination for three variables x, y, and z. We can extend these definitions to more than three variables as described in Advanced Multiple Correlation.
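For reference, the three-variable coefficient is usually written in terms of the pairwise Pearson correlations r_xy, r_xz and r_yz. The form below is the standard textbook one, given on the assumption that it matches the Definition 1 cited above, which is not reproduced in this text:

```latex
R_{z,xy} = \sqrt{\frac{r_{xz}^2 + r_{yz}^2 - 2\, r_{xz}\, r_{yz}\, r_{xy}}{1 - r_{xy}^2}}
```

The multiple coefficient of determination is then the square of this value, R_{z,xy}^2.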
A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.
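If you need to calculate a single "correlation" among three or more variables, you cannot use Pearson's coefficient directly: it is defined for pairs of variables, and a multi-variable measure built from it gives a different value depending on the order of the variables, that is, on which one is treated as the dependent variable.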
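A minimal sketch of that order dependence, assuming the usual three-variable multiple correlation formula built from pairwise Pearson coefficients. The helper multiple_r() is hypothetical (not part of base R), and the choice of mpg, wt and hp from the built-in mtcars data is only for illustration:

```r
# multiple_r() is a hypothetical helper: multiple correlation of `dep`
# on the two predictors `a` and `b`, computed from pairwise Pearson r.
multiple_r <- function(dep, a, b) {
  r_da <- cor(dep, a)
  r_db <- cor(dep, b)
  r_ab <- cor(a, b)
  sqrt((r_da^2 + r_db^2 - 2 * r_da * r_db * r_ab) / (1 - r_ab^2))
}

mpg <- mtcars$mpg
wt  <- mtcars$wt
hp  <- mtcars$hp

# Same three variables, different choice of dependent variable
r_mpg <- multiple_r(mpg, wt, hp)
r_wt  <- multiple_r(wt, mpg, hp)
r_hp  <- multiple_r(hp, mpg, wt)

# Cross-check: should equal the square root of R^2 from a linear model
r_lm <- sqrt(summary(lm(mpg ~ wt + hp, data = mtcars))$r.squared)

print(c(mpg = r_mpg, wt = r_wt, hp = r_hp, lm_check = r_lm))
```

The three values differ from one another, while the first one agrees with the regression-based cross-check.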
Variables: The variables to be used in the bivariate Pearson Correlation. You must select at least two continuous variables, but may select more than two. The test will produce correlation coefficients for each pair of variables in this list.
Use the same function (cor) on a data frame, e.g.:
> cor(VADeaths)
             Rural Male Rural Female Urban Male Urban Female
Rural Male    1.0000000    0.9979869  0.9841907    0.9934646
Rural Female  0.9979869    1.0000000  0.9739053    0.9867310
Urban Male    0.9841907    0.9739053  1.0000000    0.9918262
Urban Female  0.9934646    0.9867310  0.9918262    1.0000000
Or, on a data frame that also holds discrete variables (sometimes referred to as factors), try something like the following, which keeps only the numeric columns:
> cor(mtcars[,unlist(lapply(mtcars, is.numeric))])
            mpg        cyl       disp         hp        drat         wt        qsec         vs          am       gear        carb
mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594  0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958 -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799 -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479 -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000 -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157  0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953 -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870 -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059 -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000
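Every column of mtcars is already numeric, so the filter above has no visible effect on that data set. As a sketch of the same idea on a data frame that really does contain a factor, the built-in iris data (four numeric columns plus the factor Species) can be used; the variable names num_cols and cors_iris are only illustrative:

```r
# iris has four numeric measurement columns and one factor column (Species)
num_cols <- sapply(iris, is.numeric)   # logical vector, one entry per column

# Keep only the numeric columns before calling cor()
cors_iris <- cor(iris[, num_cols])

round(cors_iris, 2)
```

Calling cor(iris) directly would stop with an error because Species is not numeric, which is why the filter is needed.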
If you would like to combine the matrix with some visualisations, I can recommend the following (I am using the built-in iris dataset):
library(psych)
pairs.panels(iris[1:4]) # select columns 1-4
The PerformanceAnalytics package does basically the same but includes significance indicators by default.
library(PerformanceAnalytics)
chart.Correlation(iris[1:4])
Or this nice and simple visualisation:
library(corrplot)
x <- cor(iris[1:4])
corrplot(x, type="upper", order="hclust")
See the corr.test function in the psych package:
> corr.test(mtcars[1:4])
Call:corr.test(x = mtcars[1:4])
Correlation matrix
       mpg   cyl  disp    hp
mpg   1.00 -0.85 -0.85 -0.78
cyl  -0.85  1.00  0.90  0.83
disp -0.85  0.90  1.00  0.79
hp   -0.78  0.83  0.79  1.00
Sample Size
     mpg cyl disp hp
mpg   32  32   32 32
cyl   32  32   32 32
disp  32  32   32 32
hp    32  32   32 32
Probability value
     mpg cyl disp hp
mpg    0   0    0  0
cyl    0   0    0  0
disp   0   0    0  0
hp     0   0    0  0
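The zeros in the probability table appear to be very small p-values rounded for printing, not exact zeros. A minimal base R sketch that shows unrounded pairwise p-values with cor.test (this is not how psych computes its table, only an independent check; vars and p_mat are illustrative names):

```r
# Pairwise Pearson p-values for the first four mtcars columns, base R only
vars  <- names(mtcars)[1:4]
p_mat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
                dimnames = list(vars, vars))

for (i in vars) {
  for (j in vars) {
    # The diagonal is left as NA: a variable is not tested against itself
    if (i != j) p_mat[i, j] <- cor.test(mtcars[[i]], mtcars[[j]])$p.value
  }
}

signif(p_mat, 3)
```

These are unadjusted p-values from individual tests, so they need not match a table that applies a multiple-comparison correction.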
And yet another shameless self-advert: https://gist.github.com/887249
You might want to look at Quick-R, which has a lot of nice little tutorials on how you can do basic statistics in R. For example on correlations:
http://www.statmethods.net/stats/correlations.html
You can also calculate correlations for all variables but exclude selected ones, for example:
mtcars <- data.frame(mtcars)
# here we exclude gear and carb variables
cors <- cor(subset(mtcars, select = c(-gear,-carb)))
Also, to calculate the correlation between each variable and one column, you can use sapply():
# sapply calls cor() on each column of mtcars in turn, paired with mtcars$mpg
cors2 <- sapply(mtcars, cor, y=mtcars$mpg)
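As an alternative sketch, cor() itself accepts a data frame as x and a vector as y, in which case it returns a one-column matrix with one row per variable. The name cors3 and the sorting step are only illustrative:

```r
# Correlate every column of mtcars with mpg in a single cor() call
cors3 <- cor(mtcars, mtcars$mpg)

# Order rows by absolute correlation, strongest first; keep the matrix shape
cors3_sorted <- cors3[order(-abs(cors3[, 1])), , drop = FALSE]

round(cors3_sorted, 2)
```

The mpg row is the correlation of mpg with itself, so it can be dropped before interpreting the ranking.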