R: Pearson correlation rcorr(x,y) [x=matrix, y=vector] ignores y

Tags: r, correlation

I have a matrix x (30 x 2000) holding the expression of 2000 genes across 30 cell lines, and a vector y (30 x 1) holding a continuous outcome. I want the Pearson correlation between each gene and the outcome, so I expect a 2000 x 1 vector of r-values. I used rcorr(x,y), but the result is a 2000 x 2000 matrix, so I guess it is ignoring y and correlating every gene against every other gene. The manual says:

x = a numeric matrix with at least 5 rows and at least 2 columns (if y is absent)

But can I have more than one column and have y too? Do I have to use a different function?

PGreen asked Sep 25 '13 09:09



2 Answers

Using the function cor will work. In general, if x is MxN and y is MxP, then cor(x,y) will be an NxP matrix where entry (i,j) is the correlation between x[,i] and y[,j]. A plain vector y of length M is treated as a single column (P = 1).

Building on SimonO101's reproducible example:

> set.seed(1)
> x <- matrix( runif(12) , nrow = 3 )
> y <- runif(3)
> cor(x,y)
           [,1]
[1,]  0.3712437
[2,]  0.9764443
[3,]  0.2249998
[4,] -0.4903723

If you want just a vector and not a matrix:

> array(cor(x,y))
[1]  0.3712437  0.9764443  0.2249998 -0.4903723
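
At the scale described in the question, the same call might look like the sketch below. The data are simulated, and the gene names (`gene1`, `gene2`, ...) and the seed are made up for illustration. `drop()` is used here instead of `array()` only because it carries the column names of `x` over to the result; that is a convenience, not a requirement.

```r
# Simulated stand-in for the data in the question (assumption: no missing values)
set.seed(42)                       # arbitrary seed, for reproducibility only
n_lines <- 30                      # cell lines (rows)
n_genes <- 2000                    # genes (columns)

x <- matrix(rnorm(n_lines * n_genes), nrow = n_lines,
            dimnames = list(NULL, paste0("gene", seq_len(n_genes))))
y <- rnorm(n_lines)                # continuous outcome, one value per cell line

r     <- cor(x, y)                 # 2000 x 1 matrix, one row per gene
r_vec <- drop(r)                   # named numeric vector of length 2000

dim(r)
length(r_vec)
head(names(r_vec), 3)
```

Each element of `r_vec` should match the pairwise call for that gene, for example `cor(x[, "gene1"], y)`.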
mrip answered Sep 27 '22 22:09


You need to apply the cor function across the columns of your x matrix...

apply( x , 2 , cor , y = y )

A reproducible example

#  For reproducible data
set.seed(1)

#  3 x 4 matrix
x <- matrix( runif(12) , nrow = 3 )
#          [,1]      [,2]      [,3]       [,4]
#[1,] 0.2655087 0.9082078 0.9446753 0.06178627
#[2,] 0.3721239 0.2016819 0.6607978 0.20597457
#[3,] 0.5728534 0.8983897 0.6291140 0.17655675

# Length 3 vector
y <- runif(3)
#[1] 0.6870228 0.3841037 0.7698414

# Length 4 output vector
apply( x , 2 , cor , y = y )
#[1]  0.3712437  0.9764443  0.2249998 -0.4903723
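
As a quick sanity check, the column-wise `apply()` call can be compared with a single call to `cor(x, y)` on the same data. This is only a sketch on the toy matrix above; the variable names `r_apply` and `r_direct` are introduced here for illustration.

```r
# Same toy data as in the example above
set.seed(1)
x <- matrix(runif(12), nrow = 3)
y <- runif(3)

r_apply  <- apply(x, 2, cor, y = y)   # one cor() call per column of x
r_direct <- drop(cor(x, y))           # one call on the whole matrix, then drop the extra dimension

all.equal(r_apply, r_direct)          # the two approaches should agree
```

Both forms give the same numbers, so the choice is mostly one of style; for a matrix with thousands of columns the single `cor(x, y)` call avoids the per-column function calls.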
Simon O'Hanlon answered Sep 27 '22 22:09