How to use the princomp() function in R when the covariance matrix has zeros?

While using the princomp() function in R, I get the following error: "covariance matrix is not non-negative definite".

I think this is because some values in the covariance matrix are zero (actually close to zero, but they become zero through rounding).

Is there a workaround to proceed with PCA when the covariance matrix contains zeros?

[FYI: obtaining the covariance matrix is an intermediate step within the princomp() call. A data file to reproduce this error can be downloaded from here: http://tinyurl.com/6rtxrc3]

384X21 asked Dec 19 '11 13:12


1 Answer

The first strategy might be to decrease the tolerance. As far as I can tell, princomp does not accept or pass on a tolerance argument, but prcomp does accept a 'tol' argument. If that is not effective, this should identify pairs of variables (columns of the data matrix M) whose covariance is nearly zero:

nr0 <- 0.001   # threshold below which a covariance counts as "nearly zero"
which(abs(cov(M)) < nr0, arr.ind = TRUE)

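And this would identify negative eigenvalues of the covariance matrix, which are what trigger the princomp error (eigen() needs the square covariance matrix, not the raw data):

which(eigen(cov(M))$values < 0)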
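
To make the two ideas so far concrete (inspecting the eigenvalues, and switching to prcomp with a 'tol' argument), here is a minimal sketch on made-up data. The matrix M, its column names, and the seed are assumptions chosen only for illustration; they are not the asker's data. The third column is an exact linear combination of the first two, so the covariance matrix is singular.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
M  <- cbind(x1 = x1, x2 = x2, x3 = x1 + x2)   # x3 depends exactly on x1 and x2

# Eigenvalues of the covariance matrix: the smallest one is zero up to
# rounding error, and rounding can push it slightly below zero.
ev <- eigen(cov(M), symmetric = TRUE)$values
print(ev)
print(which(ev < 0))

# prcomp works from the SVD of the centered data rather than from eigen().
# 'tol' omits components whose standard deviation is at most tol times
# the standard deviation of the first component.
pc <- prcomp(M, tol = 1e-8)
print(colnames(pc$rotation))       # names of the components that were kept
```

Whether the smallest eigenvalue comes out negative depends on rounding, so the second print may well be empty on a given machine; the point is only that it is negligible relative to the largest one.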

Using the h9 example on the help(qr) page:

> which(abs(cov(h9)) < .001, arr.ind=TRUE)
      row col
 [1,]   9   4
 [2,]   8   5
 [3,]   9   5
 [4,]   7   6
 [5,]   8   6
 [6,]   9   6
 [7,]   6   7
 [8,]   7   7
 [9,]   8   7
[10,]   9   7
[11,]   5   8
[12,]   6   8
[13,]   7   8
[14,]   8   8
[15,]   9   8
[16,]   4   9
[17,]   5   9
[18,]   6   9
[19,]   7   9
[20,]   8   9
[21,]   9   9
> qr(h9[-9,-9])$rank
[1] 7                  # rank deficient, at least at the default tolerance
> qr(h9[-(8:9),-(8:9)])$rank   # take out only the vector with the most dependencies
[1] 6                  # still rank deficient
> qr(h9[-(7:9),-(7:9)])$rank
[1] 6
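
The session above only runs if h9 already exists. On the help(qr) page it is built as a 9 x 9 Hilbert matrix, so a sketch along these lines should set it up; the second qr() call, with a tighter tolerance, is an extra illustration of how much the reported rank depends on 'tol' and is not part of the session above.

```r
# Hilbert matrix generator, as defined in the examples of help(qr)
hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
h9 <- hilbert(9)

# Numerical rank at the default tolerance of qr()
print(qr(h9)$rank)

# The same matrix with a much tighter tolerance
print(qr(h9, tol = 1e-10)$rank)
```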

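Another approach might be to use the alias function, which reports linear dependencies among the predictors of a fitted model. Assuming dfrm is a data frame holding the variables, regress a throwaway random response on all of its columns:

alias(lm(rnorm(NROW(dfrm)) ~ ., data = dfrm))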
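
As a rough illustration of what alias reports, here is a sketch on a made-up data frame. The name dfrm is reused from above, but its columns a, b and c and the seed are assumptions for the example only. Column c is constructed as an exact linear combination of a and b, so it should show up as completely aliased.

```r
set.seed(42)                             # arbitrary seed, for reproducibility only
dfrm <- data.frame(a = rnorm(20), b = rnorm(20))
dfrm$c <- dfrm$a - 2 * dfrm$b            # exact linear dependency on a and b

# The response is random noise; only the design matrix matters here.
fit <- lm(rnorm(NROW(dfrm)) ~ ., data = dfrm)
al  <- alias(fit)

# 'Complete' has one row per aliased term, expressed in terms of the
# estimable coefficients.
print(al$Complete)
```

Columns flagged this way are candidates to drop before rerunning the PCA.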
IRTFM answered Oct 10 '22 08:10