While using the princomp() function in R, the following error is encountered: "covariance matrix is not non-negative definite".
I think this is due to some values in the covariance matrix being zero (actually close to zero, but they become zero during rounding).
Is there a workaround to proceed with PCA when the covariance matrix contains zeros?
[FYI: obtaining the covariance matrix is an intermediate step within the princomp() call. A data file to reproduce this error can be downloaded from here: http://tinyurl.com/6rtxrc3]
princomp is a generic function with "formula" and "default" methods. The calculation is done using eigen on the correlation or covariance matrix, as determined by cor. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use svd on x, as is done in prcomp.
To create a covariance matrix from a data frame in R, we use the cov() function. The cov() function forms the variance-covariance matrix. It takes the data frame as an argument and returns the covariance matrix as the result.
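As a small illustration of cov() on a data frame, here is a sketch with made-up data (the data frame dat and its columns are hypothetical). Column y is an exact multiple of x, so the resulting covariance matrix is singular, which is the kind of input that can trip up an eigen-based PCA.

```r
# Hypothetical toy data: y is exactly 2 * x, z is unrelated
dat <- data.frame(x = c(1, 2, 3, 4),
                  y = c(2, 4, 6, 8),
                  z = c(1, 0, 1, 0))

# Variance-covariance matrix of the columns
S <- cov(dat)
print(S)

# The diagonal holds the per-column variances
diag_matches_var <- isTRUE(all.equal(diag(S), sapply(dat, var)))

# Eigenvalues of a singular covariance matrix: the smallest is zero up to rounding
ev <- eigen(S, symmetric = TRUE)$values
print(ev)
```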
PCA can be based on either the covariance matrix or the correlation matrix. The choice between these analyses will be discussed. In either case, the new variables (the PCs) depend on the dataset, rather than being pre-defined basis functions, and so are adaptive in the broad sense.
prcomp can do centering or scaling for you, but it also recognizes when the data passed to it has been previously centered or scaled via the scale function. Internally, prcomp is a wrapper for the svd function (which we'll discuss below).
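To make the SVD connection concrete, here is a minimal sketch on synthetic data (the matrix X and its collinear fourth column are assumptions for illustration). It checks that the standard deviations reported by prcomp match the singular values of the centered data divided by sqrt(n - 1), and shows that prcomp runs on rank-deficient input. The tol argument of prcomp drops components whose standard deviation is at most tol times that of the first component.

```r
set.seed(1)

# Synthetic data: 50 observations, 4 variables, the last one exactly collinear
X <- matrix(rnorm(200), nrow = 50, ncol = 4)
X[, 4] <- X[, 1] + X[, 2]

# PCA via prcomp (SVD based), centering only
pc <- prcomp(X, center = TRUE, scale. = FALSE)

# Same quantities computed by hand from the SVD of the centered data
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)$d / sqrt(nrow(X) - 1)
same_sdev <- isTRUE(all.equal(pc$sdev, sv))

# With tol set, the numerically null component is omitted from the rotation
pc_tol <- prcomp(X, tol = 1e-8)
kept <- ncol(pc_tol$rotation)
```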
The first strategy might be to decrease the tolerance argument. It looks to me that princomp won't pass on a tolerance argument but that prcomp does accept a 'tol' argument. If that is not effective, this should identify vectors which have nearly-zero covariance:
nr0=0.001
which(abs(cov(M)) < nr0, arr.ind=TRUE)
And this would identify negative eigenvalues of the covariance matrix:
which(eigen(cov(M))$values < 0)
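Here is a minimal sketch of the two checks above on synthetic data. It assumes M holds the raw observations, one column per variable, so the eigenvalues are taken from cov(M). The collinear and nearly constant columns are made up for illustration.

```r
set.seed(1)

# Synthetic data: 3 independent columns, one collinear, one nearly constant
M <- matrix(rnorm(300), nrow = 100, ncol = 3)
M <- cbind(M, M[, 1] - M[, 2])          # exact linear combination
M <- cbind(M, rnorm(100, sd = 1e-3))    # variance close to zero

S <- cov(M)

# Entries of the covariance matrix that are nearly zero
nr0 <- 0.001
near_zero <- which(abs(S) < nr0, arr.ind = TRUE)
print(near_zero)

# Eigenvalues: any negative ones, and how small the smallest is relative to the largest
ev <- eigen(S, symmetric = TRUE)$values
neg <- which(ev < 0)
ratio <- min(ev) / max(ev)
print(ratio)
```

A smallest-to-largest eigenvalue ratio near zero (or slightly negative) points to a covariance matrix that is singular to working precision.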
Using the h9 example on the help(qr) page:
> which(abs(cov(h9)) < .001, arr.ind=TRUE)
row col
[1,] 9 4
[2,] 8 5
[3,] 9 5
[4,] 7 6
[5,] 8 6
[6,] 9 6
[7,] 6 7
[8,] 7 7
[9,] 8 7
[10,] 9 7
[11,] 5 8
[12,] 6 8
[13,] 7 8
[14,] 8 8
[15,] 9 8
[16,] 4 9
[17,] 5 9
[18,] 6 9
[19,] 7 9
[20,] 8 9
[21,] 9 9
> qr(h9[-9,-9])$rank
[1] 7 # rank deficient, at least at the default tolerance
> qr(h9[-(8:9),-(8:9)])$rank  # take out only the vectors with the most dependencies
[1] 6 #Still rank deficient
> qr(h9[-(7:9),-(7:9)])$rank
[1] 6
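For anyone wanting to reproduce the rank checks, here is a sketch that rebuilds h9 using the hilbert() helper defined in the examples on the help(qr) page. The tighter tolerance value is just an illustrative choice; qr() accepts a tol argument that controls when columns are treated as linearly dependent.

```r
# Hilbert matrix helper, as in the examples on the help(qr) page
hilbert <- function(n) {
  i <- 1:n
  1 / outer(i - 1, i, "+")
}
h9 <- hilbert(9)

# Rank at the default tolerance versus a tighter one
r_default <- qr(h9)$rank
r_tight <- qr(h9, tol = 1e-10)$rank
print(c(default = r_default, tight = r_tight))
```

The reported rank depends on the tolerance, so "rank deficient" here means deficient at the tolerance in use.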
Another approach might be to use the alias function, with dfrm being the data frame of variables:
alias( lm( rnorm(NROW(dfrm)) ~ ., data = dfrm) )
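Here is a sketch of the alias approach on a made-up data frame (the columns a, b, c and the random response y are hypothetical). The response is arbitrary; the model is fitted only so that alias() can report which columns are exact linear combinations of the others.

```r
set.seed(42)

# Hypothetical data frame where c is an exact linear combination of a and b
dfrm <- data.frame(a = rnorm(20), b = rnorm(20))
dfrm$c <- dfrm$a + 2 * dfrm$b

# Arbitrary response, used only to build a model matrix from all columns
y <- rnorm(nrow(dfrm))
fit <- lm(y ~ ., data = dfrm)

# The "Complete" component lists the aliased terms and their dependencies
al <- alias(fit)
print(al$Complete)
aliased <- rownames(al$Complete)
```

Columns reported as aliased are candidates to drop before running the PCA.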