 

Python OpenCV PCACompute Eigenvalue

When using Python 2.7.5 with OpenCV on OSX, I run PCA on a sequence of images (cols are pixels, rows are frames), as per this answer.

How do I get the eigenvalues corresponding to the eigenvectors? It looks like they're a property of the PCA object in C++, but the Python equivalent, PCACompute(), is a plain function.

Seems strange to omit such a key part of PCA.

asked Mar 24 '14 by benxyzzy


1 Answer

matmul.cpp confirms that PCA::operator() is used by PCACompute(), but the eigenvalues are discarded. So I did this:

import cv2
import numpy

# The following mimics the PCA::operator() implementation from OpenCV's
# matmul.cpp, which is wrapped by Python's cv2.PCACompute(). We can't
# use PCACompute() itself, though, as it discards the eigenvalues.

# Scrambled is faster for nVariables >> nObservations. Bitmask is 0 and
# therefore default / redundant, but included to abide by online docs.
covar, mean = cv2.calcCovarMatrix(PCAInput, cv2.cv.CV_COVAR_SCALE |
                                            cv2.cv.CV_COVAR_ROWS  |
                                            cv2.cv.CV_COVAR_SCRAMBLED)

eVal, eVec = cv2.eigen(covar, computeEigenvectors=True)[1:]

# Conversion + normalisation required due to 'scrambled' mode
eVec = cv2.gemm(eVec, PCAInput - mean, 1, None, 0)
# apply_along_axis() slices 1D rows, but normalize() returns 4x1 vectors
eVec = numpy.apply_along_axis(lambda n: cv2.normalize(n).flat, 1, eVec)

(Simplifying assumptions: rows = observations, cols = variables, and there are many more variables than observations. Both hold in my case.)
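Under those same assumptions, the scrambled-covariance trick can be sketched in pure NumPy (illustrative names and toy data, not OpenCV calls), which also makes it clear where the eigenvalues come from: the small n_obs x n_obs matrix has the same nonzero eigenvalues as the full covariance matrix, and its eigenvectors map back into variable space via the data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 5, 200                 # many more variables than observations
X = rng.standard_normal((n_obs, n_vars))
Xc = X - X.mean(axis=0)                # centre each variable (column)

# "Scrambled" covariance: n_obs x n_obs instead of n_vars x n_vars
covar = Xc @ Xc.T / n_obs
eVal, eVecSmall = np.linalg.eigh(covar)
order = np.argsort(eVal)[::-1]         # eigh returns ascending order
eVal, eVecSmall = eVal[order], eVecSmall[:, order]

# Map back into variable space (the gemm step) and normalise each row,
# mirroring the cv2.gemm() + cv2.normalize() calls above
eVec = eVecSmall.T @ Xc
eVec /= np.linalg.norm(eVec, axis=1, keepdims=True)
```

The rows of eVec then agree (up to sign) with the right singular vectors of the centred data, which is what PCACompute() returns.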

This pretty much works. In the following, old_eVec is the result from cv2.PCACompute():

In [101]: eVec
Out[101]: 
array([[  3.69396088e-05,   1.66745325e-05,   4.97117583e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -7.23531536e-06,  -3.07411122e-06,  -9.58259793e-06, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  1.01496237e-05,   4.60048715e-06,   1.33919606e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       ..., 
       [ -1.42024751e-04,   5.21386198e-05,   3.59923394e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -5.28685812e-05,   8.50139472e-05,  -3.13278542e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  2.96546917e-04,   1.23437674e-04,   4.98598461e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00]])

In [102]: old_eVec
Out[102]: 
array([[  3.69395821e-05,   1.66745194e-05,   4.97117981e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -7.23533140e-06,  -3.07411415e-06,  -9.58260534e-06, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  1.01496662e-05,   4.60050160e-06,   1.33920075e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       ..., 
       [ -1.42029530e-04,   5.21366564e-05,   3.60067672e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -5.29163444e-05,   8.50261567e-05,  -3.13150231e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -7.13724992e-04,  -8.52700090e-04,   1.57953508e-03, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00]], dtype=float32)

There is some loss of precision, visible toward the end of the outputs (though a quick plot of the absolute differences reveals no pattern to the imprecision).

57% of elements have a nonzero absolute difference.
Of these, 95% differ by less than 2e-16, and the mean absolute difference is 5.3e-4; however, it can be as high as 0.059, which is a lot considering all eigenvector values lie between -0.048 and 0.045.

There is code in PCA::operator() that converts to the largest ctype; on the other hand, old_eVec was float32, whereas my own code produces float64. It's worth mentioning that I got some precision-related errors when compiling numpy.

Overall, the loss of precision seems concentrated in eigenvectors with low eigenvalues, which again points to rounding error and the like. Having duplicated PCACompute()'s behaviour, the above implementation produces results close to it.
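The float32-vs-float64 point can be seen in isolation with a toy NumPy sketch (unrelated to OpenCV's internals): decomposing the same symmetric matrix at both precisions and reconstructing it shows roughly how many digits each dtype preserves.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 300))
C64 = A @ A.T                      # small symmetric matrix, float64
C32 = C64.astype(np.float32)

w64, v64 = np.linalg.eigh(C64)
w32, v32 = np.linalg.eigh(C32)     # LAPACK runs in single precision here

# Reconstruction error scales with the dtype's machine epsilon:
# roughly 1e-16 for float64, 1e-7 for float32
err64 = np.abs(v64 * w64 @ v64.T - C64).max()
err32 = np.abs(v32 * w32 @ v32.T - C32).max()
```

A float32 pipeline therefore carries only about seven significant digits end to end, consistent with the discrepancies above appearing well above 2e-16.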

answered Sep 28 '22 by benxyzzy