When using Python 2.7.5 with OpenCV (OS X), I run PCA on a sequence of images (columns are pixels, rows are frames, as per this answer).
How do I get the eigenvalues corresponding to the eigenvectors? It looks like they are a property of the PCA object in C++, but the Python equivalent, PCACompute(), is a simple function. It seems strange to omit such a key part of PCA.
matmul.cpp confirms that PCA::operator() is what PCACompute() uses, but the eigenvalues are discarded along the way. So I did this:
# The following mimics the PCA::operator() implementation from OpenCV's
# matmul.cpp, which is what Python's cv2.PCACompute() wraps. We can't use
# PCACompute() itself, though, as it discards the eigenvalues.
# 'Scrambled' mode is faster when nVariables >> nObservations. The bitmask
# is 0 and therefore default / redundant, but included to abide by the
# online docs.
covar, mean = cv2.calcCovarMatrix(PCAInput, cv2.cv.CV_COVAR_SCALE |
                                            cv2.cv.CV_COVAR_ROWS |
                                            cv2.cv.CV_COVAR_SCRAMBLED)
eVal, eVec = cv2.eigen(covar, computeEigenvectors=True)[1:]

# Conversion + normalisation required due to 'scrambled' mode
eVec = cv2.gemm(eVec, PCAInput - mean, 1, None, 0)

# apply_along_axis() slices 1D rows, but normalize() returns 4x1 vectors
eVec = numpy.apply_along_axis(lambda n: cv2.normalize(n).flat, 1, eVec)
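For anyone who wants to sanity-check the scrambled-covariance trick without OpenCV, here is a sketch of the same "snapshot" method in pure NumPy (the function name and the float64 choice are mine, not OpenCV's):

```python
import numpy as np

def pca_snapshot(X):
    """Snapshot-method PCA sketch: rows of X are observations (frames),
    columns are variables (pixels), with nVariables >> nObservations."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    A = X - mean
    n = A.shape[0]
    # 'Scrambled' covariance: the small n x n matrix A.A^T/n instead of the
    # huge d x d matrix A^T.A/n; both share the same nonzero eigenvalues.
    small_cov = A.dot(A.T) / n
    eVal, u = np.linalg.eigh(small_cov)
    order = np.argsort(eVal)[::-1]   # eigh is ascending; PCA wants descending
    eVal, u = eVal[order], u[:, order]
    # Map the small eigenvectors back into variable (pixel) space...
    eVec = u.T.dot(A)
    # ...and renormalise, guarding against ~zero-eigenvalue rows.
    norms = np.maximum(np.linalg.norm(eVec, axis=1, keepdims=True),
                       np.finfo(np.float64).eps)
    return eVal, eVec / norms, mean
```

Cross-checking its eigenvalues against the full d x d covariance is a quick way to verify the scrambled path gives the same spectrum.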
(Simplifying assumptions: rows are observations, columns are variables, and there are many more variables than observations. Both hold in my case.)
This pretty much works. In the following, old_eVec is the result from cv2.PCACompute():
In [101]: eVec
Out[101]:
array([[  3.69396088e-05,   1.66745325e-05,   4.97117583e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -7.23531536e-06,  -3.07411122e-06,  -9.58259793e-06, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  1.01496237e-05,   4.60048715e-06,   1.33919606e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       ...,
       [ -1.42024751e-04,   5.21386198e-05,   3.59923394e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -5.28685812e-05,   8.50139472e-05,  -3.13278542e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  2.96546917e-04,   1.23437674e-04,   4.98598461e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00]])

In [102]: old_eVec
Out[102]:
array([[  3.69395821e-05,   1.66745194e-05,   4.97117981e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -7.23533140e-06,  -3.07411415e-06,  -9.58260534e-06, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  1.01496662e-05,   4.60050160e-06,   1.33920075e-05, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       ...,
       [ -1.42029530e-04,   5.21366564e-05,   3.60067672e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -5.29163444e-05,   8.50261567e-05,  -3.13150231e-04, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [ -7.13724992e-04,  -8.52700090e-04,   1.57953508e-03, ...,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00]], dtype=float32)
There's some loss of precision, visible toward the end of the outputs (though a quick plot of the absolute difference reveals no obvious pattern to the imprecision). 57% of elements have a nonzero absolute difference. Of those, 95% differ by less than 2e-16, and the mean absolute difference is 5.3e-4; however, the absolute difference can be as high as 0.059, which is a lot when you consider that all eigenvector values lie between -0.048 and 0.045.
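Figures like these can be computed with a small helper along the following lines (the helper is my own sketch, not the exact script used):

```python
import numpy as np

def abs_diff_stats(a, b, tiny=2e-16):
    """Summarise element-wise absolute differences between two arrays:
    fraction of elements that differ at all, fraction of those below
    `tiny`, and the mean and max of the nonzero differences."""
    ad = np.abs(np.asarray(a, np.float64) - np.asarray(b, np.float64))
    nz = ad[ad > 0]
    if nz.size == 0:
        return 0.0, 0.0, 0.0, 0.0
    return (nz.size / float(ad.size), float((nz < tiny).mean()),
            float(nz.mean()), float(nz.max()))
```

e.g. abs_diff_stats(eVec, old_eVec) would return the four quoted statistics in order.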
There is code in PCA::operator() which converts to the largest ctype; on the other hand, old_eVec was float32, while my own code produces float64. It's also worth mentioning that I got some precision-related errors when compiling NumPy.
Overall, the loss of precision seems to be concentrated in the eigenvectors with low eigenvalues, which again points to rounding error. In any case, the implementation above appears to duplicate PCACompute()'s behaviour closely, while also returning the eigenvalues.
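To illustrate that suspicion, here is a small demonstration (entirely my own construction, not from OpenCV) that rounding a covariance matrix to float32 corrupts the small eigenvalues far more, in relative terms, than the large ones:

```python
import numpy as np

# Synthetic data whose singular values span several orders of magnitude,
# so the covariance has both large and tiny eigenvalues.
rng = np.random.RandomState(1)
s = np.array([1.0, 1e-2, 1e-4, 1e-6])
U, _ = np.linalg.qr(rng.randn(4, 4))
V, _ = np.linalg.qr(rng.randn(8, 4))
A = U.dot(np.diag(s)).dot(V.T)           # 4 x 8, singular values = s
cov = A.dot(A.T)                         # eigenvalues = s**2

w64 = np.sort(np.linalg.eigvalsh(cov))[::-1]
w32 = np.sort(np.linalg.eigvalsh(
    cov.astype(np.float32).astype(np.float64)))[::-1]
rel_err = np.abs(w32 - w64) / w64        # relative error per eigenvalue

# rel_err is tiny for the leading eigenvalue and blows up for the trailing
# ones: the same pattern of disagreement one would expect between a float32
# PCACompute() result and a float64 reimplementation.
```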