
Problem with Principal Component Analysis

I'm not sure this is the right place but here I go:

I have a database of 300 high-resolution pictures. I want to compute the PCA on this database, and so far here is what I do:

- reshape every image as a single column vector
- create a matrix of all my data (500x300)
- compute the average column and subtract it from my matrix; this gives me X
- compute the correlation C = X'X (300x300)
- find the eigenvectors V and eigenvalues D of C
- the PCA matrix is given by XV*D^-1/2, where each column is a principal component
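For reference, the steps listed in the question can be sketched in NumPy roughly as follows. This is only an illustration under stated assumptions: the function name `snapshot_pca`, the random stand-in data, and the threshold used to drop near-zero eigenvalues are hypothetical and not taken from the asker's code.

```python
import numpy as np

def snapshot_pca(images):
    """PCA via the small matrix X'X, following the steps in the question.

    images: array of shape (n_pixels, n_images), one flattened image per column.
    Returns (components, eigenvalues), sorted by decreasing eigenvalue.
    """
    X = images - images.mean(axis=1, keepdims=True)  # subtract the average column
    C = X.T @ X                                      # n_images x n_images
    D, V = np.linalg.eigh(C)                         # C is symmetric, so eigh applies
    order = np.argsort(D)[::-1]                      # largest eigenvalue first
    D, V = D[order], V[:, order]
    # Centering leaves one eigenvalue numerically zero; drop such directions
    # (assumed threshold) so the division below stays well defined.
    keep = D > D[0] * 1e-10
    D, V = D[keep], V[:, keep]
    components = (X @ V) / np.sqrt(D)                # X V D^(-1/2), unit-norm columns
    return components, D

# Hypothetical stand-in data: 30 "images" of 500 pixels each
rng = np.random.default_rng(0)
images = rng.normal(size=(500, 30))
components, eigenvalues = snapshot_pca(images)
```

Sorting by eigenvalue matters when comparing two runs, because `eigh` returns eigenvalues in ascending order and the display order of the components depends on it.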

This works well and gives me the correct components.

Now I am running the same PCA on the same database, except that the images have a lower resolution.


Here are my results, low-res on the left and high-res on the right. As you can see, most of them are similar, but some images are not the same (the ones I circled).

Is there any way to explain this? My algorithm needs the same component images in both sets, one in high-res and the other in low-res. How can I make this happen?

thanks

lezebulon asked Aug 12 '11 15:08



1 Answer

It is quite possible that the downsampling filter you used altered some of the components. After all, lower-resolution images lack the higher frequencies that also help determine which components you get. If the component weights (lambdas) for those images are small, there is also a good chance of numerical error.

I'm guessing your component images are sorted by weight. If they are, I would try a different pre-downsampling filter (that is, obtain the lower-resolution images by different means) and see whether it gives different results. It is possible that the components that come out differently have a lot of frequency content in the transition band of that filter. The images circled in red look like nearly perfect inversions of each other. Filters can cause such things.
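One way to check whether two component images really are inversions of each other is to measure their signed similarity. Note that an eigenvector is only defined up to sign (if v is an eigenvector, so is -v), so a similarity close to -1 means the two runs found the same direction with opposite sign. The sketch below assumes the high-res components have already been resampled to the low-res grid; the function names and the stand-in data are hypothetical.

```python
import numpy as np

def signed_similarity(a, b):
    """Cosine similarity between corresponding columns of a and b."""
    a = a / np.linalg.norm(a, axis=0)
    b = b / np.linalg.norm(b, axis=0)
    return np.sum(a * b, axis=0)

def align_signs(reference, components):
    """Flip each column of `components` whose similarity to `reference` is negative."""
    sim = signed_similarity(reference, components)
    flips = np.where(sim < 0, -1.0, 1.0)
    return components * flips, sim

# Hypothetical stand-in: three "components", the second one inverted
rng = np.random.default_rng(1)
low_res = rng.normal(size=(64, 3))
high_res_downsampled = low_res.copy()
high_res_downsampled[:, 1] *= -1

aligned, sim = align_signs(low_res, high_res_downsampled)
```

A similarity near +1 or -1 indicates the same component up to sign. Values well away from both would suggest the components genuinely differ, for example because of the filter.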

If your images are not sorted by weight, I wouldn't be surprised if the ones you circled have very little weight, in which case the difference could simply be a computational precision error or something of that sort. In any case, we would probably need a little more information about how you downsample and how you sort the images before displaying them.

Also, I wouldn't expect all images to be extremely similar, because you're essentially getting rid of quite a few frequency components.

I'm pretty sure it has nothing to do with the fact that you're stretching the images out into vectors to compute the PCA, but try stretching them out in a different direction (take columns instead of rows, or vice versa) and see what happens. If it changes the result, then you might want to perform the PCA somewhat differently, though I'm not sure how.
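The flattening experiment suggested above can be tried on small stand-in data. Reordering the pixels permutes the rows of X, which leaves X'X unchanged, so the eigenvalues should agree for both flattening orders. The array shapes and the helper name `gram_eigenvalues` below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
stack = rng.normal(size=(20, 8, 6))   # hypothetical: 20 images of 8x6 pixels

def gram_eigenvalues(stack, order):
    # Flatten each image row-wise ("C") or column-wise ("F"), one image per column
    X = np.stack([img.flatten(order=order) for img in stack], axis=1)
    X = X - X.mean(axis=1, keepdims=True)   # subtract the average column
    return np.linalg.eigvalsh(X.T @ X)      # eigenvalues in ascending order

row_wise = gram_eigenvalues(stack, "C")
col_wise = gram_eigenvalues(stack, "F")
same = np.allclose(row_wise, col_wise)
```

If `same` came out false on real data, that would point to a bug in the reshaping code (for example, images flattened inconsistently) rather than to PCA itself.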

Phonon answered Oct 13 '22 00:10
