
Principal Component Analysis not working

I'm trying to do principal component analysis on datasets containing images, but whenever I want to apply pca.transform from the sklearn.decomposition module I keep getting this error: *AttributeError: 'PCA' object has no attribute 'mean_'*. I know what this error means, but I have no clue how to fix it. I reckon some of you guys know how to fix this.

Thank you for your help

My code:

from sklearn import svm
import numpy as np
import glob
import os
from PIL import Image
from sklearn.decomposition import PCA

image_dir1 = "C:\Users\private\Desktop\K FOLDER\private\train"
image_dir2 = "C:\Users\private\Desktop\K FOLDER\private\test1"
Standard_size = (300,200)
pca = PCA(n_components = 10)
file_open = lambda x,y: glob.glob(os.path.join(x,y))


def matrix_image(image_path):
    "opens image and converts it to a m*n matrix" 
    image = Image.open(image_path)
    print("changing size from %s to %s" % (str(image.size), str(Standard_size)))
    image = image.resize(Standard_size)
    image = list(image.getdata())
    image = map(list,image)
    image = np.array(image)
    return image
def flatten_image(image):  
    """
    takes in a n*m numpy array and flattens it to 
    an array of the size (1,m*n)
    """
    s = image.shape[0] * image.shape[1]
    image_wide = image.reshape(1,s)
    return image_wide[0]

if __name__ == "__main__":
    train_images = file_open(image_dir1,"*.jpg")
    test_images = file_open(image_dir2,"*.jpg")
    train_set = []
    test_set = []

    "Loop over all images in files and modify them"
    train_set = [flatten_image(matrix_image(image)) for image in train_images]
    test_set = [flatten_image(matrix_image(image)) for image in test_images]
    train_set = np.array(train_set)
    test_set = np.array(test_set)
    train_set = pca.fit_transform(train_set)  # line where error occurs
    test_set = pca.fit_transform(test_set)

Full traceback:

Traceback (most recent call last):
  File "C:\Users\Private\workspace\final_submission\src\d.py", line 54, in <module>
    train_set = pca.transform(train_set)
  File "C:\Python27\lib\site-packages\sklearn\decomposition\pca.py", line 298, in transform
    if self.mean_ is not None:
AttributeError: 'PCA' object has no attribute 'mean_'

Edit1: So I tried to fit the model before transforming it, and now I'm getting an even weirder error. I looked it up, and it involves f2py, a tool that is part of the NumPy library and ports Fortran to Python.

File "C:\Users\Private\workspace\final_submission\src\d.py", line 54, in <module>
    pca.fit(train_set)
  File "C:\Python27\lib\site-packages\sklearn\decomposition\pca.py", line 200, in fit
    self._fit(X)
  File "C:\Python27\lib\site-packages\sklearn\decomposition\pca.py", line 249, in _fit
    U, S, V = linalg.svd(X, full_matrices=False)
  File "C:\Python27\lib\site-packages\scipy\linalg\decomp_svd.py", line 100, in svd
    full_matrices=full_matrices, overwrite_a = overwrite_a)
ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,)

Edit2:

So I have checked whether my train_set and test_set contain any data, and they don't. I've checked my image_dirs, and they contain the right locations (just for clarity, I got them by going to the actual files, looking at the properties of one of the images and copying the location). The fault should lie somewhere else.

asked Oct 16 '13 17:10 by Learner



1 Answer

You should fit the model before calling transform. Fit it on the training set only, then apply the same fitted model to both sets:

train_set = np.array(train_set)
test_set = np.array(test_set)

pca.fit(train_set)

train_set = pca.transform(train_set)  # line where the error occurred
test_set = pca.transform(test_set)
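
A minimal sketch of the fit-then-transform order, assuming random arrays stand in for the flattened images (the shapes and the seed are made up for illustration). The components are learned from the training data only and then reused for the test data, so both end up in the same 10-dimensional space.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-ins for the flattened images:
# 40 training and 15 test samples with 600 features each.
rng = np.random.RandomState(0)
train_set = rng.rand(40, 600)
test_set = rng.rand(15, 600)

pca = PCA(n_components=10)
pca.fit(train_set)                        # sets mean_ and components_ from the training data
train_reduced = pca.transform(train_set)
test_reduced = pca.transform(test_set)    # projected onto the same components

print(train_reduced.shape, test_reduced.shape)
```

Calling transform on a PCA object that has never been fitted is what produces the missing mean_ attribute in the question.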

Edit

The second error indicates that your train_set is empty. It can easily be reproduced with this code:

train_set = np.array([[]])
pca.fit(train_set)
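
One way to catch this earlier is to check the array before fitting. The sketch below uses a hypothetical helper, check_not_empty, which is not part of scikit-learn or NumPy; it only looks at the array's size attribute.

```python
import numpy as np

def check_not_empty(data, name):
    """Raise a readable error if `data` holds no values (hypothetical helper)."""
    arr = np.asarray(data)
    if arr.size == 0:
        raise ValueError(
            "%s is empty (shape %s); check that the image paths matched any files"
            % (name, arr.shape)
        )
    return arr

empty = np.array([[]])
print(empty.shape, empty.size)  # shape (1, 0), size 0

try:
    check_not_empty(empty, "train_set")
except ValueError as exc:
    print(exc)
```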

I think one problem is in the flatten_image function. I may be wrong, but this line will raise an AttributeError:

image.wide = image.reshape(1,s)

It can be replaced with:

image_wide = image.reshape(1,s)
return image_wide[0]
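
For illustration, here is what the corrected function does on a small array. The 2x3 input is made up; a real image would be much larger.

```python
import numpy as np

def flatten_image(image):
    """Flatten an n*m array into a 1-D array of length n*m."""
    s = image.shape[0] * image.shape[1]
    image_wide = image.reshape(1, s)
    return image_wide[0]

sample = np.arange(6).reshape(2, 3)   # made-up 2x3 "image"
flat = flatten_image(sample)
print(flat.shape)                              # (6,)
print(np.array_equal(flat, sample.ravel()))    # True: same values as ndarray.ravel()
```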

This line is problematic too:

print("changing size from %s to %s" % str(image.size), str(Standard_size))

When there is more than one value, the values on the right-hand side of % must be passed as a tuple (see http://docs.python.org/2/library/stdtypes.html#string-formatting-operations for details). So you want this instead:

print("changing size from %s to %s" % (str(image.size), str(Standard_size)))
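
A short demonstration of the difference, using made-up sizes in place of image.size and Standard_size:

```python
size = (640, 480)            # made-up value standing in for image.size
standard_size = (300, 200)

# Without the parentheses, % receives only str(size), so the second %s has no value.
try:
    "changing size from %s to %s" % str(size)
except TypeError as exc:
    print("TypeError:", exc)

# With a tuple, both placeholders are filled.
message = "changing size from %s to %s" % (str(size), str(standard_size))
print(message)
```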

Another edit

Finally, replace the loops after "Loop over all images in files and modify them" with:

train_set = [flatten_image(matrix_image(image)) for image in train_images]
test_set = [flatten_image(matrix_image(image)) for image in test_images]

Right now you call file_open again inside the loops, so it looks for files at a path like this: "C:\Users\private\Desktop\K FOLDER\private\train\C:\Users\private\Desktop\K FOLDER\private\train\foo.jpg" and you get an empty list instead of a file name.
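
A small sketch of how the file_open helper behaves, using a temporary directory with made-up file names instead of the real image folders. glob.glob already returns paths that include the directory, so its results can be opened directly. As an aside that goes beyond the points above: in a normal Python string literal "\t" is a tab character, so Windows paths like the ones in the question are safer written as raw strings. Whether that contributes to the empty list here is an assumption, not something confirmed in the thread.

```python
import glob
import os
import tempfile

file_open = lambda x, y: glob.glob(os.path.join(x, y))

# Temporary directory with two made-up .jpg files standing in for the image folder.
image_dir = tempfile.mkdtemp()
for name in ("a.jpg", "b.jpg"):
    open(os.path.join(image_dir, name), "w").close()

paths = sorted(file_open(image_dir, "*.jpg"))
print(paths)                            # full paths, directory already included
print(file_open(image_dir, "*.png"))    # [] when nothing matches

# "\t" is a tab escape in a normal string literal; a raw string keeps the backslash.
print(len("\train"), len(r"\train"))    # 5 6
```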

answered Sep 22 '22 03:09 by zero323