I'm trying to do principal component analysis on datasets containing images, but whenever I want to apply pca.transform from the sklearn.decomposition module I keep getting this error: *AttributeError: 'PCA' object has no attribute 'mean_'*. I know what this error means, but I have no clue how to fix it. I reckon some of you guys know how to fix this.
Thank you for your help
My code:
from sklearn import svm
import numpy as np
import glob
import os
from PIL import Image
from sklearn.decomposition import PCA

# raw strings, so that e.g. "\t" in "\train" is not read as a tab character
image_dir1 = r"C:\Users\private\Desktop\K FOLDER\private\train"
image_dir2 = r"C:\Users\private\Desktop\K FOLDER\private\test1"
Standard_size = (300, 200)
pca = PCA(n_components=10)
file_open = lambda x, y: glob.glob(os.path.join(x, y))
def matrix_image(image_path):
    "opens image and converts it to an m*n matrix"
    image = Image.open(image_path)
    print("changing size from %s to %s" % (str(image.size), str(Standard_size)))
    image = image.resize(Standard_size)
    image = list(image.getdata())   # list of per-pixel tuples
    image = map(list, image)        # tuples -> lists (Python 2 map returns a list)
    image = np.array(image)
    return image
def flatten_image(image):
    """
    takes in an n*m numpy array and flattens it to
    an array of size (1, m*n)
    """
    s = image.shape[0] * image.shape[1]
    image_wide = image.reshape(1, s)
    return image_wide[0]
if __name__ == "__main__":
    train_images = file_open(image_dir1, "*.jpg")
    test_images = file_open(image_dir2, "*.jpg")
    train_set = []
    test_set = []

    # Loop over all images in the folders and modify them
    train_set = [flatten_image(matrix_image(image)) for image in train_images]
    test_set = [flatten_image(matrix_image(image)) for image in test_images]
    train_set = np.array(train_set)
    test_set = np.array(test_set)
    train_set = pca.transform(train_set)    # line where the error occurs
    test_set = pca.transform(test_set)
Full traceback:
Traceback (most recent call last):
  File "C:\Users\Private\workspace\final_submission\src\d.py", line 54, in <module>
    train_set = pca.transform(train_set)
  File "C:\Python27\lib\site-packages\sklearn\decomposition\pca.py", line 298, in transform
    if self.mean_ is not None:
AttributeError: 'PCA' object has no attribute 'mean_'
Edit1: So I tried to fit the model before transforming, and now I'm getting an even weirder error. I looked it up, and it involves f2py, the part of the NumPy library that ports Fortran code to Python.
File "C:\Users\Private\workspace\final_submission\src\d.py", line 54, in <module>
pca.fit(train_set)
File "C:\Python27\lib\site-packages\sklearn\decomposition\pca.py", line 200, in fit
self._fit(X)
File "C:\Python27\lib\site-packages\sklearn\decomposition\pca.py", line 249, in _fit
U, S, V = linalg.svd(X, full_matrices=False)
File "C:\Python27\lib\site-packages\scipy\linalg\decomp_svd.py", line 100, in svd
full_matrices=full_matrices, overwrite_a = overwrite_a)
ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,)
Edit2:
So I checked whether my train_set and test_set contain any data, and they don't. I've also checked my image_dirs, and they contain the right locations (for clarity: I got them by going to the actual folders, looking at the properties of one of the images, and copying the location). The fault should lie somewhere else.
PCA should be used mainly for variables that are strongly correlated. If the relationships between variables are weak, PCA does not reduce the data well; refer to the correlation matrix to decide. In general, if most of the correlation coefficients are smaller than 0.3, PCA will not help. A quick check is sketched below.
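As a rough illustration (a minimal sketch, not from the original post; the 0.3 threshold is just the rule of thumb above):

import numpy as np

def mostly_uncorrelated(X, threshold=0.3):
    """Return True if most off-diagonal entries of the feature
    correlation matrix are below the threshold."""
    corr = np.corrcoef(X, rowvar=False)   # features as columns
    off_diag = np.abs(corr[~np.eye(corr.shape[0], dtype=bool)])
    return np.mean(off_diag < threshold) > 0.5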
PCA is popular because it can effectively find an optimal representation of a data set with fewer dimensions. It is effective at filtering noise and decreasing redundancy.
When a data set is not linearly distributed, e.g. arranged along non-orthogonal axes or better described by a geometric parameter, PCA can fail to represent the data well and to recover the original data from the projected variables.
Principal components also have low interpretability: they are linear combinations of the features from the original data, but they are not as easy to interpret. For example, after computing the principal components it is difficult to tell which features in the dataset are the most important. You can at least inspect the loadings, as sketched below.
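A minimal sketch of that inspection, assuming a scikit-learn PCA object named pca that has already been fitted:

import numpy as np

# assumes `pca` has been fitted on some data
print(pca.explained_variance_ratio_)      # variance captured by each component
first_component = pca.components_[0]      # loadings of the first component
print(np.argsort(np.abs(first_component))[::-1][:10])   # indices of the 10 largest loadings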
You should fit the model before you transform, and fit it on the training data only (a second fit would overwrite the first):

train_set = np.array(train_set)
test_set = np.array(test_set)
pca.fit(train_set)                      # learns the components and mean_ from the training data
train_set = pca.transform(train_set)    # the line where the error occurred
test_set = pca.transform(test_set)      # projected onto the components learned from train_set
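A self-contained sketch of the same pattern on random data (array names and sizes are illustrative only):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
train = rng.rand(20, 50)                # 20 samples, 50 features
test = rng.rand(5, 50)

pca = PCA(n_components=10)
pca.fit(train)                          # after this call, pca.mean_ exists
print(pca.transform(train).shape)       # (20, 10)
print(pca.transform(test).shape)        # (5, 10)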
Edit
The second error indicates that your train_set is empty. It can easily be reproduced using this code:
train_set = np.array([[]])
pca.fit(train_set)
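A simple guard (my addition, not in the original code) catches this before it reaches the SVD:

train_set = np.array(train_set)
if train_set.size == 0:
    raise ValueError("train_set is empty -- check image_dir1 and the glob pattern")
pca.fit(train_set)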
I think one problem is in your flatten_image function. I may be wrong, but a line like this will raise an AttributeError, because it tries to set an attribute on a numpy array:

image.wide = image.reshape(1,s)
It can be replaced with:
image_wide = image.reshape(1,s)
return image_wide[0]
This line is problematic too:
print("changing size from %s to %s" % str(image.size), str(Standard_size))
Read http://docs.python.org/2/library/stdtypes.html#string-formatting-operations for more details; when there is more than one value, the right-hand side of % must be a tuple. So you want this instead:
print("changing size from %s to %s" % (str(image.size), str(Standard_size)))
Another edit
Finally, replace the loops after "Loop over all images in files and modify them" with:
train_set = [flatten_image(matrix_image(image)) for image in train_images]
test_set = [flatten_image(matrix_image(image)) for image in test_images]
Right now you call file_open in a way that makes it look for files in a path like this: "C:\Users\private\Desktop\K FOLDER\private\train\C:\Users\private\Desktop\K FOLDER\private\train\foo.jpg", so you get an empty list instead of file names.
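A quick way to check is to print the pattern and the number of matches (a debugging sketch, reusing the path from the question):

import glob, os

pattern = os.path.join(r"C:\Users\private\Desktop\K FOLDER\private\train", "*.jpg")
print(pattern)
print(len(glob.glob(pattern)))    # 0 means the directory or pattern is wrong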