
PCA and KNN algorithm

Tags: algorithm, pca, knn

I am using KNN to classify handwritten digits. I have now also implemented PCA to reduce the dimensionality, going from 256 dimensions to 200. But I only see a loss of information of about 0.10%, even though I removed 56 dimensions. Shouldn't the loss be bigger? Only when I drop to 5 dimensions do I get a loss of about 20%. Is this normal?

Test Test asked Apr 16 '12 23:04




1 Answer

You're saying that after removing 56 dimensions you lost almost no information? Of course, that's the whole point of PCA! Principal Component Analysis, as the name suggests, helps you determine which directions (the principal components) carry the information. You can then remove the rest, which is often the largest part of them.

If you want some examples: in gene analysis, I have read papers where the dimension is reduced from 40'000 to 100 with PCA, then they do some magical stuff and end up with an excellent classifier using 19 dimensions. This implicitly tells you that they lost virtually no information when they removed 39'900 dimensions!
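To see why dropping many dimensions can cost so little, here is a minimal sketch that assumes "loss of information" means the fraction of total variance carried by the discarded components. The eigenvalue spectrum is made up for illustration (each eigenvalue is 75% of the previous one) and is not the asker's digit data; `variance_lost` is a hypothetical helper name.

```python
# Sketch: fraction of variance lost when PCA keeps only the top-k components.
# The spectrum below is hypothetical and chosen only to illustrate the effect.

def variance_lost(eigenvalues, k):
    """Fraction of total variance discarded when keeping the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return sum(ordered[k:]) / total

# Assumed spectrum for 256 dimensions with geometric decay (ratio 0.75).
spectrum = [0.75 ** i for i in range(256)]

for k in (256, 200, 5):
    lost = variance_lost(spectrum, k)
    print(f"keep {k:3d} of 256 -> lose {lost:.2%} of the variance")
```

With a fast-decaying spectrum like this, the last 56 components carry almost nothing, so removing them is nearly free, while keeping only 5 components discards a noticeable share. On real data you would plug in the eigenvalues of your own covariance matrix instead of the assumed ones.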

B. Decoster answered Sep 19 '22 04:09