I am using KNN to classify handwritten digits, and I have now also implemented PCA to reduce the dimensionality. I went from 256 dimensions down to 200, but I only see about a 0.10% loss of information, even though I deleted 56 dimensions. Shouldn't the loss be bigger? Only when I drop down to 5 dimensions do I get a loss of around 20%. Is this normal?
You're saying that after removing 56 dimensions, you lost nearly no information? Of course, that's the point of PCA! Principal Component Analysis, as the name states, helps you determine which directions (principal components) carry the information, so you can remove the rest, which often makes up the biggest part of the dimensions. The "information" here is measured as variance: PCA sorts the components by how much variance they explain, and since the pixels of handwritten digits are highly correlated, most of the variance concentrates in the leading components. The trailing 56 directions you removed carry almost none of it.
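For concreteness, here is a minimal sketch of how you can inspect this yourself with scikit-learn. It uses the bundled 8x8 digits dataset (64 features rather than your 256, so the exact numbers will differ), but the pattern is the same: the cumulative explained variance climbs very quickly, so dropping the trailing components loses almost nothing.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Load the 8x8 handwritten digits (64 features per sample).
X, y = load_digits(return_X_y=True)

# Fit PCA with all components so we can inspect the full variance spectrum.
pca = PCA().fit(X)

# Cumulative fraction of the total variance retained by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

for k in (5, 10, 20, 40, 64):
    print(f"{k:2d} components retain {cumulative[k - 1]:.2%} of the variance")
```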
If you want an example: in gene analysis, I have read papers where the dimension is reduced from 40,000 to 100 with PCA, then they do some magical stuff and end up with an excellent classifier using only 19 dimensions. This implicitly tells you that they lost virtually no information when they removed 39,900 dimensions!
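You can see the same effect on the classifier itself. Here is a sketch, again using the scikit-learn digits data as a stand-in for your 256-dimensional set, comparing KNN accuracy on the raw features against KNN on a heavily reduced PCA projection; the accuracy typically drops only slightly.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# KNN on the raw 64-dimensional features.
knn_raw = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(f"raw features:      {knn_raw.score(X_test, y_test):.3f}")

# KNN on a 10-dimensional PCA projection of the same data.
knn_pca = make_pipeline(PCA(n_components=10), KNeighborsClassifier(n_neighbors=5))
knn_pca.fit(X_train, y_train)
print(f"10 PCA components: {knn_pca.score(X_test, y_test):.3f}")
```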