I want to train an SVM classifier for image categorization with scikit-learn.
I also want to use opencv-python's SIFT implementation to extract image features. The situation is as follows:
1. The input to scikit-learn's SVM classifier is a 2-D array in which each row represents one image, so every image must have the same number of features.
2. opencv-python's SIFT function returns a list of keypoints together with a NumPy array of descriptors, and the number of rows in that array differs from image to image.
So my question is:
How can I process the SIFT features so that they fit the SVM classifier's input? Can you help me?
Thanks to pyan's advice, I've adapted my approach as follows:
1. get SIFT feature vectors from each image
2. perform k-means clustering over all the vectors
3. create a feature dictionary, a.k.a. codebook, from the cluster centers
4. re-represent each image in terms of the feature dictionary, so that every image has the same number of dimensions
5. train my SVM classifier and evaluate it
I've gathered the SIFT feature vectors of all images into one array (x * 128), which is very large, and now I need to cluster it.
The problem is:
If I use k-means, the number of clusters has to be set, and I don't know how to choose the best value. If I don't use k-means, which algorithm would be suitable? Note: I want to use scikit-learn for the clustering.
My proposal is:
1. perform DBSCAN clustering on the vectors, which gives me label_size and labels;
2. because DBSCAN in scikit-learn cannot be used for prediction, train a new classifier A on the DBSCAN result;
3. classifier A then acts as the codebook: I can use it to label every image's SIFT vectors, and after that every image can be re-represented;
4. based on the above, train my final classifier B. Note: to predict a new image, its SIFT vectors must first be transformed by classifier A into the vector that classifier B takes as input.
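A minimal sketch of steps 1 to 3 of this DBSCAN proposal, under stated assumptions: the descriptors are synthetic, well-separated blobs rather than real SIFT output, `eps=15.0` and `min_samples=5` are hypothetical values tuned to that toy data, and a 1-nearest-neighbour model is just one possible choice for classifier A. Real SIFT descriptors may not separate this cleanly, so DBSCAN could return a single cluster or mostly noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-in for the stacked SIFT descriptors: three tight blobs in 128-D.
centres = rng.uniform(0, 100, size=(3, 128))
descriptors = np.vstack(
    [c + rng.normal(scale=0.5, size=(50, 128)) for c in centres]
)

# Step 1: DBSCAN finds the number of clusters on its own (label -1 = noise).
db = DBSCAN(eps=15.0, min_samples=5).fit(descriptors)
keep = db.labels_ != -1
n_words = int(db.labels_.max()) + 1

# Step 2: classifier A learns to assign a cluster label to unseen descriptors.
classifier_a = KNeighborsClassifier(n_neighbors=1)
classifier_a.fit(descriptors[keep], db.labels_[keep])

# Step 3: re-represent one image as a fixed-length histogram of cluster labels.
def encode(image_descriptors):
    words = classifier_a.predict(image_descriptors)
    return np.bincount(words, minlength=n_words) / len(image_descriptors)

# A "new image" whose descriptors all come from the second blob.
new_image = centres[1] + rng.normal(scale=0.5, size=(30, 128))
vec = encode(new_image)
```

The vectors returned by `encode` all have length `n_words`, so they can be stacked into the 2-D array that classifier B expects.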
Can you give me some advice?
Image classification can be quite general. To define good features, you first need to be clear about what kind of output you want. For example, images can be categorized by the scenes in them into nature views, city views, indoor views, etc. Different kinds of classification may require different kinds of features.
A common approach in computer vision for keyword-based image classification is bag of words (also called bag of features) or dictionary learning. You can do a literature search to familiarize yourself with the topic. In your case, the basic idea would be to group the SIFT features into clusters. Instead of feeding the SIFT features to scikit-learn directly, give it the vector of cluster frequencies as input, so each image is represented by a 1-D vector of fixed length.
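As a rough illustration of that idea, here is a minimal sketch using scikit-learn only. Several things in it are assumptions rather than part of the question: the descriptors are synthetic `(n, 128)` arrays standing in for SIFT output, the vocabulary size `n_words = 8` is arbitrary, and `MiniBatchKMeans` is used in place of plain k-means only because the stacked descriptor array can be large.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fake_descriptors(label, n):
    """Stand-in for SIFT output: an (n, 128) float32 array for one image."""
    centre = np.zeros(128, dtype=np.float32)
    centre[label * 64:(label + 1) * 64] = 5.0  # two synthetic categories
    return (centre + rng.normal(size=(n, 128))).astype(np.float32)

# Every image yields a different number of descriptors.
labels = np.array([0, 1] * 10)
per_image = [fake_descriptors(y, rng.integers(20, 60)) for y in labels]

# Cluster all descriptors together; the cluster centres form the dictionary.
n_words = 8  # hypothetical vocabulary size
codebook = MiniBatchKMeans(n_clusters=n_words, n_init=3, random_state=0)
codebook.fit(np.vstack(per_image))

def bow_histogram(image_descriptors):
    """Normalised frequency of each cluster among one image's descriptors."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=n_words).astype(np.float64)
    return hist / hist.sum()

# One fixed-length row per image, which is what the SVM needs.
X = np.array([bow_histogram(d) for d in per_image])
clf = SVC(kernel="rbf").fit(X, labels)
predictions = clf.predict(X)
```

With real images, each entry of `per_image` would be the descriptor array returned by SIFT for that image, and a new image would be passed through `bow_histogram` before calling `clf.predict`.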
For a short introduction, see the Wikipedia article "Bag-of-words model in computer vision".