I'm working with scikit-learn to build some predictive models with SVMs. I have a dataset with around 5000 examples and about 700 features. I'm doing 5-fold cross-validation with an 18x17 grid search on my training set, then using the optimal parameters on my test set. The runs are taking a lot longer than I expected, and I have noticed the following:
1) Some individual SVM training iterations seem to take only a minute, while others can take up to 15 minutes. Is this expected with different data and parameters (C and gamma; I'm using the RBF kernel)?
2) I'm trying to use 64-bit Python on Windows to take advantage of the extra memory, but all my Python processes seem to top out at 1 GB in Task Manager. I don't know whether that has anything to do with the runtime.
3) I was using 32-bit Python before, running on roughly the same dataset, and I remember (though I didn't save the results) it being quite a bit faster. I used a third-party build of scikit-learn for 64-bit Windows, so I don't know whether it's better to try this on 32-bit Python? (source: http://www.lfd.uci.edu/~gohlke/pythonlibs/)
Any suggestions on how I can reduce runtime would be greatly appreciated. I guess reducing the search space of my grid search would help, but since I'm unsure of even the range of optimal parameters, I'd like to keep it as large as I can. If there are faster SVM implementations as well, please let me know, and I may try those.
Addendum: I went back and tried running the 32-bit version again. It's much faster for some reason: it took about 3 hours to get to where the 64-bit version got to in 16 hours. Why would there be such a difference?
1) This is expected: small gamma and small regularization will select more support vectors, hence the model will be more complex and take longer to fit.
2) There is a cache_size argument that is passed to the underlying libsvm library. However, depending on your data, libsvm might or might not use all of the available cache (see the sketch after this list).
3) No idea. If you run more timed experiments on both platforms, please report your findings on the project mailing lists; this might deserve further investigation.
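As a minimal sketch of point 2 (not your exact setup; the C and gamma values here are placeholders), the kernel cache can be raised from the 200 MB default when constructing the estimator:

```python
from sklearn.svm import SVC

# cache_size is in MB; raising it lets libsvm keep more kernel rows in memory,
# but libsvm will not necessarily use all of it on every dataset.
clf = SVC(kernel="rbf", C=1.0, gamma=0.01, cache_size=1000)
```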
First, check that you have normalized your features (e.g. remove the mean and scale each feature by its variance if your data is a dense numpy array). For sparse data, just scale the features (or use a TF-IDF transform for text data, for instance). See the preprocessing section of the documentation.
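For example, a minimal sketch assuming your data lives in dense arrays named X_train and X_test (names are placeholders):

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit statistics on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same mean/variance on the test set
```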
Then you should start with a coarse grid (with large logarithmic steps), say a 3x3 grid, and then zoom in on the interesting areas by re-running a 3x3 grid there. In general, the C x gamma SVM parameter grid is quite smooth.
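A rough sketch of such a coarse search with 5-fold cross-validation; the ranges below are illustrative guesses, and X_train_scaled / y_train are assumed from the scaling step above:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Coarse 3x3 grid with large logarithmic steps over C and gamma.
coarse_grid = {
    "C": np.logspace(-1, 3, 3),      # 0.1, 10, 1000
    "gamma": np.logspace(-4, 0, 3),  # 1e-4, 1e-2, 1
}

search = GridSearchCV(SVC(kernel="rbf"), coarse_grid, cv=5, n_jobs=-1)
search.fit(X_train_scaled, y_train)
print(search.best_params_)  # then re-run a finer 3x3 grid centered on these values
```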