How can I tell if R is still estimating my SVM model or has crashed?

Tags:

r

svm

I am using the library e1071. In particular, I'm using the svm function. My dataset has 270 fields and 800,000 rows. I've been running this program for 24+ hours now, and I have no idea if it's hung or still running properly. The command I issued was:

svmmodel <- svm(V260 ~ ., data=traindata);

I'm using Windows, and in the task manager the status of Rgui.exe is "Not Responding". Did R crash already? Are there any tips or tricks to better gauge what's happening inside R or the SVM learning process?

If it helps, here are some additional things I noticed using Resource Monitor (in Windows):

  • CPU usage is at 13% (stable)
  • Number of threads is at 3 (stable)
  • Memory usage is at 10,505.9 MB +/- 1 MB (fluctuates)

While writing this question, I also browsed the "similar questions" it suggested. It seems that SVM training time is quadratic or cubic in the number of examples. Still, after 24+ hours: if it's reasonable to wait, I will wait, but if not, I will have to eliminate SVM as a viable predictive model.

Jane Wayne asked Mar 31 '14 20:03

2 Answers

As mentioned in the answer to this question, SVM training can take "arbitrarily long" depending on the parameters selected.

If I remember correctly from my ML class, running time is roughly proportional to the square of the number of training examples, so for 800k examples you probably do not want to wait.
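As a rough back-of-the-envelope check of that quadratic scaling (the one-minute timing below is hypothetical, just to make the arithmetic concrete):

```r
# If training on 10,000 rows took t_small seconds, quadratic scaling
# predicts (800000 / 10000)^2 = 6400 times as long for the full set.
t_small <- 60                        # hypothetical: 1 minute on 10k rows
scale_factor <- (800000 / 10000)^2   # = 6400
t_full_hours <- t_small * scale_factor / 3600
t_full_hours                         # roughly 107 hours
```

Even a fast small-sample run can imply days of training at 800k rows, which is why timing a subsample first is worthwhile.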

Also, as an anecdote, I once ran e1071 in R for more than two days on a smaller data set than yours. It eventually completed, but the training took too long for my needs.

Keep in mind that most ML algorithms, including SVM, will usually not achieve the desired result out of the box. Therefore, when you are thinking about how fast you need it to run, keep in mind that you will have to pay the running time every time you tweak a tuning parameter. Of course you can reduce this running time by sampling down to a smaller training set, with the understanding that you will be learning from less data.
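One way to put that advice into practice is to time `svm()` on a few subsamples of increasing size and extrapolate before committing to the full data. This is a minimal sketch: the synthetic data frame stands in for the question's `traindata` with target `V260`, and the subsample sizes are illustrative.

```r
library(e1071)
set.seed(1)

# Synthetic stand-in for the question's data (assumption: in practice you
# would use your real `traindata` with target column V260 instead).
traindata <- data.frame(matrix(rnorm(2000 * 10), ncol = 10))
traindata$V260 <- factor(sign(traindata$X1))

# Train on a random subsample of n rows and return the elapsed seconds.
time_svm <- function(n) {
  idx <- sample(nrow(traindata), n)
  unname(system.time(svm(V260 ~ ., data = traindata[idx, ]))["elapsed"])
}

# If training is roughly quadratic, doubling n should roughly quadruple
# the elapsed time; extrapolate from these to your full row count.
times <- sapply(c(250, 500, 1000), time_svm)
times
```

If the extrapolated full-data time is unacceptable, that tells you up front to subsample, switch kernels, or drop SVM, rather than finding out 24 hours in.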

merlin2011 answered Oct 29 '22 16:10


By default the svm function from e1071 uses a radial basis function (RBF) kernel, which makes SVM induction computationally expensive. You might want to consider using a linear kernel (argument kernel="linear") or a specialized library like LiblineaR, which is built for large datasets. But your dataset is really large, and if a linear kernel does not do the trick, then, as suggested by others, you can use a subset of your data to generate the model.
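Both alternatives can be sketched as follows. This is illustrative, not a drop-in for the question's data: the feature matrix `x` and label vector `y` are synthetic stand-ins, and the LiblineaR solver choice (type = 2, an L2-regularized L2-loss linear SVC) is one reasonable option; see ?LiblineaR for the full list of solver types.

```r
library(e1071)
library(LiblineaR)
set.seed(1)

# Synthetic stand-in: numeric feature matrix and factor labels
# (assumption: derived from your real data in practice).
x <- matrix(rnorm(1000 * 20), ncol = 20)
y <- factor(sign(x[, 1]))

# Option 1: e1071 with a linear kernel, avoiding the default RBF cost.
m1 <- svm(x, y, kernel = "linear")

# Option 2: LiblineaR, linear classifiers optimized for large problems.
m2 <- LiblineaR(data = x, target = y, type = 2)
p <- predict(m2, x)$predictions
mean(p == y)   # training accuracy, just to show the fitted model works
```

Linear solvers like LiblineaR scale far better in the number of rows than kernel SVMs, which matters with 800k examples; the trade-off is that you lose the ability to fit non-linear decision boundaries.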

Krrr answered Oct 29 '22 16:10