
Random Forest Classifier Segmentation Fault

I've been trying to run the RF classifier on a data set of ~50,000 entries with 20 or so labels, which I thought should be fine, but I keep hitting the following error when trying to fit:

Exception MemoryError: MemoryError() in 'sklearn.tree._tree.Tree._resize' ignored
Segmentation fault (core dumped)

The data set has been passed through TfidfVectorizer and then TruncatedSVD with n_components=100 for dimensionality reduction. RandomForestClassifier is running with n_jobs=1 and n_estimators=10 in an attempt to find the minimum settings at which it will work. The system has 4GB of RAM, and RF has worked in the past on a similar data set with a much higher number of estimators. Scikit-learn is at the current version, 0.14.1.
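For reference, the setup described above can be sketched roughly as follows. This is a minimal reconstruction, not the original script: the helper name `build_pipeline`, the toy documents, and the use of `make_pipeline` are assumptions made for illustration, and the toy run uses 2 SVD components only because the sample vocabulary is tiny.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline


def build_pipeline(n_components=100, n_estimators=10):
    """Hypothetical helper: TF-IDF -> TruncatedSVD -> random forest."""
    return make_pipeline(
        TfidfVectorizer(),
        # Reduce the sparse TF-IDF matrix to a dense, low-dimensional one.
        TruncatedSVD(n_components=n_components, random_state=0),
        # Settings from the question: a single job and a small forest.
        RandomForestClassifier(
            n_estimators=n_estimators, n_jobs=1, random_state=0
        ),
    )


# Toy stand-in for the real ~50,000 documents (assumed data).
docs = [
    "disk is full",
    "memory error on fit",
    "disk quota exceeded",
    "out of memory again",
] * 5
labels = ["disk", "memory", "disk", "memory"] * 5

model = build_pipeline(n_components=2)
model.fit(docs, labels)
prediction = model.predict(["memory error"])
```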

Any tips?

Thanks

Carlos Sultana asked Nov 27 '13 01:11



1 Answer

Segfaults are always bugs. If a malloc fails inside RandomForest, the failure should be caught, and my best guess is that an uncaught failure is what is happening to you. As a commenter already said, you should report this to the scikit-learn bug tracker. The malloc is probably failing because the process is out of memory, so in the meantime reduce your dimensionality, reduce your training data set size, or run on a system with more memory.
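As a rough sketch of the "reduce your training data set size" suggestion, something like the following could be tried. The function name `fit_small_forest` and every threshold in it are illustrative assumptions, not values known to avoid the crash on 0.14.1. It also caps tree depth and leaf size, which is an extra idea beyond the suggestions above and limits how large each tree can grow. The float32 cast is there because scikit-learn's tree code works on float32 input, so supplying it directly should avoid holding an additional converted copy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_small_forest(X, y, max_rows=20000, max_depth=20,
                     min_samples_leaf=5, seed=0):
    """Hypothetical helper: fit a forest with a reduced memory footprint."""
    rng = np.random.RandomState(seed)
    # Trees use float32 internally; cast once up front.
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y)
    # Train on a random subset of rows if the data set is large.
    if X.shape[0] > max_rows:
        idx = rng.choice(X.shape[0], size=max_rows, replace=False)
        X, y = X[idx], y[idx]
    clf = RandomForestClassifier(
        n_estimators=10,
        max_depth=max_depth,                # bound the depth of every tree
        min_samples_leaf=min_samples_leaf,  # fewer, larger leaves
        n_jobs=1,
        random_state=seed,
    )
    clf.fit(X, y)
    return clf


# Synthetic stand-in for the SVD output: 500 rows, 100 features, 20 labels.
demo_rng = np.random.RandomState(0)
X_demo = demo_rng.rand(500, 100)
y_demo = demo_rng.randint(0, 20, size=500)
clf = fit_small_forest(X_demo, y_demo, max_rows=200)
```

If the reduced fit succeeds, `max_rows` and `max_depth` can be raised step by step to find where memory runs out.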

IanSR answered Oct 17 '22 22:10
