 

Saving scikit-learn classifier causes memory error

My machine has 16 GB of RAM, and the training program uses at most 2.6 GB. But when I try to save the classifier (an sklearn.svm.SVC trained on a large dataset) as a pickle file, it needs more memory than my machine can provide. I'd like to know about any alternative ways to save a classifier.

I've tried:

  • pickle and cPickle
  • Opening the output file in w or wb mode
  • Setting fast = True

None of them worked; they always raised a MemoryError. Occasionally the file was saved, but loading it caused ValueError: insecure string pickle.

Thank you in advance!

Update

Thank you all. I didn't try joblib; pickling works after setting protocol=2.
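A minimal Python 3 sketch of the two saving routes discussed here: plain pickle with an explicit binary protocol (protocol 0, the Python 2 default, is text-based), and joblib, which scikit-learn recommends for fitted estimators that carry large NumPy arrays internally. The synthetic dataset and the file names `svc.pkl` / `svc.joblib` are illustrative assumptions standing in for the real data and paths.

```python
import os
import pickle
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic stand-in (assumption) for the much larger real training set.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = SVC().fit(X, y)

out_dir = tempfile.mkdtemp()
pkl_path = os.path.join(out_dir, "svc.pkl")     # hypothetical file name
jl_path = os.path.join(out_dir, "svc.joblib")   # hypothetical file name

# Option 1: plain pickle with a binary protocol. The file is opened in binary
# mode ("wb" to write, "rb" to read) so no text-mode translation can occur.
with open(pkl_path, "wb") as f:
    pickle.dump(clf, f, protocol=2)
with open(pkl_path, "rb") as f:
    from_pickle = pickle.load(f)

# Option 2: joblib, geared toward objects holding large NumPy arrays
# (for example the support vectors of a fitted SVC).
joblib.dump(clf, jl_path, protocol=2)
from_joblib = joblib.load(jl_path)

# Both reloaded models should make the same predictions as the original.
print((from_pickle.predict(X) == clf.predict(X)).all(),
      (from_joblib.predict(X) == clf.predict(X)).all())
```

`joblib.dump` also accepts a `compress` argument if file size on disk matters more than save time.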

asked Nov 01 '22 00:11 by iceboal


1 Answer

I would suggest using scikit-learn's out-of-core classifiers. These algorithms learn incrementally from mini-batches, so the whole dataset never has to be in memory at once; they can store the learned coefficients as a compressed sparse matrix, and they are very time-efficient.

To start with, the following link really helped me.

http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html
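A minimal sketch of the out-of-core idea described above, assuming a linear `SGDClassifier` trained with `partial_fit` on synthetic mini-batches. The `iter_minibatches` generator is hypothetical and stands in for code that would stream the real dataset from disk. `sparsify()` converts the learned `coef_` into a SciPy sparse matrix, which only saves space when many coefficients are zero, hence the L1 penalty here.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
N_FEATURES = 50
# Hypothetical ground-truth weights, used only to generate synthetic labels.
W_TRUE = rng.normal(size=N_FEATURES)
CLASSES = np.array([0, 1])


def iter_minibatches(n_batches=20, batch_size=200):
    """Hypothetical stand-in for a reader that streams a large dataset from disk."""
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, N_FEATURES))
        y = (X @ W_TRUE > 0).astype(int)
        yield X, y


# penalty="l1" can drive coefficients to exactly zero, which is what makes
# the sparse representation produced by sparsify() worthwhile.
clf = SGDClassifier(penalty="l1", random_state=0)
for X_batch, y_batch in iter_minibatches():
    # Only one mini-batch is in memory at a time. The full label set is
    # passed because a single batch might not contain every class.
    clf.partial_fit(X_batch, y_batch, classes=CLASSES)

# Convert coef_ from a dense ndarray to a scipy.sparse matrix.
clf.sparsify()

# Evaluate on a fresh synthetic batch drawn from the same distribution.
X_test, y_test = next(iter_minibatches(n_batches=1, batch_size=1000))
accuracy = clf.score(X_test, y_test)
print(sp.issparse(clf.coef_), accuracy)
```

The resulting model can then be saved with pickle or joblib like any other estimator; a linear model's saved state is essentially its coefficient vector, whereas a kernel SVC has to store its support vectors.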

answered Nov 04 '22 07:11 by Pulkit Jha