1 Answer. You can use Python's Pickle library to save most of the machine learning model and you can also restore the saved models later using same library.
NLTK (Natural Language Toolkit) provides Naive Bayes classifier to classify text data.
To save:
import pickle
f = open('my_classifier.pickle', 'wb')
pickle.dump(classifier, f)
f.close()
To load later:
import pickle
f = open('my_classifier.pickle', 'rb')
classifier = pickle.load(f)
f.close()
I went thru the same problem, and you cannot save the object since is a ELEFreqDistr NLTK class. Anyhow NLTK is hell slow. Training took 45 mins on a decent set and I decided to implement my own version of the algorithm (run it with pypy or rename it .pyx and install cython). It takes about 3 minutes with the same set and it can simply save data as json (I'll implement pickle which is faster/better).
I started a simple github project, check out the code here
To Retrain the Pickled Classifer :
f = open('originalnaivebayes5k.pickle','rb')
classifier = pickle.load(f)
classifier.train(training_set)
print('Accuracy:',nltk.classify.accuracy(classifier,testing_set)*100)
f.close()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With