
How to save the result of a TextBlob NaiveBayesClassifier?

I am using TextBlob's NaiveBayesClassifier to classify text into themes that I have chosen.

The data set is large (about 3000 entries).

I was able to get a result, but I can't save it for future use, so I have to call the function again and wait hours for the processing to complete.

I tried pickling it as follows:

ab = NaiveBayesClassifier(data)

import pickle

object = ab
file = open('f.obj','w') # also tried 'a' (append) in place of 'w'
pickle.dump(object,file)

and I got an error, which is as follows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\pickle.py", line 1370, in dump
    Pickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 419, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 663, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "C:\Python27\lib\pickle.py", line 615, in _batch_appends
    save(x)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 562, in save_tuple
    save(element)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 662, in _batch_setitems
    save(k)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 501, in save_unicode
    self.memoize(obj)
  File "C:\Python27\lib\pickle.py", line 247, in memoize
    self.memo[id(obj)] = memo_len, obj
MemoryError

I also tried sPickle, but it failed too, with errors such as:

#saving object with function sPickle.s_dump
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sPickle.py", line 22, in s_dump
    for elt in iterable_to_pickle:
TypeError: 'NaiveBayesClassifier' object is not iterable

#saving object with function sPickle.s_dump_elt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sPickle.py", line 28, in s_dump_elt
    pickled_elt_str = dumps(elt_to_pickle)
MemoryError: out of memory

Can anyone tell me what I have to do to save the object?

Or is there any other way to save the results of the classifier for future use?

Prashant Shrivastava asked Mar 20 '23 05:03


2 Answers

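I solved the problem myself.

First of all, use a 64-bit build of Python (this applies to all versions from 2.6 to 3.4).

A 64-bit build removes the address-space limit of a 32-bit process, which resolved the MemoryError in my case. The object still has to fit in the machine's available RAM.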
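To find out which build is currently running, something like the following should work. It is only a small sketch; the helper names `interpreter_bits` and `is_64bit` are made up for illustration.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


def is_64bit():
    """Return True on a 64-bit build of Python."""
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    return sys.maxsize > 2**32


print("Python build: %d-bit" % interpreter_bits())
```

If this reports 32-bit, installing the 64-bit interpreter is the first thing to try before touching the pickling code.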

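Also use cPickle (Python 2 only; in Python 3 the standard pickle module already uses its C implementation when available):

import cPickle as pickle

Secondly, open your file in binary write mode: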

file = open('file_name.pickle','wb') # binary mode, as Robert's answer also says

Then write the object to the file and close it:

pickle.dump(object,file)
file.close()

Your object will be dumped to the file. Check how much memory your object uses first, though: pickling needs memory of its own, so at least 25% of memory should be free for the object to be pickled.

My laptop had 8 GB of RAM, so there was enough memory for only one such object.

(My classifier was very heavy: 3000 string instances, each a sentence of about 15-30 words, and 22 sentiments/themes.)

If your laptop freezes (stops responding), you may have to power it off, start over, and try again with fewer instances or fewer sentiments/themes.

cPickle is very helpful here because it is much faster than the pure-Python pickle module, so I would suggest using it.
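Putting the steps together, a save-and-reload round trip might look like the sketch below. It assumes Python 3, where plain `pickle` replaces `cPickle`. The helpers `save_object` and `load_object` and the file name are hypothetical, and a small dict stands in for the trained classifier so the example runs without TextBlob.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Pickle obj to path in binary mode with the highest available protocol."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Load a previously pickled object back from path."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for the trained classifier (an assumption, for illustration only).
model = {"labels": ["theme_a", "theme_b"], "counts": [3, 5]}

path = os.path.join(tempfile.gettempdir(), "classifier_demo.pickle")
save_object(model, path)
restored = load_object(path)
print(restored == model)
```

With the real classifier the call would be `save_object(ab, 'file_name.pickle')`, where `ab` is the object from the question. The `with` blocks close the file even if pickling fails. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.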

Prashant Shrivastava answered Apr 27 '23 00:04


You need to use "wb" for binary format:

file = open('f.obj','wb')
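
The difference between the two modes can be seen in a small experiment. This sketch assumes Python 3, where dumping to a text-mode file fails immediately with a TypeError. On Python 2.7, as used in the question, text mode does not raise, but newline translation on Windows can corrupt binary pickle data. The file name and sample data are made up.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.gettempdir(), "mode_demo.obj")
data = {"example": [1, 2, 3]}

# Text mode: pickle writes bytes, which a text-mode file rejects on Python 3.
text_mode_failed = False
try:
    with open(path, "w") as fh:
        pickle.dump(data, fh)
except TypeError:
    text_mode_failed = True

# Binary mode: write with "wb", read back with "rb".
with open(path, "wb") as fh:
    pickle.dump(data, fh)
with open(path, "rb") as fh:
    roundtrip = pickle.load(fh)

print(text_mode_failed, roundtrip == data)
```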

Robert Ekendahl answered Apr 27 '23 00:04