I'm trying to do some text classification using TextBlob. I first train the model and serialize it with pickle, as shown below.
    import pickle
    from textblob.classifiers import NaiveBayesClassifier

    with open('sample.csv', 'r') as fp:
        cl = NaiveBayesClassifier(fp, format="csv")

    f = open('sample_classifier.pickle', 'wb')
    pickle.dump(cl, f)
    f.close()
And when I try to run this file:
    import pickle

    f = open('sample_classifier.pickle', encoding="utf8")
    cl = pickle.load(f)
    f.close()
I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Following are the contents of my sample.csv:
My SQL is not working correctly at all. This was a wrong choice, SQL
I've issues. Please respond immediately, Support
Where am I going wrong here? Please help.
Passing encoding='latin1' to pickle.load() is required for unpickling NumPy arrays and instances of datetime, date and time that were pickled by Python 2. If buffers is None (the default), then all data necessary for deserialization must be contained in the pickle stream.
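To illustrate what the encoding argument of pickle.load() and pickle.loads() does (it is separate from the encoding argument of open()), here is a small sketch. The byte string below is a hand-written stand-in for a protocol 2 pickle of a Python 2 str holding Latin-1 bytes; it is an assumption made for illustration, not data from the question.

```python
import pickle

# Hand-written stand-in for a Python 2 pickle (protocol 2) of the
# 8-bit str 'caf\xe9'. Assumed for illustration only.
py2_pickle = b"\x80\x02U\x04caf\xe9q\x00."

# The default encoding is 'ASCII', so the non-ASCII byte cannot be decoded.
try:
    pickle.loads(py2_pickle)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = exc

# encoding='latin1' maps every byte to a code point, so decoding succeeds.
as_text = pickle.loads(py2_pickle, encoding="latin1")

# encoding='bytes' keeps the original 8-bit string as a bytes object.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")
```

Note that this argument only affects how old 8-bit strings inside the pickle are decoded. The file itself still has to be opened in binary mode.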
Pickling is the process by which a Python object hierarchy is converted into a byte stream. Unpickling is the inverse: a byte stream is converted back into an object hierarchy.
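A minimal sketch of that round trip, using an in-memory byte string instead of a file. The record dictionary is a made-up example object.

```python
import pickle

# Made-up example object; any picklable Python object would do.
record = {"text": "Please respond immediately", "label": "Support"}

# Pickling: object hierarchy -> byte stream.
payload = pickle.dumps(record)

# Unpickling: byte stream -> a new, equal object hierarchy.
clone = pickle.loads(payload)
```

The result of pickle.dumps() is a bytes object, not text, which is why pickle files are written and read in binary mode.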
By default, the pickle data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data.
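As a rough sketch of compressing pickled data, the standard-library gzip module can wrap the byte stream. The sample dictionary is invented for illustration, and gzip is only one of several possible choices (bz2 and lzma work the same way).

```python
import gzip
import pickle

# Invented, highly repetitive sample data so that compression has an effect.
data = {"text": "My SQL is not working correctly at all. " * 200}

raw = pickle.dumps(data)          # uncompressed pickle bytes
packed = gzip.compress(raw)       # compressed pickle bytes

# Decompress first, then unpickle.
restored = pickle.loads(gzip.decompress(packed))
```

How much space this saves depends entirely on the data; small or already-compressed objects may not shrink at all.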
First, import pickle, then define an example dictionary, which is a Python object. Next, open a file for writing (note that in Python 3+ it must be opened to write bytes), use pickle.dump() to put the dict into the opened file, and close the file.
By choosing to open the file in mode wb, you are choosing to write in raw binary. There is no character encoding being applied.
Thus to read this file, you should simply open it in mode rb.
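Here is a minimal sketch of the fix. To keep it runnable without TextBlob, a plain dictionary stands in for the trained classifier and the file is written to a temporary directory; both are assumptions for illustration, and the same pattern should apply to the NaiveBayesClassifier object from the question.

```python
import os
import pickle
import tempfile

# Stand-in for the trained classifier (assumption for illustration).
model = {"labels": ["SQL", "Support"], "trained": True}

path = os.path.join(tempfile.mkdtemp(), "sample_classifier.pickle")

# Write the pickle in binary mode.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Read it back in binary mode, with no encoding argument to open().
with open(path, "rb") as f:
    restored = pickle.load(f)

# Opening the same file in text mode makes pickle.load() fail, because the
# binary pickle data is not valid UTF-8 text.
try:
    with open(path, encoding="utf8") as f:
        pickle.load(f)
    text_mode_error = None
except (UnicodeDecodeError, TypeError) as exc:
    text_mode_error = exc
```

Using with blocks also closes the file automatically, so the explicit f.close() calls are no longer needed.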