Python pickle error: UnicodeDecodeError

I'm trying to do some text classification using TextBlob. I first train the model and then serialize it with pickle, as shown below.

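import pickle
from textblob.classifiers import NaiveBayesClassifier

# Train the classifier on the "text,label" rows in sample.csv
with open('sample.csv', 'r') as fp:
    cl = NaiveBayesClassifier(fp, format="csv")

# Serialize the trained classifier to disk in binary mode
f = open('sample_classifier.pickle', 'wb')
pickle.dump(cl, f)
f.close()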

And when I try to run this file:

import pickle

f = open('sample_classifier.pickle', encoding="utf8")
cl = pickle.load(f)
f.close()

I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
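A quick check of where "byte 0x80 in position 0" comes from may help. This is a sketch that assumes only a Python 3 interpreter; the dictionary is an arbitrary stand-in for the classifier. It shows that a pickle stream starts with a protocol header byte that is not valid UTF-8:

```python
import pickle

# Any picklable object works; the first byte comes from the protocol header.
payload = pickle.dumps({"text": "I've issues. Please respond immediately", "label": "Support"})

# Protocol 2 and later begin with the PROTO opcode (byte 0x80) followed by
# the protocol number; Python 3's default protocol is always 3 or higher.
print(hex(payload[0]))                        # 0x80
print(payload[1] == pickle.DEFAULT_PROTOCOL)  # True

# Decoding the raw bytes as UTF-8 fails at the very first byte,
# which is what reading the file in text mode with encoding="utf8" triggers.
message = None
try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)  # 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
```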

Following are the contents of my sample.csv:

My SQL is not working correctly at all. This was a wrong choice, SQL

I've issues. Please respond immediately, Support

Where am I going wrong here? Please help.

asked Oct 05 '15 20:10 by 90abyss


People also ask

What encoding does pickle use?

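Pickle data is binary, so no text encoding applies to the file itself. The encoding argument of pickle.load() only tells pickle how to decode 8-bit string instances pickled by Python 2; using encoding='latin1' is required for unpickling NumPy arrays and instances of datetime, date and time pickled by Python 2. If buffers is None (the default), then all data necessary for deserialization must be contained in the pickle stream.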
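To illustrate that pickle's own encoding parameter is separate from the file's text encoding, here is a minimal sketch. It uses a hypothetical, hand-written pickle of the kind Python 2 produces for an 8-bit str (no real Python 2 file is involved), and shows how the default, 'latin1' and 'bytes' settings treat it:

```python
import pickle

# Hypothetical, hand-written pickle in the style Python 2 emits for an
# 8-bit str: SHORT_BINSTRING opcode 'U', length 1, raw byte 0xE9, then STOP '.'.
py2_pickle = b"U\x01\xe9."

# Default encoding="ASCII": byte 0xE9 is not valid ASCII.
ascii_error = None
try:
    pickle.loads(py2_pickle)
except UnicodeDecodeError as exc:
    ascii_error = exc
print(type(ascii_error).__name__)  # UnicodeDecodeError

# latin1 maps every byte 0-255 to a code point, so decoding always succeeds.
as_text = pickle.loads(py2_pickle, encoding="latin1")
print(as_text)  # é

# encoding="bytes" skips decoding and returns the raw bytes unchanged.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")
print(as_bytes)  # b'\xe9'
```

In every case the pickle data itself is handed to pickle as bytes; when it comes from a file, that file is still opened in binary mode.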

What is difference between pickling and unpickling?

Pickling is the process of converting a Python object hierarchy into a byte stream. Unpickling is the inverse: a byte stream is converted back into an object hierarchy.
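A minimal in-memory round trip shows both directions. The object is an arbitrary example chosen for illustration:

```python
import pickle

# An arbitrary object hierarchy: a dict containing a list and a tuple.
original = {"labels": ["SQL", "Support"], "counts": (2, 1)}

# Pickling: object hierarchy -> byte stream.
data = pickle.dumps(original)
print(type(data))  # <class 'bytes'>

# Unpickling: byte stream -> an equivalent, newly built object hierarchy.
restored = pickle.loads(data)
print(restored == original)  # True
print(restored is original)  # False
```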

Are pickle files compressed?

By default, the pickle data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data.
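One way to compress pickled data is to pass the byte stream through gzip. This is a sketch, assuming repetitive sample records invented for the demo so that the size reduction is visible:

```python
import gzip
import pickle

# Hypothetical, repetitive sample data so the compression gain is visible.
records = [
    {"text": "My SQL is not working correctly at all.", "label": "SQL", "id": i}
    for i in range(1000)
]

raw = pickle.dumps(records)
compressed = gzip.compress(raw)
print(len(raw), len(compressed))

# Decompress first, then unpickle.
restored = pickle.loads(gzip.decompress(compressed))
print(restored == records)  # True
```

For files, gzip.open(path, "wb") and gzip.open(path, "rb") return binary file objects that can be passed straight to pickle.dump() and pickle.load().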

How to pickle Python 3?

First, import pickle. Then define an example object, such as a dictionary. Next, open a file for writing bytes (note that in Python 3+ the file must be opened in binary mode, 'wb'), call pickle.dump() to write the dict into the opened file, and close the file.
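Those steps can be sketched as follows, assuming a temporary directory as a stand-in location and using with blocks so the files are closed automatically:

```python
import os
import pickle
import tempfile

# Example object: a plain dictionary.
example = {"SQL": 1, "Support": 1}

# Hypothetical location for the demo file.
path = os.path.join(tempfile.mkdtemp(), "example.pickle")

# Open for writing bytes ("wb"); the with-block closes the file.
with open(path, "wb") as f:
    pickle.dump(example, f)

# Read it back the same way: binary mode, no encoding argument.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == example)  # True
```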


1 Answer

By opening the file in mode wb, you chose to write raw binary. No character encoding is applied.

So to read this file, simply open it in mode rb, for example open('sample_classifier.pickle', 'rb'), without an encoding argument (binary mode does not accept one).
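The difference can be reproduced without TextBlob. In this sketch a plain dict is a hypothetical stand-in for the trained NaiveBayesClassifier and a temporary directory replaces the working directory; it contrasts the question's text-mode read with the binary-mode fix:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the trained classifier.
model = {"classifier": "stand-in", "labels": ["SQL", "Support"]}
path = os.path.join(tempfile.mkdtemp(), "sample_classifier.pickle")

with open(path, "wb") as f:
    pickle.dump(model, f)

# What the question does: text mode decodes the bytes as UTF-8 and fails.
text_mode_error = None
try:
    with open(path, encoding="utf8") as f:
        pickle.load(f)
except UnicodeDecodeError as exc:
    text_mode_error = exc
print(text_mode_error)

# Binary mode plus an encoding argument is rejected by open() itself.
binary_with_encoding_error = None
try:
    open(path, "rb", encoding="utf8")
except ValueError as exc:
    binary_with_encoding_error = exc
print(binary_with_encoding_error)  # binary mode doesn't take an encoding argument

# The fix: plain binary mode.
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded == model)  # True
```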

answered Sep 23 '22 19:09 by donkopotamus