'ascii' codec can't encode character at position * ord not in range(128)

Question

There are a few threads on Stack Overflow, but I couldn't find a valid solution to the problem as a whole.

I have collected a large amount of text data with the urllib read function and stored it in pickle files.

Now I want to write this data to a file. While writing, I'm getting errors similar to:

'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)

and a lot of data is being lost.

I suppose the data returned by urllib's read is byte data.

I've tried

   1. text=text.decode('ascii','ignore')
   2. s=filter(lambda x: x in string.printable, s)
   3. text=u''+text
      text=text.decode().encode('utf-8')

but I'm still ending up with similar errors. Can somebody point out a proper solution? Also, would codecs strip work? I don't mind if the conflicting bytes are not written to the file, so that loss is acceptable.

Thanasis Petsas · Accepted Answer

You can do this with smart_str from Django's django.utils.encoding module. Try this:

from django.utils.encoding import smart_str, smart_unicode

text = u'\u2019'
print smart_str(text)

You can install Django by opening a command shell with administrator privileges and running this command:

pip install Django
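
If installing Django only for this is too heavy, a rough equivalent for this specific case is sketched below. It assumes that smart_str, when given a unicode string on Python 2, returns the text encoded as UTF-8 bytes by default; the variable name encoded is only illustrative.

```python
text = u'\u2019'  # RIGHT SINGLE QUOTATION MARK, the character from the error message

# Encode the unicode text to UTF-8 bytes explicitly instead of
# relying on the implicit ASCII codec.
encoded = text.encode('utf-8')
```

The resulting bytes can be written to a file opened in binary mode without triggering the ASCII encode error.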

Martijn Pieters · Answer

Your data is unicode data. To write that to a file, use .encode():

text = text.encode('ascii', 'ignore')

but that would remove anything that isn't ASCII. Perhaps you wanted to encode to a more suitable encoding, like UTF-8, instead?
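
As a minimal sketch of the difference, the snippet below encodes a sample string both ways and writes it to a file with io.open, which accepts an encoding argument on both Python 2 and Python 3. The sample text and the temporary output path are made up for illustration; substitute your own data and file name.

```python
import io
import os
import tempfile

text = u'It\u2019s a test'  # sample text containing a non-ASCII quote

# Lossy: characters outside ASCII are silently dropped.
ascii_only = text.encode('ascii', 'ignore')

# Lossless: let the file object encode to UTF-8 on write.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')  # hypothetical output path
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading back with the same encoding returns the original text.
with io.open(path, 'r', encoding='utf-8') as f:
    round_tripped = f.read()
```

With 'ignore', the quote character disappears from the output, while the UTF-8 file keeps it intact.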

You may want to read up on Python and Unicode:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder

Tags: python, encode, unicode, decode

minocha

