
Python: How to get StringIO.writelines to accept unicode string?

I'm getting a

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 34: ordinal not in range(128) 

on a string stored in 'a.desc' below, because it contains the '£' character. It's stored in the underlying Google App Engine datastore as a unicode string, so that part is fine. The cStringIO.StringIO.writelines function, however, seems to be trying to encode it as ASCII:

result.writelines(['blahblah',a.desc,'blahblahblah']) 

How do I instruct it to handle the string as unicode, if that's the correct phrasing?

App Engine runs on Python 2.5.
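To see where the error comes from, here is a minimal sketch in Python 3 (not the Python 2.5 the question targets, so `str` plays the role of `unicode` and the message shows '\xa3' rather than u'\xa3'). The ascii codec only covers code points 0 to 127, so encoding the pound sign (U+00A3, 163) fails, while an explicit UTF-8 encoding succeeds. The variable names are illustrative only.

```python
desc = "\u00a3"  # the pound sign, written u'\xa3' in Python 2

# Encoding with the ascii codec fails: 163 is outside range(128).
try:
    desc.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; 'exc' is unbound after the except block

print(error)

# UTF-8 can represent the character, as the two bytes C2 A3.
encoded = desc.encode("utf-8")
print(encoded)
```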

asked Nov 30 '09 at 03:11 by blippy


1 Answer

You can wrap the StringIO object in a codecs.StreamReaderWriter object to automatically encode and decode unicode.

Like this:

import cStringIO, codecs
buffer = cStringIO.StringIO()
# Look up the UTF-8 codec and wrap the buffer: writes are encoded, reads decoded
codecinfo = codecs.lookup("utf8")
wrapper = codecs.StreamReaderWriter(buffer,
          codecinfo.streamreader, codecinfo.streamwriter)
wrapper.writelines([u"list of", u"unicode strings"])

buffer will then be filled with UTF-8-encoded bytes.
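A rough Python 3 equivalent of the StreamReaderWriter approach, offered as a sketch: cStringIO does not exist in Python 3, so io.BytesIO is assumed here as the in-memory byte buffer, and the sample strings are made up for illustration. It writes text containing '£', inspects the UTF-8 bytes that land in the buffer, then rewinds and reads the text back through the same wrapper to show the decoding side.

```python
import codecs
import io

# Python 3 stand-in for cStringIO.StringIO(): an in-memory byte buffer.
buf = io.BytesIO()

# Wrap the buffer so that writes encode to UTF-8 and reads decode from it.
info = codecs.lookup("utf-8")
wrapper = codecs.StreamReaderWriter(buf, info.streamreader, info.streamwriter)

# writelines joins the items with no separator, then encodes the result.
wrapper.writelines(["blahblah", "price: \u00a312", "blahblahblah"])

# The underlying buffer holds bytes; the pound sign became b'\xc2\xa3'.
raw = buf.getvalue()
print(raw)

# Rewind through the wrapper (this also resets the reader) and decode.
wrapper.seek(0)
text = wrapper.read()
print(text)
```

Reading back is only needed if the same object is used in both directions; for write-only use a plain writer is enough.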

If I understand your case correctly, you will only need to write, so you could also do:

import cStringIO, codecs
buffer = cStringIO.StringIO()
wrapper = codecs.getwriter("utf8")(buffer)
wrapper.writelines([u'blahblah', a.desc, u'blahblahblah'])
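The write-only variant translates to Python 3 the same way. This sketch again assumes io.BytesIO in place of cStringIO.StringIO, and `desc` is a hypothetical stand-in for a.desc from the question. codecs.getwriter returns a StreamWriter class; calling it with the buffer produces the wrapping writer.

```python
import codecs
import io

buf = io.BytesIO()  # assumed Python 3 replacement for cStringIO.StringIO()

# getwriter("utf-8") returns a StreamWriter class; instantiate it on the buffer.
writer = codecs.getwriter("utf-8")(buf)

# Hypothetical stand-in for a.desc, containing the pound sign.
desc = "Costs \u00a35"
writer.writelines(["blahblah", desc, "blahblahblah"])

# The buffer now holds UTF-8 bytes that can be stored or sent as-is.
data = buf.getvalue()
print(data.decode("utf-8"))
```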
answered Feb 14 '23 at 21:02 by codeape