Basically I just want to be able to create instances using a class called Bottle, e.g. class Bottle(object):...
and then in another module be able to simply "print" any instance without having to hack code to explicitly call a character encoding routine.
In summary, when I try:
obj=Bottle(u"味精")
print obj
Or with an "in place" print:
print Bottle(u"味精")
I get:
"UnicodeEncodeError: 'ascii' codec can't encode characters"
Similar stackoverflow questions:
It's currently not feasible to switch to python3.
A solution or hint (and explanation) on how to do an in-place UTF-8 print (just like class U does successfully below) would be much appreciated. :-)
ThanX N
--
Sample code:
-------- 8>< - - - - cut here - - - -
#!/usr/bin/env python
# -*- coding: utf-8 -*-
def setdefaultencoding(encoding="utf-8"):
    import sys, codecs
    org_encoding = sys.getdefaultencoding()
    if org_encoding == "ascii": # not good enough
        print "encoding set to "+encoding
        sys.stdout = codecs.getwriter(encoding)(sys.stdout)
        sys.stderr = codecs.getwriter(encoding)(sys.stderr)

setdefaultencoding()
msg=u"味精" # the message!

class U(unicode): pass

m1=U(msg)
print "A)", m1 # works fine, even with unicode, but

class Bottle(object):
    def __init__(self,msg): self.msg=msg
    def __repr__(self):
        print "debug: __repr__",self.msg
        return '{{{'+self.msg+'}}}'
    def __unicode__(self):
        print "debug: __unicode__",self.msg
        return '{{{'+self.msg+'}}}'
    def __str__(self):
        print "debug: __str__",self.msg
        return '{{{'+self.msg+'}}}'
    def decode(self,arg): print "debug: decode",self.msg
    def encode(self,arg): print "debug: encode",self.msg
    def translate(self,arg): print "debug: translate",self.msg

m2=Bottle(msg)
#print "B)", str(m2)
print "C) repr(x):", repr(m2)
print "D) unicode(x):", unicode(m2)
print "E)",m2 # gives: UnicodeEncodeError: 'ascii' codec can't encode characters
-------- 8>< - - - - cut here - - - - Python 2.4 output:
encoding set to utf-8
A) 味精
C) repr(x): debug: __repr__ 味精
{{{\u5473\u7cbe}}}
D) unicode(x): debug: __unicode__ 味精
{{{味精}}}
E) debug: __str__ 味精
Traceback (most recent call last):
File "./uc.py", line 43, in ?
print "E)",m2 # gives: UnicodeEncodeError: 'ascii' codec can't encode characters
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
-------- 8>< - - - - cut here - - - - Python 2.6 output:
encoding set to utf-8
A) 味精
C) repr(x): debug: __repr__ 味精
Traceback (most recent call last):
File "./uc.py", line 41, in <module>
print "C) repr(x):", repr(m2)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
UTF-8 is a byte-oriented encoding. The encoding specifies that each character is represented by a specific sequence of one or more bytes.
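To make the "one or more bytes" point concrete, here is a small sketch (written so it runs unchanged on Python 2.6+ and Python 3.3+; the sample characters are just illustrative picks) that encodes three characters and counts the resulting bytes:

```python
# -*- coding: utf-8 -*-
# Runs unchanged on Python 2.6+ and 3.3+.
samples = [u"a", u"\u00e9", u"\u5473"]   # "a", "é", "味"

# In UTF-8 each character becomes a sequence of one or more bytes.
byte_seqs = [ch.encode("utf-8") for ch in samples]
byte_lengths = [len(seq) for seq in byte_seqs]

# ASCII needs 1 byte, "é" needs 2, and a CJK character such as "味" needs 3.
print(byte_lengths)
```

This is also why the traceback complains about "position 3-4": the two characters of 味精 cannot be squeezed into the single bytes that the 'ascii' codec allows.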
You have two options for creating a Unicode string from UTF-8 bytes in Python 2: either call the byte string's decode() method, or create a new Unicode string with the unicode() built-in, passing the UTF-8 encoding. Its signature is unicode(string[, encoding[, errors]]), and when an encoding is given the first argument should be an 8-bit string.
In Python 2, a string literal is a byte string by default, and you need the u'' prefix to mark it as a Unicode string. In Python 3, by contrast, a string is Unicode by default, and you need the b'' prefix to mark it explicitly as a byte string.
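A minimal sketch of both decoding options, assuming the input is the UTF-8 byte string for 味精. The `text_type` name is a helper introduced here (not part of either Python version) so that the same code runs on Python 2 (where it is `unicode`) and Python 3 (where it is `str`):

```python
# -*- coding: utf-8 -*-
# Sketch: runs on Python 2.6+ and 3.3+. `text_type` is a helper name
# introduced for this example.
try:
    text_type = unicode   # Python 2: the Unicode string type
except NameError:
    text_type = str       # Python 3: str is already Unicode

raw = b"\xe5\x91\xb3\xe7\xb2\xbe"   # b'' literal: the UTF-8 bytes of u"味精"

# Option 1: call decode() on the byte string.
via_decode = raw.decode("utf-8")

# Option 2: pass the bytes plus the encoding to the Unicode type itself,
# i.e. unicode(raw, "utf-8") on Python 2 or str(raw, "utf-8") on Python 3.
via_constructor = text_type(raw, "utf-8")

# Both give the same two-character Unicode string; u'' marks it as text.
same = via_decode == via_constructor == u"\u5473\u7cbe"
```

Note that on Python 2 a plain "..." literal would be a byte string, which is exactly what the u'' and b'' prefixes disambiguate.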
If you use sys.stdout = codecs.getwriter(encoding)(sys.stdout), then you should pass Unicode strings to print:
>>> print u"%s" % Bottle(u"魯賓遜漂流記")
debug: __unicode__ 魯賓遜漂流記
{{{魯賓遜漂流記}}}
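Why Unicode strings specifically? A writer returned by codecs.getwriter() encodes whatever it is given, so it expects text, not bytes. The sketch below uses io.BytesIO as a stand-in for the byte stream that sys.stdout wraps (an assumption made so the example is self-contained and testable):

```python
# -*- coding: utf-8 -*-
# Sketch: runs on Python 2.6+ and 3.3+. io.BytesIO stands in for the
# underlying byte stream of sys.stdout.
import codecs
import io

raw_stream = io.BytesIO()
out = codecs.getwriter("utf-8")(raw_stream)

# Unicode text in, UTF-8 bytes out.
out.write(u"{{{\u5473\u7cbe}}}")

# Already-encoded bytes are rejected. Python 2 first tries to decode them
# with the default 'ascii' codec (UnicodeDecodeError); Python 3 raises
# TypeError because the UTF-8 encoder only accepts str.
try:
    out.write(b"\xe5\x91\xb3")
    rejected_bytes = False
except (UnicodeError, TypeError):
    rejected_bytes = True
```

So the wrapped stream works with u"%s" % Bottle(...) but not with an object whose __str__ returns non-ASCII text that Python 2 must first turn into a byte string.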
As @bobince points out in the comments, avoid changing sys.stdout in this way; otherwise it might break any library code that works with sys.stdout and doesn't expect to print Unicode strings.
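One way to follow that advice, sketched under some assumptions: keep sys.stdout untouched and wrap the stream in a local writer that only your own code prints to. The helper name make_utf8_writer is hypothetical, and io.BytesIO again stands in for the real stream. On Python 2 you would pass sys.stdout itself; on Python 3 the byte stream is sys.stdout.buffer.

```python
# -*- coding: utf-8 -*-
# Sketch: runs on Python 2.6+ and 3.3+. make_utf8_writer is a
# hypothetical helper, not a standard-library function.
from __future__ import print_function

import codecs
import io


def make_utf8_writer(byte_stream):
    """Wrap a byte stream in a local UTF-8 writer instead of rebinding
    sys.stdout, so library code keeps seeing the original stream."""
    return codecs.getwriter("utf-8")(byte_stream)


buf = io.BytesIO()          # stand-in for sys.stdout / sys.stdout.buffer
out = make_utf8_writer(buf)

# Only this explicit print goes through the UTF-8 encoder.
print(u"{{{\u5473\u7cbe}}}", file=out)
```

The design choice here is scope: the encoding decision lives next to the print call that needs it rather than being a process-wide side effect.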
In general:
__unicode__() should return Unicode strings:
def __init__(self, msg, encoding='utf-8'):
    if not isinstance(msg, unicode):
        msg = msg.decode(encoding)
    self.msg = msg

def __unicode__(self):
    return u"{{{%s}}}" % self.msg
__repr__() should return an ASCII-friendly str object:
def __repr__(self):
    return "Bottle(%r)" % self.msg
__str__() should return a str object. Add an optional encoding parameter to document which encoding is used; there is no good way to choose the encoding here:
def __str__(self, encoding="utf-8"):
    return self.__unicode__().encode(encoding)
Define a write() method:
def write(self, file, encoding=None):
    encoding = encoding or getattr(file, 'encoding', None)
    s = unicode(self)
    if encoding is not None:
        s = s.encode(encoding)
    return file.write(s)
It should cover cases when the file has its own encoding or it supports Unicode strings directly.
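Putting the pieces of advice together, here is one possible assembled Bottle class. It is a sketch, not the answer's exact code: it is adapted (our assumption) so that it also runs on Python 3, where __str__ must return text and text streams such as sys.stdout reject bytes, so write() only looks up file.encoding on Python 2 and otherwise encodes only when an encoding is passed explicitly. PY2 and text_type are helper names introduced for this example.

```python
# -*- coding: utf-8 -*-
# Sketch: runs on Python 2.6+ and 3.3+. PY2 and text_type are helper
# names introduced for this example.
import io
import sys

PY2 = sys.version_info[0] == 2
try:
    text_type = unicode  # Python 2 Unicode type
except NameError:
    text_type = str      # Python 3: str is Unicode


class Bottle(object):
    def __init__(self, msg, encoding="utf-8"):
        # Store text internally; decode byte strings on the way in.
        if not isinstance(msg, text_type):
            msg = msg.decode(encoding)
        self.msg = msg

    def __unicode__(self):
        # Python 2 uses this for unicode(obj) and u"%s" % obj.
        return u"{{{%s}}}" % self.msg

    def __repr__(self):
        # On Python 2, %r of a unicode value yields an ASCII-only escape form.
        return "Bottle(%r)" % (self.msg,)

    if PY2:
        def __str__(self, encoding="utf-8"):
            # Python 2 str is bytes, so encode explicitly.
            return self.__unicode__().encode(encoding)
    else:
        def __str__(self):
            # Python 3 str is already text.
            return self.__unicode__()

    def write(self, file, encoding=None):
        # Python 2 file objects expose .encoding (e.g. a terminal's) and
        # accept bytes; Python 3 text streams want text, so there we only
        # encode when the caller passes an encoding explicitly.
        if encoding is None and PY2:
            encoding = getattr(file, "encoding", None)
        s = text_type(self)
        if encoding is not None:
            s = s.encode(encoding)
        return file.write(s)


# Usage: explicit encoding for a byte stream, plain text for a text stream.
byte_out = io.BytesIO()
Bottle(u"\u5473\u7cbe").write(byte_out, "utf-8")
text_out = io.StringIO()
Bottle(b"\xe5\x91\xb3\xe7\xb2\xbe").write(text_out)
```

Callers therefore never need to invoke an encoding routine themselves: they either interpolate the object into a Unicode string or hand it a file and let write() decide.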