Python: How to force a "print" to use __unicode__ instead of __str__, or otherwise naturally "print" the message without explicitly calling unicode()

Basically, I just want to be able to create instances of a class called Bottle (e.g. class Bottle(object): ...) and then, in another module, simply "print" any instance without having to hack code to explicitly call a character encoding routine.

In summary, when I try:

obj=Bottle(u"味精")
print obj

Or do an "in place" print:

print Bottle(u"味精")

I get:

"UnicodeEncodeError: 'ascii' codec can't encode characters"

Similar stackoverflow questions:

  • unicode class in Python
  • how to print chinese word in my code.. using python
  • Python string decoding issue
  • python 3.0, how to make print() output unicode?

¢ It's currently not feasible to switch to python3. ¢

A solution or hint (and explanation) on how to do an in place utf-8 print (just like class U does successfully below) would be muchly appreciated. :-)

ThanX N

--

Sample code:

-------- 8>< - - - - cut here - - - -

#!/usr/bin/env python
# -*- coding: utf-8 -*-

def setdefaultencoding(encoding="utf-8"):
  import sys, codecs

  org_encoding = sys.getdefaultencoding()
  if org_encoding == "ascii": # not good enough
    print "encoding set to "+encoding
    sys.stdout = codecs.getwriter(encoding)(sys.stdout)
    sys.stderr = codecs.getwriter(encoding)(sys.stderr)

setdefaultencoding()

msg=u"味精" # the message!

class U(unicode): pass

m1=U(msg)

print "A)", m1 # works fine, even with unicode, but

class Bottle(object):
  def __init__(self,msg): self.msg=msg
  def __repr__(self): 
    print "debug: __repr__",self.msg
    return '{{{'+self.msg+'}}}'
  def __unicode__(self): 
    print "debug: __unicode__",self.msg
    return '{{{'+self.msg+'}}}'
  def __str__(self): 
    print "debug: __str__",self.msg
    return '{{{'+self.msg+'}}}'
  def decode(self,arg): print "debug: decode",self.msg
  def encode(self,arg): print "debug: encode",self.msg
  def translate(self,arg): print "debug: translate",self.msg

m2=Bottle(msg)

#print "B)", str(m2)
print "C) repr(x):", repr(m2)
print "D) unicode(x):", unicode(m2)
print "E)",m2 # gives:  UnicodeEncodeError: 'ascii' codec can't encode characters

-------- 8>< - - - - cut here - - - -

Python 2.4 output:

encoding set to utf-8
A) 味精
C) repr(x): debug: __repr__ 味精
{{{\u5473\u7cbe}}}
D) unicode(x): debug: __unicode__ 味精
{{{味精}}}
E) debug: __str__ 味精
Traceback (most recent call last):
  File "./uc.py", line 43, in ?
    print "E)",m2 # gives:  UnicodeEncodeError: 'ascii' codec can't encode characters
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)

-------- 8>< - - - - cut here - - - -

Python 2.6 output:

encoding set to utf-8
A) 味精
C) repr(x): debug: __repr__ 味精
Traceback (most recent call last):
  File "./uc.py", line 41, in <module>
    print "C) repr(x):", repr(m2)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)

NevilleDNZ asked Nov 22 '11 05:11


1 Answer

If you use sys.stdout = codecs.getwriter(encoding)(sys.stdout), then you should pass Unicode strings to print:

>>> print u"%s" % Bottle(u"魯賓遜漂流記")
debug: __unicode__ 魯賓遜漂流記
{{{魯賓遜漂流記}}}
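
For context, the error in the question can be reproduced without print at all. When __str__ returns a Unicode object, Python 2 converts it to a byte string with the default 'ascii' codec, and that conversion is what fails. A minimal sketch of the same step done explicitly (the variable names are only illustrative, and the text is written with escapes so no source-encoding line is needed):

```python
# The text that Bottle.__str__ returns in the question: {{{ + two CJK chars + }}}
text = u"{{{\u5473\u7cbe}}}"

# Encoding with 'ascii' is what Python 2 does implicitly when __str__
# hands back a Unicode object; doing it by hand fails the same way.
try:
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Naming an encoding that can represent the characters succeeds.
utf8_bytes = text.encode("utf-8")
```

The failing positions (3-4) are the two non-ASCII characters after the three opening braces, which matches the traceback in the question.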

As @bobince points out in the comments, avoid replacing sys.stdout in this manner; otherwise it might break any library code that works with sys.stdout and doesn't expect to print Unicode strings.
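
One possible way to follow that advice is to keep the encoding writer private to your own code instead of assigning it to sys.stdout. This is only a sketch: make_utf8_writer is a hypothetical helper name, and io.BytesIO stands in for whatever byte stream you actually write to.

```python
import codecs
import io


def make_utf8_writer(byte_stream):
    """Wrap a byte stream so Unicode text written to it is UTF-8 encoded."""
    return codecs.getwriter("utf-8")(byte_stream)


# io.BytesIO is a stand-in for a real output stream; nothing global changes.
raw = io.BytesIO()
out = make_utf8_writer(raw)
out.write(u"{{{\u5473\u7cbe}}}\n")  # Unicode goes in, UTF-8 bytes come out

encoded = raw.getvalue()
```

Only code that was handed `out` sees the wrapper, so libraries that write to sys.stdout are unaffected.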

In general:

__unicode__() should return Unicode strings:

def __init__(self, msg, encoding='utf-8'):
    if not isinstance(msg, unicode):
        msg = msg.decode(encoding)
    self.msg = msg

def __unicode__(self):
    return u"{{{%s}}}" % self.msg

__repr__() should return an ASCII-friendly str object:

def __repr__(self):
    return "Bottle(%r)" % self.msg

__str__() should return a str (byte string) object. The optional encoding parameter documents which encoding is used; there is no good way to choose an encoding here:

def __str__(self, encoding="utf-8"):
    return self.__unicode__().encode(encoding)

Define a write() method:

def write(self, file, encoding=None):
    encoding = encoding or getattr(file, 'encoding', None)
    s = unicode(self)
    if encoding is not None:
        s = s.encode(encoding)
    return file.write(s)

This should cover both cases: a file that has its own encoding, and a file that supports Unicode strings directly.
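
Putting the pieces together, here is a rough sketch of how write() might be exercised against both kinds of file. Some assumptions: text_type is a helper name introduced here so the sketch also runs on Python 3, write() calls self.__unicode__() directly rather than unicode(self) for the same reason, and __str__ is left out because returning bytes from it is specific to Python 2. io.StringIO and io.BytesIO stand in for real files.

```python
import io

text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3


class Bottle(object):
    def __init__(self, msg, encoding="utf-8"):
        # Store text only; decode byte strings on the way in.
        if not isinstance(msg, text_type):
            msg = msg.decode(encoding)
        self.msg = msg

    def __unicode__(self):
        return u"{{{%s}}}" % self.msg

    def __repr__(self):
        return "Bottle(%r)" % self.msg

    def write(self, file, encoding=None):
        # Prefer an explicit encoding, then the file's own, else write text.
        encoding = encoding or getattr(file, "encoding", None)
        s = self.__unicode__()  # the answer uses unicode(self) on Python 2
        if encoding is not None:
            s = s.encode(encoding)
        return file.write(s)


bottle = Bottle(u"\u5473\u7cbe")

text_file = io.StringIO()  # accepts Unicode and reports no encoding
bottle.write(text_file)

byte_file = io.BytesIO()   # accepts bytes only, so name the encoding
bottle.write(byte_file, encoding="utf-8")
```

The byte-oriented file needs the explicit encoding because it has no encoding attribute of its own for write() to pick up.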

jfs answered Oct 11 '22 17:10