Python: How to force a "print" to use __unicode__ instead of __str__, or otherwise naturally "print" the message without explicitly calling unicode()

Q: What does encoding='UTF-8' do in Python?

UTF-8 is a byte-oriented encoding: each character (code point) is represented by a specific sequence of one to four bytes.
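As a rough illustration of the variable byte length, here is a minimal sketch that encodes a few sample characters and counts the resulting bytes. The sample characters and variable names are chosen only for this example; the last line reuses the message from the question below.

```python
# -*- coding: utf-8 -*-
# Sketch: how many bytes UTF-8 needs for characters from different ranges.
samples = [
    u"A",            # ASCII letter
    u"\u00e9",       # accented Latin letter (e with acute)
    u"\u5473",       # CJK character
    u"\U0001F600",   # emoji outside the Basic Multilingual Plane
]
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)

# The question's two-character message becomes six bytes.
encoded = u"\u5473\u7cbe".encode("utf-8")
print(repr(encoded))
```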

Q: How do you create a Unicode string in Python?

In Python 2, you have two options for creating a Unicode string from a byte string: call its decode() method, or pass it to the unicode() built-in together with an encoding such as UTF-8. The signature is unicode(string[, encoding, errors]); when an encoding is given, the string argument should be an 8-bit string.
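A small sketch of the decode() route, which works the same way on Python 2 and Python 3. The byte values are the UTF-8 form of the message used in the question below; the variable names are only illustrative. On Python 2, unicode(raw, "utf-8") should give the same result as raw.decode("utf-8").

```python
# -*- coding: utf-8 -*-
# Sketch: turning UTF-8 bytes into a Unicode string with decode().
raw = b"\xe5\x91\xb3\xe7\xb2\xbe"    # six bytes of UTF-8
text = raw.decode("utf-8")           # two Unicode characters

print(len(raw))
print(len(text))

# The optional errors argument controls what happens with invalid input:
# "replace" substitutes U+FFFD instead of raising UnicodeDecodeError.
lossy = b"\xff".decode("utf-8", "replace")
print(repr(lossy))
```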

Q: Are Python Unicode strings default?

In Python 2, a string is by default a binary string and you need to use u'' to mark a string as a Unicode string. However, in Python 3, a string by default is a Unicode string, and you need to use b'' to explicitly mark a string as a binary string.
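The difference can be checked at runtime. This is a minimal sketch (the variable names are made up for the example) that compares an unprefixed literal against explicitly marked u'' and b'' literals; it assumes Python 2.6+ or Python 3.3+, where both prefixes are accepted.

```python
import sys

text_literal = u"abc"     # always a Unicode string
bytes_literal = b"abc"    # always a byte string
plain_literal = "abc"     # depends on the Python major version

is_py3 = sys.version_info[0] >= 3
plain_is_text = isinstance(plain_literal, type(text_literal))
plain_is_bytes = isinstance(plain_literal, type(bytes_literal))

# On Python 3 the unprefixed literal is text; on Python 2 it is bytes.
print(plain_is_text)
print(plain_is_bytes)
```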

Tags: python, string, character-encoding, unicode, cjk

Basically, I just want to be able to create instances of a class called Bottle (e.g. class Bottle(object): ...) and then, in another module, simply "print" any instance without having to hack the code to explicitly call a character-encoding routine.

In summary, when I try:

obj=Bottle(u"味精")
print obj

Or, for an "in place" "print":

print Bottle(u"味精")

I get:

"UnicodeEncodeError: 'ascii' codec can't encode characters"

Similar stackoverflow questions:

unicode class in Python
how to print chinese word in my code.. using python
Python string decoding issue
python 3.0, how to make print() output unicode?

¢ It's currently not feasible to switch to python3. ¢

A solution or hint (and explanation) on how to do an in place utf-8 print (just like class U does successfully below) would be muchly appreciated. :-)

ThanX N

Sample code:

-------- 8>< - - - - cut here - - - -

#!/usr/bin/env python
# -*- coding: utf-8 -*-

def setdefaultencoding(encoding="utf-8"):
  import sys, codecs

  org_encoding = sys.getdefaultencoding()
  if org_encoding == "ascii": # not good enough
    print "encoding set to "+encoding
    sys.stdout = codecs.getwriter(encoding)(sys.stdout)
    sys.stderr = codecs.getwriter(encoding)(sys.stderr)

setdefaultencoding()

msg=u"味精" # the message!

class U(unicode): pass

m1=U(msg)

print "A)", m1 # works fine, even with unicode, but

class Bottle(object):
  def __init__(self,msg): self.msg=msg
  def __repr__(self): 
    print "debug: __repr__",self.msg
    return '{{{'+self.msg+'}}}'
  def __unicode__(self): 
    print "debug: __unicode__",self.msg
    return '{{{'+self.msg+'}}}'
  def __str__(self): 
    print "debug: __str__",self.msg
    return '{{{'+self.msg+'}}}'
  def decode(self,arg): print "debug: decode",self.msg
  def encode(self,arg): print "debug: encode",self.msg
  def translate(self,arg): print "debug: translate",self.msg

m2=Bottle(msg)

#print "B)", str(m2)
print "C) repr(x):", repr(m2)
print "D) unicode(x):", unicode(m2)
print "E)",m2 # gives:  UnicodeEncodeError: 'ascii' codec can't encode characters

-------- 8>< - - - - cut here - - - -

Python 2.4 output:

encoding set to utf-8
A) 味精
C) repr(x): debug: __repr__ 味精
{{{\u5473\u7cbe}}}
D) unicode(x): debug: __unicode__ 味精
{{{味精}}}
E) debug: __str__ 味精
Traceback (most recent call last):
  File "./uc.py", line 43, in ?
    print "E)",m2 # gives:  UnicodeEncodeError: 'ascii' codec can't encode characters
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)

-------- 8>< - - - - cut here - - - -

Python 2.6 output:

encoding set to utf-8
A) 味精
C) repr(x): debug: __repr__ 味精
Traceback (most recent call last):
  File "./uc.py", line 41, in <module>
    print "C) repr(x):", repr(m2)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
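Both tracebacks are consistent with Python 2 taking the Unicode string returned by the method and implicitly encoding it with the default ASCII codec. As a hedged illustration, the same error message can be reproduced by doing that encoding step explicitly; this sketch does not involve print or Bottle at all, and the variable names are only for the example.

```python
# -*- coding: utf-8 -*-
# Sketch: the implicit step, done by hand. Encoding the returned text
# with the ASCII codec fails on the two CJK characters at index 3 and 4.
returned = u"{{{" + u"\u5473\u7cbe" + u"}}}"

error = None
try:
    returned.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc

print(error)
```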


asked Nov 22 '11 05:11

NevilleDNZ

1 Answer

If you use sys.stdout = codecs.getwriter(encoding)(sys.stdout), then you should pass Unicode strings to print:

>>> print u"%s" % Bottle(u"魯賓遜漂流記")
debug: __unicode__ 魯賓遜漂流記
{{{魯賓遜漂流記}}}

As @bobince points out in the comments: avoid changing sys.stdout in this manner; otherwise it might break any library code that works with sys.stdout and doesn't expect to print Unicode strings.
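One way to see what the wrapper does without touching sys.stdout is to apply it to a local buffer. This is only a sketch: io.BytesIO stands in for a byte-oriented stream, and the names buf and writer are invented for the example.

```python
# -*- coding: utf-8 -*-
import codecs
import io

buf = io.BytesIO()                        # stand-in for a byte stream
writer = codecs.getwriter("utf-8")(buf)   # same kind of wrapper as in the question

# The writer accepts Unicode strings and emits UTF-8 bytes underneath.
writer.write(u"{{{\u5473\u7cbe}}}\n")
written = buf.getvalue()
print(repr(written))
```

Keeping the wrapper local like this limits its effect to code that expects it, which is the concern raised in the comment above.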

In general:

__unicode__() should return Unicode strings:

def __init__(self, msg, encoding='utf-8'):
    if not isinstance(msg, unicode):
        msg = msg.decode(encoding)
    self.msg = msg

def __unicode__(self):
    return u"{{{%s}}}" % self.msg

__repr__() should return an ASCII-friendly str object:

def __repr__(self):
    return "Bottle(%r)" % self.msg

__str__() should return a str object (a byte string in Python 2). The optional encoding parameter documents which encoding is used; there is no good way to choose an encoding here:

def __str__(self, encoding="utf-8"):
    return self.__unicode__().encode(encoding)

Define a write() method:

def write(self, file, encoding=None):
    encoding = encoding or getattr(file, 'encoding', None)
    s = unicode(self)
    if encoding is not None:
        s = s.encode(encoding)
    return file.write(s)

This should cover the cases where the file has its own encoding or supports Unicode strings directly.
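Put together, the fragments above might look like the following. This is a sketch under assumptions, not a definitive implementation: the PY2 flag and the text_type alias are helpers introduced here so the same file also runs on Python 3, and the io.StringIO / io.BytesIO buffers stand in for real files.

```python
# -*- coding: utf-8 -*-
import io
import sys

PY2 = sys.version_info[0] == 2   # hypothetical helper, not from the answer
text_type = type(u"")            # unicode on Python 2, str on Python 3


class Bottle(object):
    def __init__(self, msg, encoding="utf-8"):
        # Accept bytes or text, but always store text.
        if not isinstance(msg, text_type):
            msg = msg.decode(encoding)
        self.msg = msg

    def __unicode__(self):
        return u"{{{%s}}}" % self.msg

    def __repr__(self):
        return "Bottle(%r)" % self.msg

    if PY2:
        def __str__(self, encoding="utf-8"):
            # Python 2: str is a byte string, so encode explicitly.
            return self.__unicode__().encode(encoding)
    else:
        # Python 3: str is already Unicode text.
        __str__ = __unicode__

    def write(self, file, encoding=None):
        # Encode only if an encoding is given or the file declares one.
        encoding = encoding or getattr(file, "encoding", None)
        s = self.__unicode__()
        if encoding is not None:
            s = s.encode(encoding)
        return file.write(s)


bottle = Bottle(u"\u5473\u7cbe")

text_buf = io.StringIO()                   # accepts Unicode directly
bottle.write(text_buf)

byte_buf = io.BytesIO()                    # needs an explicit encoding
bottle.write(byte_buf, encoding="utf-8")

print(repr(byte_buf.getvalue()))
```

Note that a text-mode file object which reports an encoding but only accepts Unicode in write() (for example one opened with io.open) would reject the encoded bytes, so in that case the Unicode string should be written directly.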


answered Oct 11 '22 17:10

jfs
