
How to use a list of Python objects whose representation is unicode

Tags:

python

unicode

I have an object which contains unicode data, and I want to use that data in its representation, e.g.

# -*- coding: utf-8 -*-

class A(object):

    def __unicode__(self):
        return u"©au"

    def __repr__(self):
        return unicode(self).encode("utf-8")

    __str__ = __repr__ 

a = A()


s1 = u"%s"%a # works
#s2 = u"%s"%[a] # gives unicode decode error
#s3 = u"%s"%unicode([a])  # gives unicode decode error

Even if I return unicode from __repr__, it still gives an error. So the question is: how can I use a list of such objects to create another unicode string?

platform details:

"""
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
'Linux-2.6.24-19-generic-i686-with-debian-lenny-sid'
""" 

also not sure why

print a # works
print unicode(a) # works
print [a] # works
print unicode([a]) # doesn't work

The comp.lang.python group answers this: http://groups.google.com/group/comp.lang.python/browse_thread/thread/bd7ced9e4017d8de/2e0b07c761604137?lnk=gst&q=unicode#2e0b07c761604137

asked Dec 23 '22 11:12 by Anurag Uniyal


1 Answer

s1 = u"%s"%a # works

This works because, when formatting 'a' directly, Python uses its unicode representation (i.e. the __unicode__ method).

When you wrap it in a list such as '[a]', however, putting that list into the string calls unicode([a]). A list has no __unicode__ method, so this falls back to the list's string representation (the same as its repr), which uses 'repr(a)' to represent each item. That is a problem because repr(a) returns a 'str' object (a string of bytes) containing the UTF-8 encoded version of 'a'. When the string formatting tries to embed it in your unicode string, it converts it back to a unicode object using the default encoding, i.e. ASCII. Since ASCII has no '©' character, the conversion fails with a UnicodeDecodeError.
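To make the failing step concrete, here is a minimal sketch that performs by hand the ASCII decode Python 2 attempts implicitly. The names encoded, failed and decoded are mine, not from the question, and the explicit decode("ascii") call only stands in for the implicit conversion. It is written so that it should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# The bytes that the question's __repr__ returns for u"©au".
encoded = u"\u00a9au".encode("utf-8")

try:
    # Python 2 does this implicitly when a byte string is mixed into a
    # unicode string; here the decode is spelled out explicitly.
    encoded.decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with the encoding that produced the bytes recovers the text.
decoded = encoded.decode("utf-8")
```

The point is only that the bytes are valid UTF-8 but not valid ASCII, so the choice of codec decides whether the conversion succeeds.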

What you want can be done this way: u"%s" % repr([a]).decode('utf-8'), assuming all your elements encode to UTF-8 (or ASCII, which is a subset of UTF-8).

For a better solution (if you still want the string to look like the str of a list), you would have to use join, as was suggested previously, with something like this:

u'[%s]' % u','.join(unicode(x) for x in [a,a])

Note that this won't handle a list that itself contains lists of your A objects.
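One possible way to cover nested lists is to recurse, as in the sketch below. The helper name to_text and the text_type alias are hypothetical (not part of the question or of any library), and the A class here is a simplified stand-in for the question's class. The sketch is meant to run on both Python 2 and Python 3, which is why __str__ is only aliased on Python 3.

```python
# -*- coding: utf-8 -*-
import sys

text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_text(obj):
    """Hypothetical helper: text rendering of obj, recursing into lists."""
    if isinstance(obj, list):
        # Build the list-like form from the text of each item, never from repr().
        return u"[%s]" % u", ".join(to_text(item) for item in obj)
    return text_type(obj)


class A(object):
    # Simplified stand-in for the question's class.
    def __unicode__(self):
        return u"\u00a9au"

    if sys.version_info[0] >= 3:
        __str__ = __unicode__  # on Python 3, str() already returns text


nested = to_text([A(), [A(), A()]])
```

Only lists are treated specially here; tuples, dicts and other containers would still go through their normal conversion, so extend the isinstance check if you need them.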

My explanation sounds terribly unclear, but I hope you can make some sense out of it.

answered Dec 24 '22 23:12 by Nico