
How to convert *any* Python object into a string?

Tags: python, utf-8

I want to concatenate a list of various Python objects into one string. The objects can be literally anything. I thought I could simply do this using the following code:

' '.join([str(x) for x in the_list])

but unfortunately that sometimes gives me a UnicodeEncodeError:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 80: ordinal not in range(128)

In this SO answer I found someone who says that I need to use .encode('utf-8'), so I changed my code to this:

' '.join([x.encode('utf-8') for x in the_list])

But if the objects are not str or unicode instances (ints, for example), I get an AttributeError: 'int' object has no attribute 'encode'. So it seems I need some kind of if-statement that checks each object's type and converts it accordingly. But when should I use .encode('utf-8') and when should I use str()?

It would be even better if this could be done as a one-liner, but I don't know how. Does anybody know? All tips are welcome!

kramer65 asked Dec 17 '15 16:12


2 Answers

In Python 2.x, use repr(). In Python 3.x, use repr() if you don't mind non-ASCII Unicode in the result, or ascii() if you do:

>>> a=1             # integer
>>> class X: pass
...
>>> x=X()           # class instance
>>> y='\u5000'      # Unicode string
>>> z=b'\xa0'       # non-ASCII byte string
>>> ' '.join(ascii(i) for i in (a,x,y,z))
"1 <__main__.X object at 0x0000000002974B38> '\\u5000' b'\\xa0'"
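
The same join can be wrapped in a small function. This is only a minimal Python 3 sketch: the name join_objects and its ascii_only flag are hypothetical and not part of the answer above. It simply switches between ascii() and repr() as described.

```python
def join_objects(objects, sep=" ", ascii_only=True):
    """Join arbitrary objects into one string using ascii() or repr().

    Hypothetical helper for illustration (Python 3 only).
    """
    # ascii() escapes every non-ASCII character; repr() leaves them as-is.
    convert = ascii if ascii_only else repr
    return sep.join(convert(obj) for obj in objects)


items = [1, "pingüino", b"\xa0", None]
print(join_objects(items))                    # ASCII-only result
print(join_objects(items, ascii_only=False))  # keeps the ü unescaped
```

Note that both variants keep the quotes and the b prefix, because they produce each object's representation rather than its plain text.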

Example of differences between 2.X and 3.X repr(), and 3.X ascii():

>>> # Python 3
>>> s = 'pingüino' # Unicode string
>>> s
'pingüino'
>>> repr(s)
"'pingüino'"
>>> print(repr(s))
'pingüino'
>>> ascii(s)
"'ping\\xfcino'"
>>> print(ascii(s))
'ping\xfcino'

>>> # Python 2
>>> s = u'pingüino'
>>> s
u'ping\xfcino'
>>> repr(s)
"u'ping\\xfcino'"
>>> print(repr(s))
u'ping\xfcino'

Mark Tolonen answered Nov 04 '22 12:11


In Python 2, you can try joining with a unicode object instead:

u' '.join(unicode(x) for x in the_list)
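
A type-dispatching variant of this join, sketched for Python 3, where str is already Unicode and unicode() no longer exists. The helper name to_text is hypothetical, and treating bytes as UTF-8 with errors="replace" is an assumption that may not match your data.

```python
def to_text(obj, encoding="utf-8"):
    """Return obj as text: decode bytes, call str() on everything else.

    Hypothetical helper; the default encoding is an assumption.
    """
    if isinstance(obj, str):
        return obj
    if isinstance(obj, (bytes, bytearray)):
        # errors="replace" substitutes U+FFFD for undecodable bytes
        # instead of raising UnicodeDecodeError.
        return bytes(obj).decode(encoding, errors="replace")
    return str(obj)


the_list = [42, "caf\u00e9", "caf\u00e9".encode("utf-8"), 3.5, None]
print(" ".join(to_text(x) for x in the_list))
```

Unlike repr() or ascii(), this yields the plain text of each item, with no quotes or b prefix.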

Alternatively, your original str()-based join will work fine in Python 3. Just be sure to:

  1. decode early
  2. unicode everywhere
  3. encode late
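
The three steps above could look roughly like this in Python 3. This is an illustrative sketch only: normalize_line is a made-up function, and UTF-8 is assumed for both input and output.

```python
def normalize_line(raw_bytes, encoding="utf-8"):
    """Hypothetical example of the decode/process/encode pattern."""
    # 1. Decode early: convert incoming bytes to str at the boundary.
    text = raw_bytes.decode(encoding)
    # 2. Unicode everywhere: all processing happens on str, never on bytes.
    text = " ".join(text.split()).upper()
    # 3. Encode late: convert back to bytes only when the data leaves.
    return text.encode(encoding)


incoming = "  pingüino   azul ".encode("utf-8")
outgoing = normalize_line(incoming)
print(outgoing.decode("utf-8"))
```

Keeping bytes at the edges and str in the middle means that no implicit ASCII conversion can occur inside the program.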

For more details see this talk

Chad S. answered Nov 04 '22 12:11