
python 2.7 string.join() with unicode

Tags: python, unicode

I have a bunch of byte strings (str, not unicode, in Python 2.7) containing Unicode data (UTF-8 encoded).

I am trying to join them (with "".join(utf8_strings) or u"".join(utf8_strings)), which throws

UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 0: ordinal not in range(128)

Is there any way to use the .join() method with non-ASCII strings? Sure, I can concatenate them in a for loop, but that wouldn't be efficient.

thkang asked Feb 07 '13 18:02


2 Answers

Joining byte strings using ''.join() works just fine; the error you see would only appear if you mixed unicode and str objects:

>>> utf8 = [u'\u0123'.encode('utf8'), u'\u0234'.encode('utf8')]
>>> ''.join(utf8)
'\xc4\xa3\xc8\xb4'
>>> u''.join(utf8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
>>> ''.join(utf8 + [u'unicode object'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

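The two exceptions above come from using the unicode value u'' as the joiner and from adding a unicode object to the list of strings to join, respectively. In both cases Python 2 implicitly decodes the byte strings with the ASCII codec, which fails on the first non-ASCII byte.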
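As a rough sketch of the two consistent options (keep everything as bytes, or decode everything to unicode before joining), assuming the input really is UTF-8. The variable names are illustrative and not taken from the question:

```python
# Two byte strings holding UTF-8 data, built the same way as in the session above.
utf8_strings = [u'\u0123'.encode('utf-8'), u'\u0234'.encode('utf-8')]

# Option 1: stay in bytes and join with a byte-string separator.
joined_bytes = b''.join(utf8_strings)

# Option 2: decode each piece explicitly, then join with a unicode separator.
# No implicit ASCII decoding happens because every element is already unicode.
joined_text = u''.join(s.decode('utf-8') for s in utf8_strings)

print(repr(joined_bytes))
print(repr(joined_text))
```

Either way the join sees only one string type, so nothing is decoded implicitly.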

Martijn Pieters answered Oct 31 '22 02:10


"".join(...) will work if every element you pass to it is a str (whatever the encoding may be).

The issue you are seeing is probably not related to the join, but the data you supply to it. Post more code so we can see what's really wrong.
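One way to check the data, sketched here under the assumption that a unicode object has slipped into the list: report which elements are not byte strings, and optionally encode them as UTF-8 before joining. Both helper names (find_non_bytes, join_as_utf8) are hypothetical and written for this illustration.

```python
def find_non_bytes(items):
    """Return (index, type name) for every element that is not a byte string."""
    return [(i, type(x).__name__) for i, x in enumerate(items)
            if not isinstance(x, bytes)]


def join_as_utf8(items):
    """Encode any non-bytes (unicode) elements as UTF-8, then join as bytes."""
    return b''.join(x if isinstance(x, bytes) else x.encode('utf-8')
                    for x in items)


# A list that mixes a UTF-8 byte string with a unicode object.
mixed = [u'\u0123'.encode('utf-8'), u'unicode object']

print(find_non_bytes(mixed))       # shows the index of the offending element
print(repr(join_as_utf8(mixed)))   # joined result as a single byte string
```

In Python 2.7, bytes is an alias for str, so the isinstance check distinguishes str from unicode.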

afflux answered Oct 31 '22 02:10