I have a bunch of byte strings (str, not unicode, in Python 2.7) containing Unicode data (in UTF-8 encoding).
I am trying to join them (with "".join(utf8_strings) or u"".join(utf8_strings)), which throws:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 0: ordinal not in range(128)
Is there any way to use the .join() method with non-ASCII strings? Sure, I can concatenate them in a for loop, but that wouldn't be efficient.
Joining byte strings using ''.join() works just fine; the error you see would only appear if you mixed unicode and str objects:
>>> utf8 = [u'\u0123'.encode('utf8'), u'\u0234'.encode('utf8')]
>>> ''.join(utf8)
'\xc4\xa3\xc8\xb4'
>>> u''.join(utf8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
>>> ''.join(utf8 + [u'unicode object'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
The exceptions above are raised when using the Unicode value u'' as the joiner, and when adding a Unicode string to the list of strings to join, respectively.
"".join(...) will work if each parameter is a str (whatever the encoding may be).
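To check whether every item really is a str, something like the following might help. It is only a sketch: find_text_items is a hypothetical helper name (not part of the standard library) and the sample list is made up for illustration. It is written so that it runs on Python 2.7 and on Python 3.3+, where the text type is str rather than unicode.

```python
def find_text_items(parts):
    """Hypothetical helper: return (index, value) pairs for text (non-byte) items."""
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return [(index, part) for index, part in enumerate(parts)
            if isinstance(part, text_type)]


# Invented sample data: two UTF-8 byte strings with one text object in between.
parts = [u'\u0123'.encode('utf-8'), u'unicode object', u'\u0234'.encode('utf-8')]

# Any entry reported here would trigger the implicit ASCII decode on Python 2.
offenders = find_text_items(parts)
```

An empty result would suggest the list holds only byte strings, in which case ''.join() should not raise.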
The issue you are seeing is probably not related to the join, but the data you supply to it. Post more code so we can see what's really wrong.
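Until the real cause is known, a common way around mixed input is to decode everything to unicode before joining, or to join the bytes and decode once. A minimal sketch, assuming the byte strings are valid UTF-8; join_as_text is a hypothetical helper and the sample data is invented:

```python
# Invented sample data: UTF-8 encoded byte strings.
utf8_strings = [u'\u0123'.encode('utf-8'), u'\u0234'.encode('utf-8')]

# Option 1: every item is a byte string, so join the bytes and decode once.
joined_bytes = b''.join(utf8_strings)
joined_text = joined_bytes.decode('utf-8')


def join_as_text(parts, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, keep text objects, then join."""
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return u''.join(
        part if isinstance(part, text_type) else part.decode(encoding)
        for part in parts
    )


# Option 2: the list mixes byte strings and text objects.
mixed_text = join_as_text(utf8_strings + [u'unicode object'])
```

Decoding explicitly avoids the implicit ASCII decode that Python 2 performs when str and unicode meet, which is where the UnicodeDecodeError comes from.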