I have a list with some strings (most of which I fetched from a sqlite3 database):
stats_list = ['Statistik \xc3\xb6ver s\xc3\xa5nger\n', 'Antal\tS\xc3\xa5ng', '1\tCarola - Betlehems Stj\xc3\xa4rna', '\n\nStatistik \xc3\xb6ver datak\xc3\xa4llor\n', 'K\xc3\xa4lla\tAntal', 'MANUAL\t1', '\n\nStatistik \xc3\xb6ver \xc3\xb6nskare\n', 'Antal\tId', u'1\tNiclas']
When I try to join it with:
return '\n'.join(stats_list)
I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
Is it possible to get any clue why this happens just by looking at the list? If I loop over the list and print it to screen, I get this:
Statistik över sånger Antal Sång 1 Carola - Betlehems Stjärna Statistik över datakällor Källa Antal MANUAL 1 Statistik över önskare Antal Id 1 Niclas
which is exactly what I was expecting, and no error is shown. (The special characters are Swedish.)
EDIT:
I tried this:
return '\n'.join(i.decode('utf8') for i in stats_list)
But it returned:
Traceback (most recent call last):
File "./CyberJukebox.py", line 489, in on_stats_to_clipboard
stats = self.jbox.get_stats()
File "/home/nine/dev/python/CyberJukebox/jukebox.py", line 235, in get_stats
return self._stats.get_string()
File "/home/nine/dev/python/CyberJukebox/jukebox.py", line 59, in get_string
return '\n'.join(i.decode('utf8') for i in stats_list)
File "/home/nine/dev/python/CyberJukebox/jukebox.py", line 59, in <genexpr>
return '\n'.join(i.decode('utf8') for i in stats_list)
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 10: ordinal not in range(128)
EDIT 2:
The suggested solution works for me in the interpreter. But when I execute the code it won't work. I can't wrap my head around this. Maybe it's something obvious I'm missing so I'm pasting the whole method here:
def get_string(self):
    stats_list = [u'Statistik över sånger\n', u'Antal\tSång']
    stats = sorted([(v, k) for k, v in self.song_stats.iteritems()], reverse=True)
    for row in stats:
        line = '%s\t%s' % row
        stats_list.append(line)
    stats_list.append(u'\n\nStatistik över datakällor\n')
    stats_list.append(u'Källa\tAntal')
    stats = sorted([(k, v) for k, v in self.exts_stats.iteritems()])
    for row in stats:
        line = '%s\t%s' % row
        stats_list.append(line)
    stats_list.append(u'\n\nStatistik över önskare\n')
    stats_list.append(u'Antal\tId')
    stats = sorted([(v, k) for k, v in self.wisher_stats.iteritems() if k != ''], reverse=True)
    for row in stats:
        line = '%s\t%s' % row
        stats_list.append(line)
    return '\n'.join(i.decode('utf8') for i in stats_list)
song_stats, exts_stats and wisher_stats are dictionaries in the class.
Your problem is probably that you are mixing unicode strings with byte strings.
The code in "Edit 2" has several unicode strings being added to stats_list:
stats_list = [u'Statistik över sånger\n', u'Antal\tSång']
If you try to decode these unicode strings, you will get a UnicodeEncodeError. This is because Python 2 will first try to use the default encoding (usually "ascii") to encode the strings before trying to decode them. It only ever makes sense to decode byte strings.
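To make that hidden step visible, here is a minimal sketch that performs the encode explicitly. The variable names are only illustrative, and the snippet is written so that it should behave the same on Python 2 and Python 3; it is not the asker's code.

```python
# The implicit step, written out explicitly: encoding non-ASCII text
# with the "ascii" codec fails before any decoding could happen.
text = u'Statistik \xf6ver s\xe5nger'

try:
    text.encode('ascii')
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
```

The reported position is the index of the first non-ASCII character, which is why the traceback in the question points at position 10.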
So to start with, change the final line in the function to:
return '\n'.join(stats_list)
Now you need to check whether any of the other strings that get added to stats_list are byte strings, and ensure they get decoded to unicode strings properly first.
So put print type(line) after the three lines like this:
line = '%s\t%s' % row
and then wherever it prints <type 'str'>, change the following line to:
stats_list.append(line.decode('utf-8'))
Of course, if it prints <type 'unicode'>, there's no need to change the following line.
An even better solution here would be to check how the dictionaries song_stats, exts_stats and wisher_stats are created, and make sure they always contain unicode strings (or byte strings that only contain ASCII characters).
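One way to do that is to decode at the point where the dictionaries are filled. The sketch below assumes the keys are song titles stored as UTF-8 byte strings and the values are counts; to_text, normalize_stats and raw_song_stats are hypothetical names, not part of the original class. It uses bytes (an alias for str on Python 2) so it should run on Python 2 and 3.

```python
def to_text(value, encoding='utf-8'):
    """Return unicode text for byte strings; leave other values unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def normalize_stats(stats, encoding='utf-8'):
    """Build a copy of a stats dictionary whose keys are unicode text."""
    return dict((to_text(key, encoding), count) for key, count in stats.items())


# Hypothetical data, shaped like the song_stats dictionary in the question.
raw_song_stats = {b'Carola - Betlehems Stj\xc3\xa4rna': 1}
song_stats = normalize_stats(raw_song_stats)

# With unicode keys and a unicode format string, every line is unicode.
rows = sorted(((count, title) for title, count in song_stats.items()), reverse=True)
lines = [u'%s\t%s' % row for row in rows]
```

Once the dictionaries hold only unicode text, the join at the end of get_string needs no decode call at all.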
The strings are encoded in UTF-8. You need to .decode them to unicode:
>>> 'Statistik \xc3\xb6ver s\xc3\xa5nger\n'.decode('utf-8')
u'Statistik \xf6ver s\xe5nger\n'
>>> print _
Statistik över sånger
Use a generator expression to apply this to all elements:
return '\n'.join(x.decode('utf-8') for x in stats_list)
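Note that the list in the question also contains a unicode element (u'1\tNiclas'), and calling .decode on that one is what triggers the UnicodeEncodeError in the first edit. A hedged variant that decodes only the byte strings might look like this; join_lines is a hypothetical helper name, and the check against bytes (an alias for str on Python 2) is there so the sketch should run on Python 2 and 3.

```python
def join_lines(items, encoding='utf-8'):
    """Join a list that may mix byte strings and unicode strings."""
    decoded = []
    for item in items:
        # Only byte strings need decoding; unicode text passes through untouched.
        if isinstance(item, bytes):
            item = item.decode(encoding)
        decoded.append(item)
    return u'\n'.join(decoded)


# A shortened version of the mixed list from the question.
mixed = [b'Antal\tS\xc3\xa5ng', u'1\tNiclas']
result = join_lines(mixed)
```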
Python is complaining that it can't decode the byte string 'Statistik \xc3\xb6ver s\xc3\xa5nger\n' as ASCII, which it attempts because the list also contains a unicode string (u'1\tNiclas'). Try making every string a unicode string by prefixing it with u. The UTF-8 byte escapes then have to become code point escapes (\xc3\xb6 becomes \xf6), otherwise the text comes out garbled.
stats_list = [u'Statistik \xf6ver s\xe5nger\n', u'Antal\tS\xe5ng', u'1\tCarola - Betlehems Stj\xe4rna', u'\n\nStatistik \xf6ver datak\xe4llor\n', u'K\xe4lla\tAntal', u'MANUAL\t1', u'\n\nStatistik \xf6ver \xf6nskare\n', u'Antal\tId', u'1\tNiclas']