
Spanish text in .py files

This is the code:

A = "Diga sí por cualquier número de otro cuidador.".encode("utf-8")

I get this error:

'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)

I tried numerous encodings unsuccessfully.

Edit:

I already have this at the beginning of the file:

# -*- coding: utf-8 -*-

Changing to

A = u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")

doesn't help either.

Kamal Saini asked May 30 '11 17:05


2 Answers

Are you using Python 2?

In Python 2, that string literal is a bytestring. You're trying to encode it, but you can encode only a Unicode string, so Python will first try to decode the bytestring to a Unicode string using the default "ascii" encoding.

Unfortunately, your string contains non-ASCII characters, so it can't be decoded to Unicode.
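To make that hidden step visible, here is a minimal sketch that performs the ASCII decode explicitly. It is written so that it also runs on Python 3, where the implicit conversion no longer exists. The variable names are illustrative only, and the bytes are built from an escape so the result does not depend on how the script itself is saved.

```python
# -*- coding: utf-8 -*-
# The bytes a UTF-8 encoded source file would hold for the start of the string.
raw = u"Diga s\xed".encode("utf-8")

try:
    # Python 2 runs this decode silently when .encode() is called on a bytestring.
    raw.decode("ascii")
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the first non-ASCII byte

print(failed_at)  # 6
```

The reported position is the byte offset of the first non-ASCII byte, which is why the error message in the question points at position 6.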

The best solution is to use a Unicode string literal, like this:

A = u"Diga sí por cualquier número de otro cuidador.".encode("utf-8")
MRAB answered Sep 28 '22 05:09


Error message: 'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)

says that the 7th byte is 0xed. Either that is the first byte of the UTF-8 sequence for some high-ordinal (possibly CJK) Unicode character, which is not at all consistent with the reported facts, or it is your i-acute encoded in Latin-1 or cp1252. I'm betting on cp1252.
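The two readings of 0xed can be checked directly. The sketch below is only an illustration (the Hangul syllable U+D55C is an arbitrary example of a character whose UTF-8 form starts with 0xed, not something taken from the question), and it runs on both Python 2.6+ and Python 3.

```python
# 0xED as a UTF-8 lead byte: it opens a 3-byte sequence for code points
# in the U+D000 to U+D7FF range, for example the Hangul syllable U+D55C.
hangul = u"\ud55c".encode("utf-8")

# 0xED as a single-byte character: i-acute in both Latin-1 and cp1252.
latin1_char = b"\xed".decode("latin-1")
cp1252_char = b"\xed".decode("cp1252")

# The same character in UTF-8 takes two bytes, starting with 0xC3.
utf8_bytes = u"\xed".encode("utf-8")

print(repr(hangul), repr(latin1_char), repr(utf8_bytes))
```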

If your file was encoded in UTF-8, the offending byte would be not 0xed but 0xc3:

Preliminaries:
>>> import unicodedata
>>> unicodedata.name(u'\xed')
'LATIN SMALL LETTER I WITH ACUTE'
>>> uc = u'Diga s\xed por'

What happens if the file is encoded in UTF-8:
>>> infile = uc.encode('utf8')
>>> infile
'Diga s\xc3\xad por'
>>> infile.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
#### NOT the message reported in the question ####

What happens if the file is encoded in cp1252 or latin1 or similar:
>>> infile = uc.encode('cp1252')
>>> infile
'Diga s\xed por'
>>> infile.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)
#### As reported in the question ####

Having # -*- coding: utf-8 -*- at the start of your code does not magically ensure that your file is encoded in UTF-8 -- that's up to you and your text editor.
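One way to find out what is really on disk is to read the raw bytes and try to decode them as UTF-8. The helper below, is_valid_utf8, is a hypothetical name introduced for this sketch, and the two temporary files only stand in for a real script saved in each encoding.

```python
import io
import os
import tempfile

def is_valid_utf8(path):
    """Return True if the file's raw bytes decode cleanly as UTF-8."""
    with io.open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Demonstration with two throwaway files holding the same source line.
line = u'A = u"Diga s\xed"\n'
results = {}
for encoding in ("utf-8", "cp1252"):
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "wb") as handle:
        handle.write(line.encode(encoding))
    results[encoding] = is_valid_utf8(path)
    os.remove(path)

print(results)
```

A False result for a file that carries a utf-8 coding declaration means the declaration and the actual bytes disagree.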

Actions:

  1. Save your file as UTF-8.
  2. As suggested by others, use a Unicode literal: u'blah blah'.
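If the editor offers no "save as UTF-8" option, step 1 can also be done with a few lines of Python. This is a rough sketch that assumes the file really is in cp1252, as guessed above; resave_as_utf8 is a hypothetical helper name, and running it on a file that is already UTF-8 would corrupt the accented characters, so keep a backup.

```python
import io
import os
import tempfile

def resave_as_utf8(path, source_encoding="cp1252"):
    """Rewrite a text file in place as UTF-8, assuming it is in source_encoding."""
    with io.open(path, "r", encoding=source_encoding, newline="") as handle:
        text = handle.read()
    with io.open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)

# Demonstration on a throwaway file that starts out in cp1252.
fd, demo_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as handle:
    handle.write(u'A = u"Diga s\xed"\n'.encode("cp1252"))

resave_as_utf8(demo_path)

with io.open(demo_path, "rb") as handle:
    converted = handle.read()
os.remove(demo_path)

print(repr(converted))
```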
John Machin answered Sep 28 '22 06:09