 

Python: Creating a Unicode string

Tags:

python

unicode

I have a problem with Unicode in Python. I need to plot a graph with Unicode annotations in it. According to the tutorial, I should just create my string as Unicode. I do it like this:

annotation = u"%s has %s rev"%(art.title, len(art.revisions))

It is art.title that contains the Unicode characters. Sometimes the code works; sometimes it gives me the error below:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)

How can I fix it?

EDIT: The error is raised on the "annotation" line itself:

  File "script.py", line 195, in test_trie
annotation = u"%s has %s rev"%(art.title, len(art.revisions))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
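A likely explanation, assuming art.title is a UTF-8 encoded byte string (a Python 2 `str`) rather than a `unicode` object: when a byte string is interpolated into a `u"..."` format string, Python 2 implicitly decodes it with the default ASCII codec. That succeeds for titles that happen to be pure ASCII and fails on the first non-ASCII byte, which would explain why the code only sometimes works (0xc3 is a typical UTF-8 lead byte for accented Latin letters). Python 3 no longer performs this implicit decode, so the sketch below reproduces the step explicitly with `bytes.decode("ascii")`. The function name `annotate_ascii` and the sample titles are hypothetical.

```python
def annotate_ascii(title_bytes, revision_count):
    """Mimic Python 2's implicit ASCII decode of a byte-string title."""
    title = title_bytes.decode("ascii")  # Python 2 did this step implicitly
    return "%s has %s rev" % (title, revision_count)

# Pure-ASCII title: the implicit decode succeeds.
print(annotate_ascii(b"Python", 3))  # Python has 3 rev

# UTF-8 title with a non-ASCII letter: decoding it as ASCII fails.
try:
    annotate_ascii("Caf\u00e9".encode("utf-8"), 5)
except UnicodeDecodeError as err:
    # 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
    print(err)
```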
asked Apr 20 '12 at 00:04 by ashim





2 Answers

You have two options: either use art.title.decode('utf_8'), or create a new Unicode string by decoding the bytes as UTF-8 with unicode(art.title, 'utf_8').
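Both suggestions amount to decoding the byte string from UTF-8 before it is formatted into the annotation. A minimal Python 3 sketch of the same idea, assuming the title arrives as UTF-8 bytes: `bytes.decode("utf-8")` and `str(raw, "utf-8")` play the roles of Python 2's `art.title.decode('utf_8')` and `unicode(art.title, 'utf_8')`. The variable names, sample title, and revision count are made up for illustration.

```python
# Hypothetical raw title as UTF-8 bytes (e.g. read from a file or a database).
raw_title = "Caf\u00e9".encode("utf-8")

# Option 1: call decode() on the byte string.
title_a = raw_title.decode("utf-8")

# Option 2: build a text string from the bytes
# (the Python 3 analogue of unicode(s, 'utf_8')).
title_b = str(raw_title, "utf-8")

assert title_a == title_b  # both produce the same text

# Formatting text into a text template needs no implicit decode.
annotation = "%s has %s rev" % (title_a, 5)
print(annotation)  # Café has 5 rev
```

Decoding once, at the point where the bytes enter the program, keeps every later string operation working on text.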

answered Oct 19 '22 at 18:10 by Makoto



I think it depends on whether your title has Unicode characters or not.

I would try art.title.encode("utf-8") or art.title.decode("utf-8") and see which one works.
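Which call is appropriate depends on the type of the title: decoding turns bytes into text, and encoding turns text into bytes. (In Python 2, calling `.encode("utf-8")` on a byte string that contains non-ASCII bytes first decodes it implicitly as ASCII, so it raises the same `UnicodeDecodeError`.) A minimal Python 3 sketch, assuming UTF-8 input; `to_text` is a hypothetical helper, not a library function. It decodes only when it receives bytes, so titles that are already text pass through unchanged.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text: decode bytes, pass str through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Direction matters: encode goes text -> bytes, decode goes bytes -> text.
text = "Caf\u00e9"
data = text.encode("utf-8")
print(data)                          # b'Caf\xc3\xa9'
print(data.decode("utf-8") == text)  # True

# to_text accepts either form, so the annotation line no longer
# depends on which type the title happens to have.
for title in (text, data):
    print("%s has %s rev" % (to_text(title), 5))
```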

answered Oct 19 '22 at 18:10 by Maksym Kozlenko
