
Should I use Unicode string by default?

Is it considered good practice to prefer Unicode strings over regular strings when coding in Python? I mainly work on the Windows platform, where most string types are Unicode these days (e.g. .NET String, '_UNICODE' turned on by default in a new C++ project, etc.). I therefore tend to think that non-Unicode string objects are the rare case. In any case, I'm curious what Python practitioners do in real-world projects.

Kei asked Jul 12 '09 17:07


2 Answers

From my experience: use unicode.

At the beginning of one project we used ordinary strings. As the project grew, we implemented new features and pulled in new third-party libraries, and in that mix of unicode and non-unicode strings some functions started failing. We began spending time tracking down these problems and fixing them. Some third-party modules did not support unicode and started failing after we switched to it, but that is the exception rather than the rule.
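
To illustrate the kind of failure described above, here is a minimal sketch. It is not the code from that project: the helper names `read_text` and `write_text` are hypothetical, and UTF-8 is assumed as the encoding of incoming data. The idea is to decode bytes to unicode once at the input boundary, work only with unicode in between, and encode once on output. It should run unchanged on Python 2.6+ and Python 3.3+.

```python
# -*- coding: utf-8 -*-

def read_text(raw_bytes, encoding="utf-8"):
    """Hypothetical helper: decode bytes from the outside world into unicode."""
    return raw_bytes.decode(encoding)


def write_text(text, encoding="utf-8"):
    """Hypothetical helper: encode unicode back to bytes on the way out."""
    return text.encode(encoding)


raw = b"caf\xc3\xa9"          # UTF-8 bytes, e.g. read from a file or socket

# Mixing a byte string with a unicode string fails on non-ASCII data:
# Python 2 raises UnicodeDecodeError (implicit ASCII decode),
# Python 3 raises TypeError (bytes and str never mix).
try:
    raw + u"!"
    mixed_ok = True
except (TypeError, UnicodeDecodeError):
    mixed_ok = False

# Decoding at the boundary avoids the problem.
text = read_text(raw)          # unicode text, 4 characters
greeting = u"Hello, " + text   # unicode + unicode is always safe
out = write_text(greeting)     # bytes again, ready to be written out
```

Keeping the decode and encode steps at the edges means the rest of the code never has to guess which kind of string it received.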

I have also had to rewrite some third-party modules (e.g. SendKeys) because they did not support unicode. If they had been written with unicode from the beginning, it would have been better :)

So I think today we should use unicode.

P.S. All of the above is only my humble opinion :)

Mikhail Churbanov answered Oct 05 '22 20:10


Since you ask this question, I assume you are using Python 2.x.

Python 3.0 changed string representation considerably, and all text is now unicode.
I would use unicode in any new project, in a way that is compatible with the switch to Python 3.0 (see details).
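
As a rough sketch of what "compatible with the switch" could look like (my own assumption, not something taken from the linked details): the names `PY2`, `text_type` and `ensure_text` below are hypothetical helpers, and UTF-8 is assumed for byte input. The `u"..."` prefix is accepted by Python 2 and again by Python 3.3+, so literals written this way mean the same thing on both.

```python
import sys

PY2 = sys.version_info[0] == 2

# The unicode text type: 'unicode' on Python 2, 'str' on Python 3.
# Only the selected branch is evaluated, so the name 'unicode'
# is never looked up on Python 3.
text_type = unicode if PY2 else str


def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: return unicode text for bytes or text input."""
    if isinstance(value, bytes):   # 'bytes' is an alias of 'str' on Python 2
        return value.decode(encoding)
    return value


name = ensure_text(b"Z\xc3\xbcrich")   # bytes in, unicode out
same = ensure_text(u"Z\xfcrich")       # unicode passes through unchanged
```

With something like this, code that handles text can check against `text_type` and behave the same way before and after the move to Python 3.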

rob answered Oct 05 '22 19:10