Question 1

How do I print a Unicode string in Python?

Accepted Answer

If you simply want to print the Unicode string prettily, use the string's encode method.

To make sure that every line read from a file is Unicode, use the codecs.open function instead of plain open; it lets you specify the file's encoding.
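As a rough illustration of both points, here is a minimal sketch. It assumes Python 3, where the built-in open accepts an encoding argument and plays the role that codecs.open plays in the answer. The helper names encode_for_output and read_lines, the sample text, and the temporary file are hypothetical and not taken from the original answer.

```python
import os
import tempfile


def encode_for_output(text, encoding="utf-8"):
    """Encode text to bytes; characters the codec cannot represent become '?'."""
    return text.encode(encoding, errors="replace")


def read_lines(path, encoding="utf-8"):
    """Read a file with an explicit encoding so every line is a Unicode str."""
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


# Hypothetical sample data written to a temporary file for the demo.
sample = "caf\u00e9 \u2603"
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as handle:
    handle.write(sample + "\n")

lines = read_lines(path)
os.remove(path)

encoded = encode_for_output(lines[0])
print(encoded)  # shows the UTF-8 bytes of the first line
```

Passing the encoding explicitly avoids depending on the platform default, which is usually the source of surprises when the same script runs on another machine.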

Question 2

How do you deal with BOM characters?

Accepted Answer

The simplest approach I've found is to deal with BOM characters after decoding to Unicode, and to let the codecs do the heavy lifting. There is only one Unicode byte order mark (U+FEFF), so once the data has been converted to Unicode characters, checking whether it is there and adding or removing it is easy. To read a file with a possible BOM, decode it first and then check the first character.
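The idea can be sketched roughly as follows, assuming Python 3 and a file whose encoding is already known. The names strip_bom and read_text_without_bom are hypothetical helpers introduced for illustration, not the code from the original answer.

```python
import codecs
import os
import tempfile

BOM = "\ufeff"  # the single Unicode byte order mark, as a decoded character


def strip_bom(text):
    """Drop a leading U+FEFF if present; otherwise return the text unchanged."""
    return text[1:] if text.startswith(BOM) else text


def read_text_without_bom(path, encoding="utf-8"):
    """Decode the whole file, then remove the BOM at the Unicode level."""
    with open(path, encoding=encoding) as handle:
        return strip_bom(handle.read())


# Demo: write UTF-8 bytes with a BOM into a temporary file, then read it back.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "wb") as handle:
    handle.write(codecs.BOM_UTF8 + "hello".encode("utf-8"))

content = read_text_without_bom(path)
os.remove(path)
print(content)
```

For UTF-8 specifically, the standard "utf-8-sig" codec removes a leading BOM while decoding, so passing encoding="utf-8-sig" to open is an alternative to stripping it by hand.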

Question 3

Is there a way to convert Unicode to ASCII in Python?

Accepted Answer

EDIT: I'm assuming that your intended goal is just to be able to read the file properly into a string in Python. If you're trying to convert from Unicode to an ASCII string, then there's really no direct way to do so, since not every Unicode character exists in ASCII.
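To make the limitation concrete, here is a small sketch of a lossy conversion, assuming Python 3. It is one possible workaround rather than a method from the answer: the text is decomposed with unicodedata.normalize and whatever still falls outside ASCII is dropped. The function name to_ascii_lossy is hypothetical.

```python
import unicodedata


def to_ascii_lossy(text, errors="ignore"):
    """Best-effort ASCII version of text; information is lost by design.

    NFKD normalization splits accented letters into a base letter plus
    combining marks, so the base letter survives the ASCII encoding step.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors=errors).decode("ascii")


print(to_ascii_lossy("caf\u00e9"))    # accented letter keeps its base letter
print(repr(to_ascii_lossy("\u2603")))  # a symbol with no ASCII form vanishes
```

With errors="ignore" unencodable characters disappear silently; errors="replace" substitutes "?" instead, which makes the loss visible in the output.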

Question 4

Can chardet detect a BOM in text?

Accepted Answer

Note: chardet may return 'UTF-XXLE' or 'UTF-XXBE' encodings, which leave the BOM in the decoded text. Strip the 'LE' or 'BE' suffix from the encoding name to avoid this, though at that point it is easier to detect the BOM yourself, e.g. as in @ivan_pozdeev's answer.
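Detecting the BOM yourself might look roughly like the sketch below, which assumes Python 3 and uses only the BOM constants from the standard codecs module. It is not the code from the referenced answer; detect_bom and decode_with_bom are hypothetical names, and the UTF-8 default for data without a BOM is an assumption.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32 LE BOM starts with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def detect_bom(raw, default="utf-8"):
    """Return a codec name that consumes the BOM found at the start of raw."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return encoding
    return default  # assumption: data without a BOM is treated as UTF-8


def decode_with_bom(raw, default="utf-8"):
    """Decode raw bytes, letting the chosen codec strip the BOM."""
    return raw.decode(detect_bom(raw, default))


raw = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
print(detect_bom(raw), decode_with_bom(raw))
```

The codec names without an 'LE' or 'BE' suffix ("utf-16", "utf-32") read the BOM to pick the byte order and then discard it, which is why the suffixed names leave it in the text.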

Reading Unicode file data with BOM chars in Python

Tags:

python

unicode
