Understanding Python Unicode and Linux terminal

I have a Python script that writes some strings with UTF-8 encoding. In my script I mainly use the str() function to cast to string. It looks like this:

mystring = "this is unicode string:" + japanesevalues[1]
# japanesevalues is a list of unicode values; I am sure it is unicode
print mystring

I don't use the Python terminal, just the standard Linux Red Hat x86_64 terminal. I set the terminal to output utf8 chars.

If I execute this, it works:

#python myscript.py
this is unicode string: カラダーズ ソフィー

But if I redirect the output:

#python myscript.py > output

I get the typical error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 253-254: ordinal not in range(128)

Why is that?

asked Jul 02 '13 06:07

Cesc

2 Answers

The terminal has a character set, and Python knows what that character set is, so it automatically encodes your Unicode strings to the byte encoding that the terminal uses, in your case UTF-8.

But when you redirect, you are no longer writing to the terminal. You are writing to a file (or a Unix pipe), which doesn't have a charset, so Python has no way of knowing which encoding you want and falls back to a default character set. You have marked your question with "Python-3.x" but your print syntax is Python 2, so I suspect you are actually using Python 2. In that case sys.getdefaultencoding() is generally 'ascii', and the error message shows that it is in your case. Japanese characters cannot be encoded as ASCII, so you get an error.
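
To make the failure concrete, here is a minimal sketch of what the fallback amounts to: the same text encoded once with the 'ascii' codec and once with UTF-8. The sample text is illustrative only (the asker's japanesevalues list is not shown), and try_encode is a hypothetical helper, not something from the question.

```python
# -*- coding: utf-8 -*-
# Illustrative text only; the real japanesevalues list is not shown in the question.
text = u"this is unicode string: カラダーズ"


def try_encode(value, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


ascii_result = try_encode(text, "ascii")  # the fallback codec described above
utf8_result = try_encode(text, "utf-8")   # the codec the terminal was set to

print(ascii_result is None)      # ASCII cannot represent the Japanese characters
print(utf8_result is not None)   # UTF-8 can
```

Both lines print True: the ASCII attempt fails with the same UnicodeEncodeError class as in the question, while the UTF-8 attempt succeeds.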

Your best bet when using Python 2 is to encode the string as UTF-8 before printing it. Then redirection will work, and the resulting file will be UTF-8. That will not display correctly if your terminal uses a different encoding, though. You can get the terminal encoding from sys.stdout.encoding and use that, but it will be None when redirecting under Python 2, so you need a fallback for that case.
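
One way to combine the two ideas is sketched below: use the stream's encoding when it reports one, and fall back to UTF-8 when it reports None. The helper name encode_for_output and the sample text are assumptions made for illustration. In a real Python 2 script you would pass sys.stdout.encoding as the second argument and print the returned bytes.

```python
# -*- coding: utf-8 -*-
def encode_for_output(text, stream_encoding, fallback="utf-8"):
    """Encode text with the stream's encoding, or with the fallback if it has none.

    Hypothetical helper: stream_encoding is meant to be sys.stdout.encoding,
    which is None when Python 2 output is redirected.
    """
    return text.encode(stream_encoding or fallback)


# Illustrative text only.
line = u"this is unicode string: カラダーズ"

redirected = encode_for_output(line, None)     # no encoding reported: falls back to UTF-8
terminal = encode_for_output(line, "utf-8")    # a terminal that reports UTF-8

print(redirected == terminal)
```

This prints True: with a UTF-8 terminal, the redirected file ends up with the same bytes the terminal would have received.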

In Python 3, your code should work as is, except that you need to change print mystring to print(mystring).
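
In Python 3, print hands the text to a text stream, and that stream does the encoding. The sketch below imitates a redirected output with an in-memory byte buffer so the result can be inspected. The BytesIO stand-in and the explicit encoding="utf-8" are assumptions for the demonstration, since the encoding a real redirected stdout gets depends on the environment.

```python
import io

# Illustrative text only.
mystring = "this is unicode string: " + "カラダーズ"

raw = io.BytesIO()  # stands in for the redirected file
# newline="\n" keeps the line ending untranslated on every platform.
stream = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

print(mystring, file=stream)  # print() passes text; the wrapper encodes it
stream.flush()

written = raw.getvalue()      # the bytes that would land in the file
print(written.decode("utf-8") == mystring + "\n")
```

This prints True: the bytes in the buffer are the UTF-8 encoding of the printed line.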

answered Sep 28 '22 18:09

Lennart Regebro

If it outputs to the terminal then Python can examine the value of $LANG to pick a charset. All bets are off if you redirect.
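
To see what Python has to work with on a given machine, a small diagnostic like the following can be run both directly and with output redirected. It is only a sketch written for Python 3. $LANG is one of several locale variables, so the values may not line up one to one.

```python
import locale
import os
import sys

lang = os.environ.get("LANG")                         # None if the variable is unset
preferred = locale.getpreferredencoding()             # encoding derived from the locale settings
stdout_encoding = getattr(sys.stdout, "encoding", None)  # what stdout reports right now

print("LANG:", lang)
print("locale preferred encoding:", preferred)
print("sys.stdout.encoding:", stdout_encoding)
```

Comparing the output of `python script.py` with the contents of the file produced by `python script.py > output` shows whether redirection changes what stdout reports.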

answered Sep 28 '22 19:09

Ignacio Vazquez-Abrams

Tags: python, linux, unicode, cjk
