Python 3: How to specify stdin encoding

Tags:

python

python-3.x

stdin

encoding

unicode

While porting code from Python 2 to Python 3, I ran into this problem when reading UTF-8 text from standard input. In Python 2, this works fine:

```python
import sys

for line in sys.stdin:
    ...
```

But Python 3 expects ASCII from sys.stdin, and if there are non-ASCII characters in the input, I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)

For a regular file, I would specify the encoding when opening the file:

```python
with open('filename', 'r', encoding='utf-8') as file:
    for line in file:
        ...
```

But how can I specify the encoding for standard input? Other SO posts (e.g. "How to change the stdin encoding on python") have suggested using:

```python
import codecs
import sys

input_stream = codecs.getreader('utf-8')(sys.stdin)
for line in input_stream:
    ...
```

However, this doesn't work in Python 3. I still get the same error message. I'm using Ubuntu 12.04.2 and my locale is set to en_US.UTF-8.

asked May 14 '13 17:05

Seppo Enarvi

1 Answer

Python 3 does not expect ASCII from sys.stdin. It'll open stdin in text mode and make an educated guess as to what encoding is used. That guess may come down to ASCII, but that is not a given. See the sys.stdin documentation on how the codec is selected.
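To see which codec Python actually picked, you can inspect `sys.stdin.encoding` and compare it with `locale.getpreferredencoding(False)`; when the two agree, the guess usually just followed the locale settings. A minimal sketch (the helper name `report_io_encodings` is hypothetical):

```python
import locale
import sys


def report_io_encodings():
    """Return the codec chosen for stdin alongside the locale's preferred codec.

    Hypothetical helper, for illustration only.
    """
    # sys.stdin may be None (e.g. no console attached) or replaced by an
    # object without an .encoding attribute, so fall back to None then.
    stdin_encoding = getattr(sys.stdin, "encoding", None)
    # getpreferredencoding(False) reports the locale's codec without
    # calling setlocale() first.
    preferred = locale.getpreferredencoding(False)
    return {"stdin": stdin_encoding, "locale": preferred}


if __name__ == "__main__":
    print(report_io_encodings())
```

Running it under different locales (for example `LANG=C` versus `LANG=en_US.UTF-8`) may show different results; note that from Python 3.7 the C locale is normally coerced to UTF-8, so the difference is most visible on older Python 3 releases.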

Like other file objects opened in text mode, the sys.stdin object derives from the io.TextIOBase base class; it has a .buffer attribute pointing to the underlying buffered IO instance (which in turn has a .raw attribute).
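The layering described above can be checked on any stream opened in text mode. A minimal sketch using a hypothetical `describe_layers` helper; the exact classes depend on what the stream is attached to. For a pipe or regular file you would typically see `TextIOWrapper`, `BufferedReader` and `FileIO`, while a Windows console may use a different raw class.

```python
import sys


def describe_layers(text_stream):
    """Return the class names of a text stream and the two layers below it."""
    buffered = text_stream.buffer  # binary, buffered layer (io.BufferedIOBase)
    raw = buffered.raw             # unbuffered raw layer (io.RawIOBase)
    return [type(text_stream).__name__, type(buffered).__name__, type(raw).__name__]


if __name__ == "__main__":
    # e.g. `echo hi | python3 layers.py`
    print(describe_layers(sys.stdin))
```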

Wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance to specify a different encoding:

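```python
import io
import sys

# Re-wrap the underlying binary buffer with an explicit codec
input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
for line in input_stream:
    ...
```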
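Because `io.TextIOWrapper()` accepts any binary stream, the same wrapping can be tried without a terminal by passing an in-memory `io.BytesIO` object in place of `sys.stdin.buffer`. The sketch below (the helper `wrap_binary` is hypothetical) also passes the `errors` argument, which controls what happens to bytes that are invalid in the chosen codec: `'strict'` raises `UnicodeDecodeError`, while `'replace'` substitutes U+FFFD.

```python
import io


def wrap_binary(binary_stream, encoding="utf-8", errors="strict"):
    """Wrap a binary stream (e.g. sys.stdin.buffer) in a text layer with an explicit codec."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors)


if __name__ == "__main__":
    good = io.BytesIO("naïve café\n".encode("utf-8"))
    print(ascii(wrap_binary(good).read()))  # 'na\xefve caf\xe9\n'

    bad = io.BytesIO(b"caf\xff\n")  # 0xff never occurs in valid UTF-8
    print(ascii(wrap_binary(bad, errors="replace").read()))  # 'caf\ufffd\n'
```

When wrapping the real `sys.stdin.buffer`, it is safest not to also read from `sys.stdin` itself: both wrappers share the same underlying buffer, and data the original wrapper has already read ahead may not be visible to the new one.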

Alternatively, set the PYTHONIOENCODING environment variable to the desired codec when running python.
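The variable has to be set before the interpreter starts, so it is normally given on the command line (for example `PYTHONIOENCODING=utf-8 python3 script.py < input.txt`). It also accepts an `encodingname:errorhandler` form and applies to `stdout` and `stderr` as well. From Python code it can be passed to a child interpreter; the sketch below (the helper `child_stdin_encoding` is hypothetical) starts one with `subprocess.run()` and reports which codec its `sys.stdin` ended up with.

```python
import os
import subprocess
import sys


def child_stdin_encoding(codec):
    """Start a child interpreter with PYTHONIOENCODING set; return its stdin codec name."""
    env = dict(os.environ, PYTHONIOENCODING=codec)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        env=env,
        stdin=subprocess.PIPE,    # give the child a pipe as stdin
        stdout=subprocess.PIPE,   # capture what the child prints
        universal_newlines=True,  # decode the child's output as text
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_stdin_encoding("utf-8"))
```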

From Python 3.7 onwards, you can also reconfigure the existing std* wrappers, provided you do it at the start (before any data has been read):

```python
import sys

# Python 3.7 and newer
sys.stdin.reconfigure(encoding='utf-8')
```
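`reconfigure()` is a method of `io.TextIOWrapper`, so its behaviour can be tried on an in-memory wrapper as well as on `sys.stdin`. The sketch below (hypothetical helper `decode_after_reconfigure`) creates the wrapper with the wrong codec on purpose and switches it before the first read. According to the `io` documentation, the encoding can only be changed before any data has been read, and passing `encoding` without `errors` resets the error handler to `'strict'`.

```python
import io


def decode_after_reconfigure(data, initial, desired):
    """Wrap bytes with one codec, switch to another before reading, return the text."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=initial)
    # Must happen before the first read; afterwards the encoding can no longer change.
    stream.reconfigure(encoding=desired)
    return stream.read()


if __name__ == "__main__":
    payload = "café\n".encode("utf-8")
    print(ascii(decode_after_reconfigure(payload, "ascii", "utf-8")))  # 'caf\xe9\n'
```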

answered Sep 27 '22 20:09

Martijn Pieters
