
How do I pipe Unicode into a native application in PowerShell?

I have a native program written in Python that expects its input on stdin. As a simple example,

#!python3
import sys

# Copy everything read from stdin into foo.txt (opened for writing, as UTF-8).
with open('foo.txt', 'w', encoding='utf8') as f:
    f.write(sys.stdin.read())

I want to be able to pass a (PowerShell) string to this program as standard input. Python expects its standard input in the encoding specified in $env:PYTHONIOENCODING, which I typically set to UTF-8 to avoid encoding errors.
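A quick way to check that Python honours PYTHONIOENCODING for stdin is to start a child interpreter with the variable set and ask it which encoding it chose. This is a minimal sketch; the helper name stdin_encoding_for is hypothetical, and it only inspects Python's own setting, not anything PowerShell does to the data.

```python
import os
import subprocess
import sys


def stdin_encoding_for(pythonioencoding: str) -> str:
    """Return the sys.stdin.encoding reported by a child Python interpreter
    started with PYTHONIOENCODING set to the given value."""
    env = dict(os.environ, PYTHONIOENCODING=pythonioencoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        stdin=subprocess.PIPE,  # give the child a real pipe as stdin, as in a pipeline
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    # Codec names are plain ASCII, so decoding the child's answer this way is safe.
    return result.stdout.decode("ascii").strip()


print(stdin_encoding_for("utf-8"))
```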

But no matter what I do, characters get corrupted. I've searched the net and found suggestions to change [Console]::InputEncoding/[Console]::OutputEncoding, or to use chcp, but nothing seems to work.

Here's my basic test:

PS >[Console]::OutputEncoding.EncodingName
Unicode (UTF-8)
PS >[Console]::InputEncoding.EncodingName
Unicode (UTF-8)
PS >$env:PYTHONIOENCODING
utf-8
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
´╗┐?

PS >chcp 1252
Active code page: 1252
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
?

PS >chcp 65001
Active code page: 65001
PS >python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"
 ?
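Part of the first test's output can be reproduced in Python. Assuming the console was rendering with OEM code page 850 (the question doesn't say so; it is only consistent with the characters shown), the junk ´╗┐ is exactly the three bytes of a UTF-8 byte-order mark, and the ? is what a Euro sign becomes when it is pushed through an ASCII encoder that substitutes unencodable characters.

```python
# The UTF-8 encoding of the byte-order mark (U+FEFF).
bom = "\ufeff".encode("utf-8")                # b'\xef\xbb\xbf'

# Rendered with OEM code page 850 (assumed console code page), those
# three bytes look like the junk at the start of the first test's output.
bom_as_cp850 = bom.decode("cp850")            # '´╗┐'

# The Euro sign has no ASCII representation; a replacing ASCII encoder emits '?'.
euro_via_ascii = "\N{EURO SIGN}".encode("ascii", errors="replace")  # b'?'

# For comparison, the Euro sign's UTF-8 bytes, as the first python writes them
# when PYTHONIOENCODING=utf-8.
euro_via_utf8 = "\N{EURO SIGN}".encode("utf-8")                      # b'\xe2\x82\xac'
```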

How can I fix this problem?

I can't even explain what's going on here. Basically, I want the test (python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())") to print a Euro sign, and I'd like to understand why whatever is needed to make that work is needed :-) That way I can transfer the knowledge to my real scenario, which is writing pipelines of Python programs that don't break when they encounter Unicode characters.
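For pipelines made up only of Python programs, one option that sidesteps the shell's re-encoding is to connect the processes directly with the subprocess module, so the bytes one program writes reach the next one unchanged. This is a sketch of that alternative, assuming both ends use UTF-8 via PYTHONIOENCODING; run_pipeline is a hypothetical helper, and it does not fix the PowerShell pipeline itself.

```python
import os
import subprocess
import sys


def run_pipeline(producer: str, consumer: str) -> str:
    """Run `python -c producer | python -c consumer` without a shell in between
    and return the consumer's output decoded as UTF-8."""
    # Both ends use UTF-8 for their standard streams.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    first = subprocess.Popen(
        [sys.executable, "-c", producer], stdout=subprocess.PIPE, env=env
    )
    second = subprocess.Popen(
        [sys.executable, "-c", consumer],
        stdin=first.stdout,  # bytes flow straight from first to second
        stdout=subprocess.PIPE,
        env=env,
    )
    first.stdout.close()  # close the parent's copy so only the consumer holds the read end
    out, _ = second.communicate()
    first.wait()
    return out.decode("utf-8")


result = run_pipeline(
    r"print('\N{EURO SIGN}')",
    "import sys; sys.stdout.write(sys.stdin.read())",
)
print(ascii(result))
```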

asked Oct 20 '22 02:10 by Paul Moore


1 Answer

Thanks to mike z, the following works:

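# UTF-8 without a BOM ($false = don't emit the byte-order mark), used for both settings
$OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false)
# Make Python read and write its standard streams as UTF-8
$env:PYTHONIOENCODING = "utf-8"
python -c "print('\N{Euro sign}')" | python -c "import sys; print(sys.stdin.read())"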

The new-object is needed to get a UTF-8 encoding without a BOM. The $OutputEncoding variable and [Console]::OutputEncoding both appear to need to be set.
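The BOM point can be seen from the Python side as well. In this sketch, the utf-8-sig codec stands in for a UTF-8 encoder that writes a byte-order mark; plain utf-8 decoding keeps that mark as an invisible U+FEFF character at the start of the text, while utf-8-sig strips it. Reading stdin with utf-8-sig, as the hypothetical read_stdin_text helper does, is a defensive workaround on the receiving end, not part of the fix above.

```python
import sys

text = "\N{EURO SIGN}"
with_bom = text.encode("utf-8-sig")   # b'\xef\xbb\xbf\xe2\x82\xac'
without_bom = text.encode("utf-8")    # b'\xe2\x82\xac'

# Plain utf-8 keeps the BOM as an invisible U+FEFF at the start of the text.
kept = with_bom.decode("utf-8")

# utf-8-sig strips a leading BOM and also accepts input without one.
stripped = with_bom.decode("utf-8-sig")
unchanged = without_bom.decode("utf-8-sig")


def read_stdin_text() -> str:
    """Read all of stdin as UTF-8, dropping a leading BOM if there is one."""
    return sys.stdin.buffer.read().decode("utf-8-sig")
```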

I still don't fully understand the difference between the two encoding values, or why you would ever want them set differently (as they appear to be by default).
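A simplified model may help picture the two settings. The assumption here is that PowerShell decodes a native program's output with [Console]::OutputEncoding and then re-encodes that text with $OutputEncoding before writing it to the next native program's stdin; in Windows PowerShell, $OutputEncoding defaults to ASCII, which would turn the Euro sign into ?. The model ignores BOMs and line endings, and powershell_pipe is a hypothetical name.

```python
def powershell_pipe(data: bytes, console_output_encoding: str, output_encoding: str) -> bytes:
    """Simplified model of `native1 | native2` in PowerShell.

    Assumption: the first program's stdout is decoded with
    [Console]::OutputEncoding, and the resulting text is re-encoded with
    $OutputEncoding for the second program's stdin. BOMs are ignored.
    """
    text = data.decode(console_output_encoding, errors="replace")
    return text.encode(output_encoding, errors="replace")


# What the first python writes with PYTHONIOENCODING=utf-8.
produced = "\N{EURO SIGN}".encode("utf-8")

# Windows PowerShell default: $OutputEncoding is ASCII, so the Euro sign is replaced.
default_bytes = powershell_pipe(produced, "utf-8", "ascii")  # b'?'

# The fix above: both settings are UTF-8, so the bytes survive unchanged.
fixed_bytes = powershell_pipe(produced, "utf-8", "utf-8")    # b'\xe2\x82\xac'
```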

answered Oct 22 '22 17:10 by Paul Moore