
Piping latin-1 encoded output of a program to a Python 3 script

I want to process the output of a running program line-by-line (think tail -f) with a Python 3 script (on Linux).

The program's output, which is piped to the script, is encoded in latin-1. In Python 2, I used the codecs module to decode the input from sys.stdin properly:

#!/usr/bin/env python
import sys, codecs

sin = codecs.getreader('latin-1')(sys.stdin)
for line in sin:
    print '%s "%s"' % (type (line), line.encode('ascii','xmlcharrefreplace').strip())

This worked:

<type 'unicode'> "Hi! &#246;&#228;&#223;"
...

However, in Python 3, sys.stdin.encoding is UTF-8, and if I just read naively from stdin:

#!/usr/bin/env python3
import sys

for line in sys.stdin:
    print ('type:{0} line:{1}'.format(type (line), line))

I get this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 4: invalid start byte
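
The failure can be reproduced without a pipe. This is only a minimal sketch, assuming the input line is "Hi! öäß" encoded as latin-1 (the variable names `raw`, `error`, and `text` are made up for illustration): the byte 0xf6 is "ö" in latin-1, but it is not a valid start byte in UTF-8.

```python
# Hypothetical sample input: "Hi! öäß" followed by a newline, encoded as latin-1.
raw = b"Hi! \xf6\xe4\xdf\n"

error = None
try:
    # This is roughly what reading sys.stdin does when its encoding is UTF-8.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print("decode failed at byte offset", exc.start)

# Decoding with the encoding the producer actually used succeeds.
text = raw.decode("latin-1")
print(text, end="")
```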

How can I read non-UTF-8 text data piped to stdin in Python 3?

phoibos asked Oct 24 '25 20:10


1 Answer

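import io
import sys

# Reopen the stdin file descriptor in text mode, decoding latin-1
# instead of the default encoding (UTF-8 here).
with io.open(sys.stdin.fileno(), 'r', encoding='latin-1') as sin:
    for line in sin:
        print('type:{0} line:{1}'.format(type(line), line))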

yields

type:<class 'str'> line:Hi! öäß
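
A possible alternative, offered as a sketch rather than as part of the original answer: wrap the underlying binary stream, `sys.stdin.buffer`, in an `io.TextIOWrapper` with the desired encoding. The helper name `decoded_lines` is hypothetical and exists only so the idea can be tried on any binary stream.

```python
import io


def decoded_lines(binary_stream, encoding="latin-1"):
    """Yield text lines from a binary stream, decoded with `encoding`.

    Hypothetical helper for illustration; not part of the standard library.
    """
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding)
    for line in text_stream:
        yield line


# Try it on an in-memory stream that stands in for the piped input.
sample = io.BytesIO(b"Hi! \xf6\xe4\xdf\n")
for line in decoded_lines(sample):
    print('type:{0} line:{1}'.format(type(line), line), end='')
```

In a script reading from a pipe, the call would be `decoded_lines(sys.stdin.buffer)` (after `import sys`). On Python 3.7 and later, calling `sys.stdin.reconfigure(encoding='latin-1')` before the first read should have a similar effect without creating a new wrapper.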

unutbu answered Oct 26 '25 08:10


