How to make Popen() understand UTF-8 properly?

Tags: python

This is my code in Python:

[...]
proc = Popen(path, stdin=stdin, stdout=PIPE, stderr=PIPE)
result = [x for x in proc.stdout.readlines()]
result = ''.join(result)

Everything works fine when the output is ASCII. When I receive UTF-8 text on stdout, the result is unpredictable, and in most cases the output is damaged. What is wrong here?

Btw, maybe this code should be optimized somehow?


asked Oct 13 '10 19:10

2 Answers

Have you tried decoding each line first, and then joining the resulting Unicode strings? In Python 2.4+ (at least), this can be achieved with

result = [x.decode('utf8') for x in proc.stdout.readlines()]

The important point is that your lines x are sequences of bytes that must be interpreted as representing characters. The decode() method performs this interpretation (here, the bytes are assumed to be UTF-8 encoded). In Python 2, x.decode('utf8') is of type unicode, which you can think of as a "string of characters", as opposed to a "string of numbers between 0 and 255" (bytes). In Python 3, the same call turns a bytes object into a str.
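For Python 3 readers, here is a minimal sketch of the same idea. It is not the asker's actual program: the `CHILD` command is a hypothetical stand-in for `path` that simply writes UTF-8 bytes, and the function names are made up for illustration. The first function decodes the bytes explicitly, as described above; the second relies on the `encoding` argument that `subprocess.Popen` accepts from Python 3.6 onward. Both use `communicate()`, which also covers the "can this be optimized" part of the question, since it reads stdout and stderr together instead of line by line.

```python
import subprocess
import sys

# Hypothetical stand-in for the question's `path`: a child process that
# writes UTF-8 encoded bytes to its stdout.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write('h\\xe9llo w\\xf6rld\\n'.encode('utf-8'))"]


def run_and_decode(cmd):
    """Read raw bytes from the child, then decode them explicitly."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    raw_out, _raw_err = proc.communicate()  # reads both pipes until EOF, then waits
    return raw_out.decode("utf-8")          # bytes -> str


def run_with_encoding(cmd):
    """Let Popen do the decoding (the encoding argument needs Python 3.6+)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            encoding="utf-8")
    out, _err = proc.communicate()          # already str
    return out


if __name__ == "__main__":
    print(repr(run_and_decode(CHILD)))
    print(repr(run_with_encoding(CHILD)))
```

Decoding the whole output in one call, rather than per chunk, also avoids splitting a multi-byte character across two reads.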


answered Sep 22 '22 23:09

Eric O Lebigot

I ran into the same issue when using LogPipe.

I solved it by passing the additional arguments encoding='utf-8', errors='ignore' to os.fdopen().

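# https://codereview.stackexchange.com/questions/6567/redirecting-subprocesses-output-stdout-and-stderr-to-the-logging-module
import logging
import os
import threading

# Assumption: vlogger is a logging.Logger set up elsewhere; the original snippet does not define it.
vlogger = logging.getLogger(__name__)

class LogPipe(threading.Thread):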
    def __init__(self):
        """Setup the object with a logger and a loglevel
        and start the thread
        """
        threading.Thread.__init__(self)
        self.daemon = False
        # self.level = level
        self.fdRead, self.fdWrite = os.pipe()
        # Decode the pipe as UTF-8 and silently drop any bytes that are not valid UTF-8
        self.pipeReader = os.fdopen(self.fdRead, encoding='utf-8', errors='ignore')
        self.start()

    def fileno(self):
        """Return the write file descriptor of the pipe
        """
        return self.fdWrite

    def run(self):
        """Run the thread, logging everything.
        """
        for line in iter(self.pipeReader.readline, ''):
            # vlogger.log(self.level, line.strip('\n'))
            vlogger.debug(line.strip('\n'))

        self.pipeReader.close()

    def close(self):
        """Close the write end of the pipe.
        """
        os.close(self.fdWrite)
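
To see what the two os.fdopen() arguments do in isolation, here is a small sketch that is separate from LogPipe. The helper name `read_through_pipe` and the sample bytes are made up for illustration. It pushes bytes containing one invalid byte through a pipe and reads them back as text, once with errors='ignore' (as in the answer) and once with errors='replace', which keeps a visible marker where data was lost.

```python
import os


def read_through_pipe(data, errors):
    """Write raw bytes into a pipe and read them back as UTF-8 text."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, data)
    os.close(write_fd)  # closing the write end lets the reader see EOF
    with os.fdopen(read_fd, encoding="utf-8", errors=errors) as reader:
        return reader.read()


# "naïve" encoded as UTF-8, followed by a byte (0xff) that is never valid UTF-8.
SAMPLE = "na\u00efve".encode("utf-8") + b"\xff" + b" text\n"

ignored = read_through_pipe(SAMPLE, "ignore")    # invalid byte is dropped
replaced = read_through_pipe(SAMPLE, "replace")  # invalid byte becomes U+FFFD

if __name__ == "__main__":
    print(repr(ignored))
    print(repr(replaced))
```

Whether to ignore or replace is a judgment call: 'ignore' keeps the log clean, while 'replace' makes it obvious that the subprocess emitted something that was not valid UTF-8.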

answered Sep 24 '22 23:09

hailinzeng
