Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

subprocess's Popen closes stdout/stderr filedescriptors used in another thread when Popen errors

An internal library that heavily uses subprocess.Popen() started failing its automated tests when we upgraded from Python 2.7.3 to Python 2.7.5. This library is used in a threaded environment. After debugging the issue, I was able to create a short Python script that demonstrates the error seen as in the failing tests.

This is the script (called "threadedsubprocess.py"):

import time
import threading
import subprocess

def subprocesscall():
    p = subprocess.Popen(
        ['ls', '-l'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        )
    time.sleep(2) # simulate the Popen call takes some time to complete.
    out, err = p.communicate()
    print 'succeeding command in thread:', threading.current_thread().ident

def failingsubprocesscall():
    try:
        p = subprocess.Popen(
            ['thiscommandsurelydoesnotexist'],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            )
    except Exception as e:
        print 'failing command:', e, 'in thread:', threading.current_thread().ident

print 'main thread is:', threading.current_thread().ident

subprocesscall_thread = threading.Thread(target=subprocesscall)
subprocesscall_thread.start()
failingsubprocesscall()
subprocesscall_thread.join()

Note: this script does not exit with an IOError when ran from Python 2.7.3. It does fail at least 50% of the times when ran from Python 2.7.5 (both on the same Ubuntu 12.04 64-bit VM).

The error that is raised on Python 2.7.5 is this:

/opt/python/2.7.5/bin/python ./threadedsubprocess.py 
main thread is: 139899583563520
failing command: [Errno 2] No such file or directory 139899583563520
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/opt/python/2.7.5/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "/opt/python/2.7.5/lib/python2.7/threading.py", line 761, in run
    self.__target(*self.__args, **self.__kwargs)
  File "./threadedsubprocess.py", line 13, in subprocesscall
    out, err = p.communicate()
  File "/opt/python/2.7.5/lib/python2.7/subprocess.py", line 806, in communicate
    return self._communicate(input)
  File "/opt/python/2.7.5/lib/python2.7/subprocess.py", line 1379, in _communicate
    self.stdin.close()
IOError: [Errno 9] Bad file descriptor

close failed in file object destructor:
IOError: [Errno 9] Bad file descriptor

When comparing the subprocess module from Python 2.7.3 to Python 2.7.5 I see the Popen()'s __init__() call indeed now explicitly closes the stdin, stdout and stderr file descriptors in case executing the command somehow fails. This seems to be an intended fix applied in Python 2.7.4 to prevent leaking file descriptors (http://hg.python.org/cpython/file/ab05e7dd2788/Misc/NEWS#l629).

The diff between Python 2.7.3 and Python 2.7.5 that seems to be relevant to this issue is in the Popen __init__():

@@ -671,12 +702,33 @@
          c2pread, c2pwrite,
          errread, errwrite) = self._get_handles(stdin, stdout, stderr)

-        self._execute_child(args, executable, preexec_fn, close_fds,
-                            cwd, env, universal_newlines,
-                            startupinfo, creationflags, shell,
-                            p2cread, p2cwrite,
-                            c2pread, c2pwrite,
-                            errread, errwrite)
+        try:
+            self._execute_child(args, executable, preexec_fn, close_fds,
+                                cwd, env, universal_newlines,
+                                startupinfo, creationflags, shell,
+                                p2cread, p2cwrite,
+                                c2pread, c2pwrite,
+                                errread, errwrite)
+        except Exception:
+            # Preserve original exception in case os.close raises.
+            exc_type, exc_value, exc_trace = sys.exc_info()
+
+            to_close = []
+            # Only close the pipes we created.
+            if stdin == PIPE:
+                to_close.extend((p2cread, p2cwrite))
+            if stdout == PIPE:
+                to_close.extend((c2pread, c2pwrite))
+            if stderr == PIPE:
+                to_close.extend((errread, errwrite))
+
+            for fd in to_close:
+                try:
+                    os.close(fd)
+                except EnvironmentError:
+                    pass
+
+            raise exc_type, exc_value, exc_trace

I think I have three questions:

1) Is it true it should principally possible to use subprocess.Popen, with PIPE for stdin, stdout and stderr, in a threaded environment?

2) How do I prevent the file descriptors for stdin, stdout and stderr to be closed when the Popen() fails in one of the threads?

3) Am I doing something wrong here?

like image 744
janwijbrand Avatar asked Aug 26 '13 08:08

janwijbrand


People also ask

Does Popen need to be closed?

Popen do we need to close the connection or subprocess automatically closes the connection? Usually, the examples in the official documentation are complete. There the connection is not closed. So you do not need to close most probably.

What does Popen communicate do?

. communicate() writes input (there is no input in this case so it just closes subprocess' stdin to indicate to the subprocess that there is no more input), reads all output, and waits for the subprocess to exit.

What is Popen in subprocess?

The subprocess module defines one class, Popen and a few wrapper functions that use that class. The constructor for Popen takes arguments to set up the new process so the parent can communicate with it via pipes. It provides all of the functionality of the other modules and functions it replaces, and more.

What is Popen in Python?

Python method popen() opens a pipe to or from command. The return value is an open file object connected to the pipe, which can be read or written depending on whether mode is 'r' (default) or 'w'. The bufsize argument has the same meaning as in open() function.


1 Answers

I would like to answer your questions with:

  1. Yes.
  2. You shouldn't have to.
  3. No.

The error occurs indeed in Python 2.7.4 as well.

I think this is a bug in the library code. If you add a lock in your program and make sure that the two calls to subprocess.Popen are executed atomically, the error does not occur.

@@ -1,32 +1,40 @@
 import time
 import threading
 import subprocess

+lock = threading.Lock()
+
 def subprocesscall():
+    lock.acquire()
     p = subprocess.Popen(
         ['ls', '-l'],
         stdin=subprocess.PIPE,
         stdout=subprocess.PIPE,
         stderr=subprocess.PIPE,
         )
+    lock.release()
     time.sleep(2) # simulate the Popen call takes some time to complete.
     out, err = p.communicate()
     print 'succeeding command in thread:', threading.current_thread().ident

 def failingsubprocesscall():
     try:
+        lock.acquire()
         p = subprocess.Popen(
             ['thiscommandsurelydoesnotexist'],
             stdin=subprocess.PIPE,
             stdout=subprocess.PIPE,
             stderr=subprocess.PIPE,
             )
     except Exception as e:
         print 'failing command:', e, 'in thread:', threading.current_thread().ident
+    finally:
+        lock.release()
+

 print 'main thread is:', threading.current_thread().ident

 subprocesscall_thread = threading.Thread(target=subprocesscall)
 subprocesscall_thread.start()
 failingsubprocesscall()
 subprocesscall_thread.join()

This means that it is most probably due to some data race in the implementation of Popen. I will risk a guess: the bug may be in the implementation of pipe_cloexec, called by _get_handles, which (in 2.7.4) is:

def pipe_cloexec(self):
    """Create a pipe with FDs set CLOEXEC."""
    # Pipes' FDs are set CLOEXEC by default because we don't want them
    # to be inherited by other subprocesses: the CLOEXEC flag is removed
    # from the child's FDs by _dup2(), between fork() and exec().
    # This is not atomic: we would need the pipe2() syscall for that.
    r, w = os.pipe()
    self._set_cloexec_flag(r)
    self._set_cloexec_flag(w)
    return r, w

and the comment warns explicitly about it not being atomic... This definitely causes a data race but, without experimentation, I don't know if it's what causes the problem.

like image 81
nickie Avatar answered Oct 22 '22 04:10

nickie