
Why is subprocess.Popen not waiting until the child process terminates?

Tags:

python

mysql

I'm having a problem with Python's subprocess.Popen method.

Here's a test script which demonstrates the problem. It's being run on a Linux box.

#!/usr/bin/env python
import subprocess
import time

def run(cmd):
  p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
  return p

### START MAIN
# copy some rows from a source table to a destination table
# note that the destination table is empty when this script is run
cmd = 'mysql -u ve --skip-column-names --batch --execute="insert into destination (select * from source limit 100000)" test'
run(cmd)

# check to see how many rows exist in the destination table
cmd = 'mysql -u ve --skip-column-names --batch --execute="select count(*) from destination" test'
process = run(cmd)
count = (int(process.communicate()[0][:-1]))

# if subprocess.Popen() waited for the child to terminate, then count should be
# greater than 0
if count > 0:
  print "success: " + str(count)
else:
  print "failure: " + str(count)
  time.sleep(5)

  # find out how many rows exist in the destination table after sleeping
  process = run(cmd)
  count = (int(process.communicate()[0][:-1]))
  print "after sleeping the count is " + str(count)

Usually the output from this script is:

success: 100000

but sometimes it's

failure: 0
after sleeping the count is 100000

Note that in the failure case, the select immediately after the insert shows 0 rows but after sleeping for 5 seconds a second select correctly shows a row count of 100000. My conclusion is that one of the following is true:

  1. subprocess.Popen is not waiting for the child process to terminate - this seems to contradict the documentation
  2. the mysql insert is not atomic - my understanding of mysql seems to indicate insert is atomic
  3. the select is not seeing the correct row count right away - according to a friend who knows mysql better than I do, this should not happen either

What am I missing?

FYI, I'm aware that this is a hacky way of interacting with mysql from Python and MySQLdb would likely not have this problem but I'm curious as to why this method does not work.

Drew Sherman asked Oct 09 '09 00:10


3 Answers

subprocess.Popen starts the program as soon as it is instantiated. It does not, however, wait for it: it fires the program off in the background, as if you'd typed cmd & in a shell. So the code above contains a race condition. If the insert finishes in time, the output looks normal; if not, you get the unexpected result. You are not waiting for the process started by your first run() to finish; you are simply returning its Popen instance and continuing.

I'm not sure how this behavior contradicts the documentation, because there are some very clear methods on Popen indicating that the child is not waited for automatically, like:

Popen.wait()
  Wait for child process to terminate. Set and return returncode attribute.

I do agree, however, that the documentation for this module could be improved.
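The non-blocking start and the explicit wait can be seen in a minimal Python 3 sketch that uses only the standard library. The child here is a short Python process standing in for the mysql client (an assumption for illustration, not the original command): it is started with Popen, polled right away, and then explicitly waited for.

```python
import subprocess
import sys

# Hypothetical stand-in child: sleeps for one second, then exits with status 0.
child_cmd = [sys.executable, "-c", "import time; time.sleep(1)"]

p = subprocess.Popen(child_cmd)

# Popen has already returned, but the child is still sleeping,
# so poll() reports None (no exit status yet).
status_right_after_start = p.poll()

# wait() blocks until the child terminates, then sets and returns returncode.
status_after_wait = p.wait()

print(status_right_after_start)  # None
print(status_after_wait)         # 0
```

Anything the parent does between the Popen call and wait() runs concurrently with the child, which is exactly the window in which the question's SELECT can run before the INSERT has finished.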

To wait for the program to finish, I'd recommend using subprocess's convenience method, subprocess.call, or using communicate on a Popen object (for the case when you need stdout). You are already doing this for your second call.

import subprocess

### START MAIN
# copy some rows from a source table to a destination table
# note that the destination table is empty when this script is run
cmd = 'mysql -u ve --skip-column-names --batch --execute="insert into destination (select * from source limit 100000)" test'
# call() blocks until mysql exits; shell=True is needed because cmd is one string
subprocess.call(cmd, shell=True)

# check to see how many rows exist in the destination table
cmd = 'mysql -u ve --skip-column-names --batch --execute="select count(*) from destination" test'
process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
# communicate() reads all of stdout and waits for the child to exit
try:
    count = int(process.communicate()[0].strip())
except ValueError:  # empty or non-numeric output
    count = 0
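The same two blocking patterns can be tried without a database. This is a sketch under stated assumptions: the writer and reader below are hypothetical Python children standing in for the INSERT and the SELECT count(*), and they share a temporary file instead of a table.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rows.txt")

    # Hypothetical stand-in for the INSERT: pause, then write a "row count".
    writer_cmd = [
        sys.executable, "-c",
        "import sys, time\ntime.sleep(0.5)\nwith open(sys.argv[1], 'w') as f:\n    f.write('100000')",
        path,
    ]
    # Hypothetical stand-in for the SELECT: print whatever the writer stored.
    reader_cmd = [sys.executable, "-c", "import sys; print(open(sys.argv[1]).read())", path]

    # call() blocks until the writer exits, so the file is complete afterwards.
    rc = subprocess.call(writer_cmd)

    # communicate() reads stdout to EOF and waits for the reader to exit.
    process = subprocess.Popen(reader_cmd, stdout=subprocess.PIPE)
    out, _ = process.communicate()
    count = int(out.strip())

print(rc, process.returncode, count)  # 0 0 100000
```

Replacing subprocess.call with a bare subprocess.Popen for the writer reintroduces the race: the reader can then run before the file exists.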

Additionally, in most cases you do not need to run the command in a shell. This is one of those cases, but you'll have to rewrite your command as a sequence. Doing it that way also lets you avoid traditional shell injection and worry less about quoting, like so:

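import subprocess
# each list element is passed to mysql as one argument; no shell is involved
prog = ["mysql", "-u", "ve", "--execute", 'insert into foo values ("snargle", 2)', "test"]
subprocess.call(prog)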

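This also works, and the < is not treated as a redirection: printf receives it as a literal argument and prints the text </etc/passwd rather than the contents of the file: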

prog = ["printf", "%s", "<", "/etc/passwd"]
subprocess.call(prog)

Try it interactively. This way you avoid the possibility of shell injection, which matters particularly if you're accepting user input. I suspect you're using the less-awesome string method of communicating with subprocess because you ran into trouble getting the sequences to work :^)
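To check that an argument list is never re-parsed by a shell, here is a Python 3 sketch (capture_output and text need Python 3.7 or later). It passes a shell-hostile but harmless string to a child Python process that simply echoes its first argument back; the child is an illustrative stand-in, not part of the original answer.

```python
import subprocess
import sys

# Harmless, but a shell would run "echo injected" as a second command.
hostile = "snargle; echo injected"

# A list with no shell=True: each element becomes one argv entry, verbatim.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile],
    capture_output=True,
    text=True,
    check=True,
)

echoed = result.stdout.rstrip("\n")
print(echoed)  # snargle; echo injected
```

The semicolon arrives as ordinary text, so only one line is printed and no second command runs.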

Jed Smith answered Oct 30 '22 10:10


If you don't absolutely need to use subprocess.Popen, it's usually simpler to use os.system. For example, for quick scripts I often do something like this:

import os
run = os.system  # convenience alias
# os.system returns the command's exit status; the query output goes to stdout
result = run('mysql -u ve --execute="select * from wherever" test')

Unlike Popen, os.system DOES wait for your process to exit before moving on to the next stage of your script.

More info on it in the docs: http://docs.python.org/library/os.html#os.system
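A Unix-only Python 3 sketch of that blocking behavior, with a plain shell command standing in for mysql (an assumption for illustration). On Unix the value os.system returns is an encoded wait status rather than the command's output, so os.WEXITSTATUS is used to pull out the exit code.

```python
import os
import time

start = time.monotonic()
# The string is run by the shell; os.system returns only after it exits.
status = os.system("sleep 1; exit 3")
elapsed = time.monotonic() - start

# On Unix, os.system returns a wait status; WEXITSTATUS extracts the exit code.
exit_code = os.WEXITSTATUS(status)
print(exit_code)  # 3
```

Any output the command prints goes straight to the script's own standard output. If you need to capture it, as with the row count in the question, os.system is not enough and communicate on a Popen object is the better fit.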

Paul McMillan answered Oct 30 '22 09:10


Dude, why do you think subprocess.Popen returns an object with a wait method, if not because the waiting is NOT implicit, intrinsic, immediate, and inevitable, as you appear to surmise?! The most common reason to spawn a subprocess is NOT to wait immediately for it to finish, but to let it proceed at the same time as the parent process continues (e.g. on another core, or at worst by time-slicing; that's the operating system's, and the hardware's, lookout). When the parent process needs the subprocess to be finished, it calls wait on the object returned by the original subprocess.Popen call.
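A short sketch of that pattern, assuming Python 3 and using three hypothetical one-second Python children in place of real work: all three are started first, the parent keeps running, and it waits only when it needs the results. Because the children overlap, the whole run should take roughly one second rather than three.

```python
import subprocess
import sys
import time

# Hypothetical stand-in work: each child sleeps for one second.
cmd = [sys.executable, "-c", "import time; time.sleep(1)"]

start = time.monotonic()
children = [subprocess.Popen(cmd) for _ in range(3)]

# Popen returned immediately for each child; the parent could do other work here.
launch_time = time.monotonic() - start

# Only when the results are needed does the parent block on each child in turn.
return_codes = [child.wait() for child in children]
total_time = time.monotonic() - start

print(return_codes)  # [0, 0, 0]
```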

Alex Martelli answered Oct 30 '22 10:10