
Python subprocess losing 10% of a program's stdout

I have a program that needs to be called as a subprocess from Python. The program is written in Java. Yeah, I know...

Anyway, I need to capture all of the output from said program.

Unfortunately, when I call popen2.popen2 or subprocess.Popen with communicate()[0], I lose around 10% of the output data, both when I assign subprocess.PIPE to stdout and when I assign a file object (the return value of open) to stdout.

The documentation in subprocess is pretty explicit that using subprocess.PIPE is volatile if you're trying to capture all of the output from a child process.

I'm currently using pexpect to dump the output into a tmp file, but that's taking forever for obvious reasons.

I'd like to keep all the data in memory to avoid disk writes.

Any recommendations are welcome! Thanks!

import subprocess

cmd = 'java -Xmx2048m -cp "/home/usr/javalibs/class:/home/usr/javalibs/libs/dependency.jar" --data data --input input'

# doesn't get all the data
#
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
output = p.communicate()[0]

OR
# doesn't get all the data
#
fd = open("outputfile",'w')
p = subprocess.Popen(cmd, stdout=fd, shell=True)
p.communicate()
fd.close() # tried to use fd.flush() too.

# also tried
# p.wait() instead of p.communicate(), but wait doesn't really wait for the java program to finish running - it doesn't block

OR
# also fails to get all the data
#
import popen2
(rstdout, rstdin) = popen2.popen2(cmd)

Expected output is a series of ASCII lines (a couple thousand). Each line contains a number and an end-of-line character:

0\n
1\n
4\n
0\n
...
ct_ asked Oct 24 '25 18:10


2 Answers

I have used subprocess with much larger output on stdout and haven't seen such a problem. It's hard to determine the root cause from what you've shown. I would check the following:

Since p.wait() didn't work for you, it could be that when you read the PIPE, your Java program is still busy printing the last 10%. Get p.wait() straight first:

  • Insert a large enough wait (say 30 seconds) before you read the PIPE. Does your 10% show up?
  • It's doubtful that p.wait() doesn't block on your Java program. Does your Java program spawn another program as a subprocess?
  • Check the return value of p.wait(). Did your Java program terminate normally?

If the problem doesn't lie in your concurrency model, then check whether you are printing correctly in your Java program:

  • What function do you use in your Java program to print to stdout? Is it prone to throwing, or silently ignoring, an IOException?
  • Did you flush the stream correctly? The last 10% could be sitting in a buffer that never gets flushed when your Java program terminates.
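
A rough Python 3 sketch of the checks above, for illustration only: the helper name `run_and_report` is made up, and a small inline Python child stands in for the Java program. It captures stderr separately (some of the "missing" lines might be going there) and reports the exit status once the child has finished.

```python
import subprocess
import sys


def run_and_report(args):
    """Run a child process and return (returncode, stdout_lines, stderr_text)."""
    p = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the pipes as text
    )
    # communicate() reads both pipes to EOF and then waits for the child to exit
    out, err = p.communicate()
    return p.returncode, out.splitlines(), err


# Stand-in child (hypothetical): 3 lines on stdout, 1 line on stderr, exit status 2
child = [
    sys.executable,
    "-c",
    "import sys; print('0'); print('1'); print('4'); "
    "sys.stderr.write('oops\\n'); sys.exit(2)",
]

returncode, lines, err = run_and_report(child)
print("exit status:", returncode)
print("stdout lines:", len(lines))
print("stderr:", err.strip())
```

A non-zero exit status or unexpected text on stderr would point at the child program rather than at Popen.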
xbtsw answered Oct 26 '25 08:10


It must be something related to the process you are actually calling. You can verify this by doing a simple test with another Python script that echoes out lines:

out.py

import sys

for i in xrange(5000):
    print "%d\n" % i

sys.exit(0)

test.py

import subprocess

cmd = "python out.py"
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
output = p.communicate()[0]

print output

So you can verify that it's not the size of the data that is the issue, but rather the communication with the process you are calling.
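
For anyone on Python 3, the same check can be sketched as a single file. This is an adaptation, not the original test: the child script is passed inline with `-c` and run via `sys.executable` instead of through a shell, and the line count is checked rather than printing everything.

```python
import subprocess
import sys

# Child script equivalent in spirit to out.py, passed inline so no extra file is needed
child_code = "for i in range(5000): print(i)"

p = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
output = p.communicate()[0]  # bytes; reads until EOF, then waits for the child

lines = output.decode("ascii").splitlines()
print(len(lines), "lines received, exit status", p.returncode)
```

If all 5000 lines arrive here but the Java program still comes up short, the loss is most likely happening on the Java side.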

You should also confirm the version of Python you are running, as I have read about past issues concerning the internal buffer of Popen (but using a separate file handle, as you have suggested, normally fixed that for me).

It would be a buffer issue if the subprocess call were hanging indefinitely. But if the process is completing and just lacking lines, then Popen is doing its job.

jdi answered Oct 26 '25 07:10


