I am running the following version of Python:
$ /usr/bin/env python --version
Python 2.5.2
I am running the following Python code to write data from a child subprocess to standard output and read that output into a Python variable called metadata:
# Extract metadata (snippet from extractMetadata.py)
inFileAsGzip = "%s.gz" % inFile
if os.path.exists(inFileAsGzip):
    os.remove(inFileAsGzip)
os.symlink(inFile, inFileAsGzip)
extractMetadataCommand = "bgzip -c -d -b 0 -s %s %s" % (metadataRequiredFileSize, inFileAsGzip)
metadataPipes = subprocess.Popen(extractMetadataCommand, stdin=None, stdout=subprocess.PIPE, shell=True, close_fds=True)
metadata = metadataPipes.communicate()[0]
metadataPipes.stdout.close()
os.remove(inFileAsGzip)
print metadata
The use case is to pull the first ten lines of standard output from the script above:
$ extractMetadata.py | head
The error will appear if I pipe into head, awk, grep, etc.
The script ends with the following error:
close failed: [Errno 32] Broken pipe
I would have thought closing the pipes would be sufficient, but obviously that's not the case.
The connection is not closed there, so you most probably do not need to close it. Unrelated: you could use stdin=open('test.
A Popen object has a wait() method defined for exactly this purpose: to wait for a given subprocess to complete (and, besides, to return its exit status). If you use this method, you keep zombie processes from lying around for too long. (Alternatively, you can use subprocess.
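As a rough illustration of the wait() pattern described above, here is a minimal sketch. It assumes Python 3 rather than the 2.5.2 interpreter from the question, and it uses a one-line Python child (started through sys.executable) as a hypothetical stand-in for the real command; the helper name run_and_reap is made up for this example.

```python
import subprocess
import sys


def run_and_reap(argv):
    """Start argv, read all of its stdout, then wait() for it to exit."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    output = proc.stdout.read()  # blocks until the child closes its end
    proc.stdout.close()
    status = proc.wait()  # reaps the child and returns its exit status
    return output, status


# Hypothetical child process standing in for the real command.
child = [sys.executable, "-c", "print('hello')"]
output, status = run_and_reap(child)
```

Reading the pipe to the end before calling wait() avoids the case where the child blocks on a full pipe while the parent is waiting for it to exit.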
Most of your interaction with the Python subprocess module will be via the run() function. This blocking function will start a process and wait until the new process exits before moving on.
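For comparison, a minimal sketch of the blocking run() call might look like the following. Note that subprocess.run() was added in Python 3.5, so it is not available on the Python 2.5.2 interpreter from the question. The child command here is a hypothetical placeholder, and run_blocking is a name invented for this example.

```python
import subprocess
import sys


def run_blocking(argv):
    """Run argv to completion and return (exit status, captured stdout)."""
    # run() does not return until the child process has exited.
    completed = subprocess.run(argv, stdout=subprocess.PIPE)
    return completed.returncode, completed.stdout


# Hypothetical child process standing in for the real command.
returncode, data = run_blocking([sys.executable, "-c", "print('done')"])
```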
Hmmm. I've seen some "Broken pipe" strangeness with subprocess + gzip before. I never did figure out exactly why it was happening but by changing my implementation approach, I was able to avoid the problem. It looks like you're just trying to use a backend gzip process to decompress a file (probably because Python's builtin module is horrendously slow... no idea why but it definitely is).
Rather than using communicate(), you can treat the process as a fully asynchronous backend and just read its output as it arrives. When the process dies, the subprocess module will take care of cleaning things up for you. The following snippet should provide the same basic functionality without any broken pipe issues.
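import subprocess
gz_proc = subprocess.Popen(['gzip', '-c', '-d', 'test.gz'], stdout=subprocess.PIPE)
l = list()
while True:
    # Read the decompressed output in 4 KiB chunks as it arrives
    dat = gz_proc.stdout.read(4096)
    if not dat:
        # An empty read means the child has closed its end of the pipe
        break
    l.append(dat)
file_data = ''.join(l)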
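One plausible reading of the original error is that head exits after ten lines and closes its end of the pipe, so the script's own write to standard output fails, independently of how the child process is read. The sketch below shows one way to tolerate that. It assumes Python 3, where the failure surfaces as BrokenPipeError (on Python 2 it would be an IOError with errno.EPIPE). The names emit and main and the placeholder data are hypothetical.

```python
import os
import sys


def emit(lines, stream):
    """Write lines to stream; return False if the reader has gone away."""
    try:
        for line in lines:
            stream.write(line + "\n")
        stream.flush()  # force a pending broken pipe to surface here
        return True
    except BrokenPipeError:
        return False


def main():
    metadata_lines = ["line %d" % i for i in range(100)]  # placeholder data
    if not emit(metadata_lines, sys.stdout):
        # The interpreter flushes stdout again at exit; point it at devnull
        # so that final flush does not raise a second broken pipe error.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)


if __name__ == "__main__":
    main()
```

Saved as a script and piped into head, this should stop quietly once the reader closes the pipe instead of printing an error at shutdown.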