
How do I close a Python 2.5.2 Popen subprocess once I have the data I need?

Tags: python, pipe, popen

I am running the following version of Python:

$ /usr/bin/env python --version
Python 2.5.2

The following Python code runs a child subprocess that writes data to its standard output, and reads that output into a Python variable called metadata:

# Extract metadata (snippet from extractMetadata.py)
inFileAsGzip = "%s.gz" % inFile
if os.path.exists(inFileAsGzip):
    os.remove(inFileAsGzip)
os.symlink(inFile, inFileAsGzip)
extractMetadataCommand = "bgzip -c -d -b 0 -s %s %s" % (metadataRequiredFileSize, inFileAsGzip)
metadataPipes = subprocess.Popen(extractMetadataCommand, stdin=None, stdout=subprocess.PIPE, shell=True, close_fds=True)
metadata = metadataPipes.communicate()[0]
metadataPipes.stdout.close()
os.remove(inFileAsGzip)
print metadata

My use case is to pull the first ten lines of the script's standard output:

$ extractMetadata.py | head

The error will appear if I pipe into head, awk, grep, etc.

The script ends with the following error:

close failed: [Errno 32] Broken pipe

I would have thought closing the pipes would be sufficient, but obviously that's not the case.
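
For context, errno 32 is EPIPE: the error a process gets when it writes to a pipe whose reading end has already been closed. With `extractMetadata.py | head`, the pipe in question may well be the script's own standard output rather than the one to the child, since head exits after ten lines; that is an assumption about this case, not a confirmed diagnosis. A minimal sketch of the mechanism, assuming a POSIX system and Python 3 (`write_to_closed_pipe` is a hypothetical name, and the `except ... as` syntax needs Python 2.6 or later):

```python
import errno
import os


def write_to_closed_pipe():
    """Write into a pipe whose read end is closed; return the errno raised."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # plays the role of `head` exiting early
    try:
        os.write(write_fd, b"metadata\n")
    except OSError as exc:  # BrokenPipeError on Python 3
        return exc.errno
    finally:
        os.close(write_fd)
    return None  # the write unexpectedly succeeded


if __name__ == "__main__":
    # Compare against the symbolic constant instead of hard-coding 32.
    print(write_to_closed_pipe() == errno.EPIPE)
```

The write fails with an exception rather than killing the process because the Python interpreter ignores SIGPIPE at startup.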

asked Oct 05 '10 at 05:10 by Alex Reynolds



1 Answer

Hmmm. I've seen some "Broken pipe" strangeness with subprocess + gzip before. I never did figure out exactly why it was happening, but changing my implementation approach let me avoid the problem. It looks like you're just trying to use a backend gzip process to decompress a file (probably because Python's built-in gzip module is horrendously slow... no idea why, but it definitely is).

Rather than using communicate(), you can instead treat the process as a fully asynchronous backend and just read its output as it arrives. When the process dies, the subprocess module will take care of cleaning things up for you. The following snippet should provide the same basic functionality without any broken pipe issues.

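import subprocess

# Start gzip in the background and capture its decompressed output.
gz_proc = subprocess.Popen(['gzip', '-c', '-d', 'test.gz'], stdout=subprocess.PIPE)

l = list()
while True:
    # Read the output in 4 KB chunks as it arrives.
    dat = gz_proc.stdout.read(4096)
    if not dat:
        # An empty read means EOF: gzip has closed its stdout.
        break
    l.append(dat)

file_data = ''.join(l)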
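
If only the first part of the output is needed, one option is to stop reading early and shut the child down explicitly. The sketch below is an illustration under stated assumptions, not part of the original answer: `read_prefix` is a hypothetical helper, the child command is a stand-in for the real decompressor, and it targets a modern Python 3 (`Popen.kill()` arrived in Python 2.6, so it is not available on 2.5.2).

```python
import subprocess
import sys


def read_prefix(cmd, nbytes, chunk_size=4096):
    """Read at most nbytes from cmd's stdout, then stop the child."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    chunks = []
    remaining = nbytes
    try:
        while remaining > 0:
            dat = proc.stdout.read(min(chunk_size, remaining))
            if not dat:  # EOF: the child finished before nbytes arrived
                break
            chunks.append(dat)
            remaining -= len(dat)
    finally:
        if proc.poll() is None:
            proc.kill()        # child still running: stop it
        proc.stdout.close()    # release our end of the pipe
        proc.wait()            # reap the child so no zombie is left behind
    return b"".join(chunks)


if __name__ == "__main__":
    # Hypothetical stand-in for the decompressor: a child that prints many lines.
    child = [sys.executable, "-c", "for i in range(100000): print(i)"]
    data = read_prefix(child, 20)
    sys.stdout.write(data.decode("ascii"))
```

Killing the child is a blunt choice; where the child handles a closed pipe gracefully, closing `proc.stdout` and calling `proc.wait()` alone may be enough.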

answered Oct 18 '22 at 08:10 by Rakis