Equivalent of set -o pipefail in Python?

I have a handful of Python scripts, each of which makes heavy use of sorting, uniq-ing, counting, gzipping and gunzipping, and awking. As a first pass I've used subprocess.call with shell=True (yes, I know the security risks; that's why I call it a first pass). I have a little helper function:

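import sys
from datetime import datetime
from subprocess import call

def do(command):
    start = datetime.now()
    return_code = call(command, shell=True)
    # The elapsed time is a timedelta (H:MM:SS.ffffff), not milliseconds
    print('Completed in %s, return code = %d' % (datetime.now() - start, return_code))
    if return_code != 0:
        print('Failure: aborting with return code %d' % return_code)
        sys.exit(return_code)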

Scripts make use of this helper as in the following snippets:

do('gunzip -c %s | %s | sort -u | %s > %s' % (input, parse, flatten, output))
do("gunzip -c %s | grep 'en$' | cut -f1,2,4 -d\|| %s > %s" % (input, parse, output))
do('cat %s | %s | gzip -c > %s' % (input, dedupe, output))
do("awk -F ' ' '{print $%d,$%d}' %s | sort -u | %s | gzip -c > %s" % params)
do('gunzip -c %s | %s | gzip -c > %s' % (input, parse, output))
do('gunzip -c %s | %s > %s' % (input, parse, collection))
do('%s < %s >> %s' % (parse, supplement, collection))
do('cat %s %s | sort -k 2 | %s | gzip -c > %s' % (source, other_source, match, output))

And there are many more like these, some with even longer pipelines.

One issue I notice is that when a command early in a pipeline fails, the whole command can still succeed with exit status 0, because the shell reports only the status of the last command in the pipeline. In bash I fix this with

set -o pipefail

but I do not see how this can be done in Python. I suppose I could put in an explicit call to bash but that seems wrong. Is it?

In lieu of an answer to that specific question, I'd love to hear alternatives to implementing this kind of code in pure Python without resorting to shell=True. But when I attempt to use Popen and stdout=PIPE the code size blows up. There is something nice about writing pipelines on one line as a string, but if anyone knows an elegant multiline "proper and secure" way to do this in Python I would love to hear it!

An aside: none of these scripts ever take user input; they run batch jobs on a machine with a known shell which is why I actually ventured into the evil shell=True just to see how things would look. And they do look pretty easy to read and the code seems so concise! How does one remove the shell=True and run these long pipelines in raw Python while still getting the advantages of aborting the process if an early component fails?
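
For comparison, here is a minimal sketch of one way such a pipeline could be run without shell=True while still seeing the exit status of every stage. The names run_pipeline and do_pipeline are hypothetical helpers written for this illustration, not part of the subprocess module, and the sketch assumes the final stage's output is small enough to buffer in memory rather than being redirected to a file.

```python
import subprocess
import sys


def run_pipeline(commands):
    """Run a list of argv lists as one pipeline, without a shell.

    Returns (return_codes, output): one exit code per stage, plus the
    final stage's stdout as bytes. Hypothetical helper for illustration.
    """
    if not commands:
        raise ValueError('need at least one command')
    procs = []
    upstream = None  # stdout of the previous stage, if any
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            # Drop our copy of the pipe so the previous stage sees a
            # broken pipe if this stage exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    # Read the last stage's output, then collect every stage's exit code.
    output = procs[-1].communicate()[0]
    return_codes = [p.wait() for p in procs]
    return return_codes, output


def do_pipeline(commands):
    """Abort the script if any stage failed, in the spirit of pipefail."""
    return_codes, output = run_pipeline(commands)
    for argv, code in zip(commands, return_codes):
        if code != 0:
            print('Failure in %r: return code %d' % (argv, code))
            sys.exit(code)
    return output
```

A call such as do_pipeline([['gunzip', '-c', input], ['sort', '-u']]) would then abort if either stage fails. Unlike pipefail, which reports the last failing command, this sketch exits with the code of the first failing stage; redirecting the result to a file would need an extra parameter for the final stage's stdout.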

Ray Toal asked Feb 12 '14 23:02



1 Answer

You can set pipefail in the command string that you hand to bash:

def do(command):
  start = datetime.now()
  return_code = call([ '/bin/bash', '-c', 'set -o pipefail; ' + command ])
  ...

Or, as @RayToal pointed out in a comment, use the -o option of the shell to set this flag: call([ '/bin/bash', '-o', 'pipefail', '-c', command ]).
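
To see the difference the flag makes, here is a small sketch that runs the same pipeline with and without it. It assumes bash is installed at /bin/bash; pipeline_status is a hypothetical helper name used only for this demonstration.

```python
import subprocess


def pipeline_status(command, pipefail):
    """Run `command` through bash and return the shell's exit status."""
    argv = ['/bin/bash']  # assumption: bash lives at /bin/bash
    if pipefail:
        argv += ['-o', 'pipefail']
    argv += ['-c', command]
    return subprocess.call(argv)


if __name__ == '__main__':
    # `false` fails but `true` is last, so plain bash reports success.
    print(pipeline_status('false | true', pipefail=False))  # prints 0
    # With pipefail, the failure of `false` becomes the pipeline's status.
    print(pipeline_status('false | true', pipefail=True))   # prints 1
```

Passing the arguments as a list means Python itself starts bash directly, so shell=True is not needed even though the pipeline string is still interpreted by bash.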

Alfe answered Nov 02 '22 04:11
