How do I use subprocess.Popen to connect multiple processes by pipes?

How do I execute the following shell command using the Python subprocess module?

echo "input data" | awk -f script.awk | sort > outfile.txt 

The input data will come from a string, so I don't actually need echo. I've got this far, can anyone explain how I get it to pipe through sort too?

import subprocess

p_awk = subprocess.Popen(["awk", "-f", "script.awk"],
                         stdin=subprocess.PIPE,
                         stdout=open("outfile.txt", "w"))
p_awk.communicate(b"input data")

UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!

Asked Nov 17 '08 by Tom




2 Answers

You'd be a little happier with the following.

import subprocess

awk_sort = subprocess.Popen( "awk -f script.awk | sort > outfile.txt",
    stdin=subprocess.PIPE, shell=True )
awk_sort.communicate( b"input data\n" )

Delegate part of the work to the shell. Let it connect two processes with a pipeline.

You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
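If the per-line logic of script.awk can be expressed in Python, a minimal sketch of that all-Python version might look like the following; process_line() is a hypothetical stand-in for whatever the awk script actually does:

def process_line(line):
    # Hypothetical stand-in for whatever script.awk does to each input line.
    return line.upper()

input_data = "input data"
processed = [process_line(line) for line in input_data.splitlines()]
with open("outfile.txt", "w") as outfile:
    outfile.write("\n".join(sorted(processed)) + "\n")

No subprocesses, no pipes, and the sorting happens in the same interpreter.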

Edit. Some of the reasons for suggesting that awk isn't helping.

[There are too many reasons to respond via comments.]

  1. Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.

  2. The pipeline from awk to sort may improve elapsed processing time for large sets of data; for small sets it has no significant benefit. A quick measurement of awk >file ; sort file versus awk | sort will reveal whether concurrency helps. With sort it rarely does, because sort is not a once-through filter.

  3. The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of question being asked here; see the sketch after this list.

  4. Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.

  5. Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.

Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.
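If the external sort really is needed (for example, for data too large to sort comfortably in memory), a hedged sketch of the "Python to sort" shape from point 3 could look like this; process_line() is again a hypothetical placeholder for the awk logic, and text=True assumes Python 3.7 or later:

import subprocess

def process_line(line):
    # Hypothetical placeholder for whatever script.awk currently does.
    return line.upper()

input_data = "input data"
with open("outfile.txt", "w") as outfile:
    sort_proc = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                                 stdout=outfile, text=True)
    sort_proc.communicate(
        "\n".join(process_line(line) for line in input_data.splitlines()) + "\n")

One process, one pipe, and no awk in the middle.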

Sidebar: Why building a pipeline (a | b) is so hard.

When the shell is confronted with a | b it has to do the following.

  1. Fork a child process of the original shell. This will eventually become b.

  2. Build an OS pipe (not a Python subprocess.PIPE): call os.pipe(), which returns two new file descriptors connected through a common buffer. At this point the process has stdin, stdout and stderr from its parent, plus a file that will be "a's stdout" and another that will be "b's stdin".

  3. Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.

  4. The b child replaces its stdin with the new b's stdin. Exec the b process.

  5. The b child waits for a to complete.

  6. The parent is waiting for b to complete.

I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).

Since Python has os.pipe(), os.fork() and the os.exec*() family, and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
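Purely to illustrate those steps, here is a minimal POSIX-only sketch of the fork/pipe/exec dance in Python; the pipeline() helper and the example commands are invented for this sketch, and real code would also need error handling:

import os

def pipeline(argv_a, argv_b):
    read_fd, write_fd = os.pipe()        # two fds sharing a common buffer

    pid_a = os.fork()
    if pid_a == 0:                       # child that becomes a
        os.dup2(write_fd, 1)             # a's stdout is the pipe's write end
        os.close(read_fd)
        os.close(write_fd)
        os.execvp(argv_a[0], argv_a)

    pid_b = os.fork()
    if pid_b == 0:                       # child that becomes b
        os.dup2(read_fd, 0)              # b's stdin is the pipe's read end
        os.close(read_fd)
        os.close(write_fd)
        os.execvp(argv_b[0], argv_b)

    os.close(read_fd)                    # the parent keeps neither end open
    os.close(write_fd)
    os.waitpid(pid_a, 0)
    os.waitpid(pid_b, 0)

# e.g. pipeline(["echo", "input data"], ["sort"])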

However, it's easier to delegate that operation to the shell.

Answered by S.Lott


import subprocess

some_string = b'input_data'

sort_out = open('outfile.txt', 'wb', 0)
sort_proc = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sort_out)
subprocess.Popen(['awk', '-f', 'script.awk'], stdout=sort_proc.stdin,
                 stdin=subprocess.PIPE).communicate(some_string)
sort_proc.stdin.close()   # let sort see end-of-input once awk has finished
sort_proc.wait()          # make sure sort has written everything to outfile.txt
sort_out.close()
Answered by Cristian