 

Replacing Shell Pipeline [duplicate]

In the Python 2.7 documentation of subprocess module, I found the following snippets:

from subprocess import Popen, PIPE

p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

Source: https://docs.python.org/2/library/subprocess.html#replacing-shell-pipeline

I don't understand this line: `p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.`

Here p1.stdout is being closed. How does it allow p1 to receive a SIGPIPE if p2 exits?

asked Jan 06 '15 at 10:01 by Tamim Shahriar


2 Answers

The SIGPIPE signal is sent to a process that attempts to write to a pipe which no process has open for reading. In the shell pipeline equivalent of your code snippet:

`dmesg | grep hda`

If the grep process terminates for some reason before dmesg has finished writing its output, dmesg will receive SIGPIPE on its next write and, since the default action for SIGPIPE is to terminate the process, it will exit. This is the expected behavior for UNIX/Linux processes (http://en.wikipedia.org/wiki/Unix_signal).

By contrast, in the Python implementation using subprocess, if p2 exits before p1 is done generating output, SIGPIPE is not sent, because another process still holds the read end of the pipe open: the Python script itself (the one that created p1 and p2). More importantly, the script holds the pipe open but never consumes its contents, so once the pipe's buffer fills up, p1 blocks on its next write and stays stuck indefinitely.

Explicitly closing p1.stdout detaches the Python script from the pipe, so that p2 is the only process holding its read end. That way, if p2 ends before p1, p1 properly receives the signal to end, without anything artificially holding the pipe open.
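A minimal, self-contained sketch of the difference, written for Python 3 on a POSIX system (not the Python 2.7 code from the docs). The producer and consumer are hypothetical stand-ins for dmesg and an early-exiting grep, launched with sys.executable so nothing beyond Python is needed; run_pipeline is an illustrative helper, not part of subprocess. Because the CPython interpreter ignores SIGPIPE at startup, the stand-in producer explicitly restores the default action so it behaves like an ordinary command-line tool.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for dmesg: writes lines forever. CPython ignores SIGPIPE
# at startup, so the child restores the default action (terminate) first.
PRODUCER = """
import signal, sys
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
while True:
    sys.stdout.write("y\\n")
"""

# Hypothetical stand-in for a reader that quits early: reads one line, then exits.
CONSUMER = "import sys; sys.stdout.write(sys.stdin.readline())"


def run_pipeline(close_read_end, timeout=2.0):
    """Return (p2's output, p1's status); status is "blocked" if p1 never ends."""
    p1 = subprocess.Popen([sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE)
    p2 = subprocess.Popen([sys.executable, "-c", CONSUMER],
                          stdin=p1.stdout, stdout=subprocess.PIPE)
    if close_read_end:
        # The parent drops its copy of the read end: p2 is now the only reader.
        p1.stdout.close()
    output = p2.communicate()[0]
    try:
        status = p1.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The parent still holds the read end but never reads it, so p1 fills
        # the pipe buffer and blocks in write() instead of getting SIGPIPE.
        status = "blocked"
        p1.kill()
        p1.wait()
    if not close_read_end:
        p1.stdout.close()
    return output, status
```

With `close_read_end=True`, p1 should end with returncode `-signal.SIGPIPE` (Popen reports death by signal N as -N); with `close_read_end=False`, p1 blocks until the timeout expires and the sketch kills it. In both cases p2's output is `b"y\n"`.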

Here is an alternatively worded explanation: http://www.enricozini.org/2009/debian/python-pipes/

answered Oct 31 '22 at 11:10 by rchang


A hopefully more systematic explanation:

  • A pipe is an object managed by the operating system kernel. It has a single read end and a single write end.
  • Both ends can be opened by multiple processes. There is still only one pipe, though. That is, multiple processes can share the same pipe.
  • A process that has opened one of the ends holds a corresponding file handle (file descriptor). The process can actively close() it again! If a process exits, the operating system closes its file handles for you.
  • All involved processes can close() their file handles for the read end of the pipe. There is nothing wrong with that; it is a perfectly fine situation.
  • Now, if a process writes data to the write end of the pipe and the read end is no longer open anywhere (no process holds an open file handle for it), a POSIX-compliant operating system sends a SIGPIPE signal to the writing process so that it knows there is no reader anymore. If the writer survives the signal (for example, because it ignores SIGPIPE), the write call fails with the error EPIPE instead.
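The bullet points above can be checked directly with os.pipe() in a minimal Python 3 sketch (POSIX assumed; write_with_readers is a hypothetical helper name). os.dup() creates a second handle to the same read end, mirroring the "multiple handles, one pipe" point. Since CPython ignores SIGPIPE at startup, the kernel's refusal shows up as BrokenPipeError (errno EPIPE) rather than as a signal that kills the interpreter.

```python
import os


def write_with_readers(keep_extra_reader):
    """Close the original read-end handle, then try to write to the pipe.

    If keep_extra_reader is True, a duplicate handle to the same read end
    stays open, so the pipe still has a reader and the write succeeds.
    """
    r, w = os.pipe()
    # A second file handle referring to the same read end of the same pipe.
    extra = os.dup(r) if keep_extra_reader else None
    os.close(r)
    try:
        os.write(w, b"hello")
        return "written"
    except BrokenPipeError:
        # No open read end anywhere: the kernel rejects the write with EPIPE.
        # CPython ignores SIGPIPE, so this surfaces as an exception.
        return "EPIPE"
    finally:
        os.close(w)
        if extra is not None:
            os.close(extra)
```

`write_with_readers(True)` returns `"written"`, while `write_with_readers(False)` returns `"EPIPE"`: the pipe only counts as reader-less once every handle to its read end is closed.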

This is the standard mechanism by which the receiving program implicitly tells the sending program that it has stopped reading. Have you ever wondered whether

`cat bigfile | head -n5`

actually reads the entire bigfile? No, it does not: once head has read 5 lines from stdin and exited, cat receives a SIGPIPE signal on its next write. The important thing to appreciate: cat has been designed to respond to SIGPIPE by keeping its default action, which terminates the process (that is an important engineering decision ;)), so it stops reading the file and exits. Other programs are designed to ignore SIGPIPE on purpose and handle this situation on their own; this is common in networking applications.
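A minimal sketch contrasting the two designs, assuming Python 3 on a POSIX system: the hypothetical writer either restores SIGPIPE's default action (like cat) or keeps CPython's ignored SIGPIPE and handles the failed write itself. The mode names, the helper writer_exit_status, and the exit code 3 are arbitrary choices for illustration.

```python
import signal
import subprocess
import sys

# Hypothetical writer that writes to stdout forever. In "default" mode it lets
# SIGPIPE terminate it (like cat); otherwise it keeps CPython's ignored SIGPIPE
# and handles the resulting EPIPE error on its own.
WRITER = """
import os, signal, sys
if sys.argv[1] == "default":
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
try:
    while True:
        os.write(1, b"x" * 4096)  # unbuffered write straight to fd 1
except BrokenPipeError:
    sys.exit(3)  # chosen exit code meaning "the reader went away"
"""


def writer_exit_status(mode):
    proc = subprocess.Popen([sys.executable, "-c", WRITER, mode],
                            stdout=subprocess.PIPE)
    # Close the only read end: no process will ever read from this pipe.
    proc.stdout.close()
    return proc.wait()
```

`writer_exit_status("default")` should return `-signal.SIGPIPE` (the writer was killed by the signal), whereas `writer_exit_status("ignore")` should return 3 (the writer noticed EPIPE and exited on its own terms).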

If you keep the read end of the pipe open in your controlling process, you disable the mechanism described above: dmesg will not be able to notice that grep has exited.

However, your example is actually not a good one: grep hda reads its entire input, so dmesg is the process that exits first.

answered Oct 31 '22 at 12:10 by Dr. Jan-Philip Gehrcke