
stdin behaves differently when piped and when redirected

Tags: bash, stdin

I'm trying to pass information into a program that doesn't accept input from stdin. To do this, I'm passing /dev/stdin as the filename argument and piping my input in. I've noticed that if I do this with a pipe character:

[pkerp@comp ernwin]$ cat fess/structures/168d.pdb | MC-Annotate /dev/stdin

I get no output. If, however, I do the same thing using the less-than character (<), it works fine:

[pkerp@plastilin ernwin]$ MC-Annotate /dev/stdin < fess/structures/168d.pdb
Residue conformations -------------------------------------------
A1 : G C3p_endo anti
A2 : C C3p_endo anti
A3 : G C3p_endo anti

My question is, what is the difference between these two operations and why do they give different outcomes? As a bonus question, is there a proper term for specifying input using the '<' symbol?

Update:

My current best guess is that something internal to the program being run makes use of seeking within the file. The answers below seem to suggest that it has something to do with the file pointers but running the following little test program:

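#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(argv[1], "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }

    char line[128];

    printf("argv[1]: %s f: %d\n", argv[1], fileno(f));

    /* First pass: read everything once. */
    while (fgets(line, sizeof(line), f)) {
        printf("line: %s\n", line);
    }

    printf("rewinding\n");
    /* Return value is not checked; on a pipe the seek fails with ESPIPE. */
    fseek(f, 0, SEEK_SET);

    /* Second pass: prints the lines again only if the seek worked. */
    while (fgets(line, sizeof(line), f)) {
        printf("line: %s\n", line);
    }

    fclose(f);
    return 0;
}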

indicates that everything occurs identically up until the fseek function call:

[pete@kat tmp]$ cat temp | ./a.out /dev/stdin
argv[1]: /dev/stdin f: 3
line: abcd

rewinding
===================
[pete@kat tmp]$ ./a.out /dev/stdin < temp
argv[1]: /dev/stdin f: 3
line: abcd

rewinding
line: abcd

Using process substitution as Christopher Neylan suggested leads to the program above hanging without even reading the input, which also seems a little strange.

[pete@kat tmp]$ ./a.out /dev/stdin <( cat temp )
argv[1]: /dev/stdin f: 3

Looking at the strace output confirms my suspicion that a seek operation is attempted which fails in the pipe version:

_llseek(3, 0, 0xffffffffffd7c7c0, SEEK_CUR) = -1 ESPIPE (Illegal seek)

And succeeds in the redirect version.

_llseek(3, 0, [0], SEEK_CUR)            = 0 

The moral of the story: don't haphazardly replace a filename argument with /dev/stdin and pipe into it. It might work, but it might just as well not.
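One possible workaround, not taken from the original post: when a program needs a seekable file, the piped data can be spooled to a temporary file first and that file passed as the argument. The sketch below assumes `mktemp` is available; `run_with_spooled_stdin` is a hypothetical helper name.

```shell
# Hypothetical helper: copy stdin to a temporary regular file, then run
# the given command with that file's path appended as the last argument.
run_with_spooled_stdin() {
    spool_file=$(mktemp) || return 1
    cat > "$spool_file"          # drain the pipe into a seekable file
    "$@" "$spool_file"           # the command sees an ordinary file
    spool_status=$?
    rm -f "$spool_file"
    return "$spool_status"
}
```

Under these assumptions, `cat fess/structures/168d.pdb | run_with_spooled_stdin MC-Annotate` would hand MC-Annotate a regular file that it can seek in, at the cost of one extra copy of the data.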

juniper- asked May 10 '13 12:05


2 Answers

There should be no functional difference between those two commands. Indeed, I cannot recreate what you're seeing:

#! /usr/bin/perl
# test.pl
# this is a test Perl script that will read from a filename passed on the command line, and print what it reads.

use strict;
use warnings;

print $ARGV[0], " -> ", readlink( $ARGV[0] ), " -> ", readlink( readlink($ARGV[0]) ), "\n";
open( my $fh, "<", $ARGV[0] ) or die "$!";
while( defined(my $line = <$fh>) ){
        print "READ: $line";
}
close( $fh );

Running this the three ways:

$ cat input
a
b
c
d

$ ./test.pl /dev/stdin
/dev/stdin -> /proc/self/fd/0 -> /dev/pts/0
this is me typing into the terminal
READ: this is me typing into the terminal

$ cat input | ./test.pl /dev/stdin
/dev/stdin -> /proc/self/fd/0 -> pipe:[1708285]
READ: a
READ: b
READ: c
READ: d

$ ./test.pl /dev/stdin < input
/dev/stdin -> /proc/self/fd/0 -> /tmp/input
READ: a
READ: b
READ: c
READ: d

First note what /dev/stdin is:

$ ls -l /dev/stdin
lrwxrwxrwx 1 root root 15 Apr 21 15:39 /dev/stdin -> /proc/self/fd/0

$ ls -l /proc/self
lrwxrwxrwx 1 root root 0 May 10 09:44 /proc/self -> 27565

On Linux, /dev/stdin is a symlink to /proc/self/fd/0, and /proc/self is itself a special link to the /proc directory of whichever process looks it up. /dev/stdin therefore always refers to fd 0 of the process that opens it. When you run MC-Annotate (or, in my examples, test.pl), /dev/stdin resolves to /proc/$pid/fd/0, where $pid is the process ID of MC-Annotate.

So as you can see above in my example, when you use a pipe (|), /proc/self/fd/0 will point to the read end of the pipe from cat set up by the shell. When you use a redirection (<), /proc/self/fd/0 will point directly to the input file, as set up by the shell.
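The same distinction can be checked from the shell without Perl. This is a minimal sketch, assuming a Linux system where /dev/stdin exists; `stdin_kind` is a hypothetical helper name, not part of any tool mentioned above.

```shell
# Hypothetical helper: report what kind of file fd 0 refers to.
stdin_kind() {
    if [ -t 0 ]; then
        echo "terminal"
    elif [ -p /dev/stdin ]; then
        echo "pipe"              # set up by: producer | stdin_kind
    elif [ -f /dev/stdin ]; then
        echo "regular file"      # set up by: stdin_kind < file
    else
        echo "other"
    fi
}
```

With this helper, `cat input | stdin_kind` reports a pipe, while `stdin_kind < input` reports a regular file. Only the regular file supports seeking.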

As for why you're seeing this odd behavior: my guess is that MC-Annotate checks the file type before opening it, sees that /dev/stdin points to a pipe instead of a regular file, and bails out. You could confirm this either by reading the source code for MC-Annotate or by using the strace command to watch what happens internally.

Note that both of these methods are a bit roundabout in Bash. The usual way to get the output of a process into a program that will only open a filename is to use process substitution:

$ MC-Annotate <(cat fess/structures/168d.pdb)

The <(...) construct expands to a filename (such as /dev/fd/63) that refers to the read end of a pipe fed by whatever the ... is:

$ echo <(true | grep example | cat)
/dev/fd/63
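Note that the path produced by process substitution still names a pipe, so a program that seeks in its input would be expected to hit the same ESPIPE error. The sketch below checks this, assuming bash is installed (process substitution is not part of POSIX sh); `procsub_kind` is a hypothetical helper name.

```shell
# Hypothetical helper: ask bash whether the path that <(...) expands to
# is a pipe. The inner script runs under bash because plain sh may not
# support process substitution.
procsub_kind() {
    bash -c 'if [ -p <(true) ]; then echo "pipe"; else echo "not a pipe"; fi'
}
```

Process substitution therefore solves the "program only accepts a filename" problem, but not the "program needs a seekable file" problem.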
Christopher Neylan answered Oct 20 '22 00:10


The problem lies in the order in which files are opened for reading.

/dev/stdin is not a real file; it's a symlink to the file which the current process uses as standard input. In a typical shell, it is linked to the terminal, and inherited by any process started by the shell. Keep in mind that MC-Annotate will only read from the file provided as an argument.

In the pipe example, /dev/stdin is a symlink to the file which MC-Annotate inherits as standard input: the terminal. It probably opens this file on a new descriptor (let's say 3, but it could be any value greater than 2). The pipe connects the output of cat to MC-Annotate's standard input (file descriptor 0), which MC-Annotate continues to ignore in favor of the file it opened directly.

In the redirection example, the shell connects fess/structures/168d.pdb directly to file descriptor 0 before MC-Annotate is run. When MC-Annotate starts up, it again tries to open /dev/stdin, which this time points to fess/structures/168d.pdb instead of the terminal.

So the answer lies in which file /dev/stdin is a link to in the process that executes MC-Annotate; shell redirections are set up before the process starts; pipelines after the process starts.

Does this work?

cat fess/structures/168d.pdb | MC-Annotate <( cat /dev/stdin )

A similar command

echo foo | cat <( cat /dev/stdin )

seems to work, but I won't claim the situations are identical.


[ UPDATE: does not work. /dev/stdin is still a link to the terminal, not the pipeline.]

This might provide a work-around. Now, MC-Annotate inherits its standard input from the subshell, not the current shell, and the subshell has output from cat as its standard input, not the terminal.

cat fess/structures/168d.pdb | ( MC-Annotate /dev/stdin )

I think a simple command group will work as well:

cat fess/structures/168d.pdb | { MC-Annotate /dev/stdin; }
chepner answered Oct 20 '22 00:10