I'm trying to pass information into a program that doesn't accept input from stdin. To do this, I'm passing /dev/stdin as the file argument and then trying to pipe in my input. I've noticed that if I do this with a pipe character:
[pkerp@comp ernwin]$ cat fess/structures/168d.pdb | MC-Annotate /dev/stdin
I get no output. If, however, I do the same thing using the less-than character (<), it works fine:
[pkerp@plastilin ernwin]$ MC-Annotate /dev/stdin < fess/structures/168d.pdb
Residue conformations -------------------------------------------
A1 : G C3p_endo anti
A2 : C C3p_endo anti
A3 : G C3p_endo anti
My question is: what is the difference between these two operations, and why do they give different outcomes? As a bonus question, is there a proper term for specifying input using the '<' symbol?
Update:
My current best guess is that something internal to the program being run makes use of seeking within the file. The answers below seem to suggest that it has something to do with the file pointers but running the following little test program:
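#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    char line[128];

    printf("argv[1]: %s f: %d\n", argv[1], fileno(f));
    /* First pass: read everything up to EOF. */
    while (fgets(line, sizeof(line), f)) {
        printf("line: %s\n", line);
    }
    printf("rewinding\n");
    /* This seek fails with ESPIPE when f refers to a pipe. */
    fseek(f, 0, SEEK_SET);
    while (fgets(line, sizeof(line), f)) {
        printf("line: %s\n", line);
    }
    fclose(f);
    return 0;
}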
indicates that everything occurs identically up until the fseek function call:
[pete@kat tmp]$ cat temp | ./a.out /dev/stdin
argv[1]: /dev/stdin f: 3
line: abcd
rewinding
===================
[pete@kat tmp]$ ./a.out /dev/stdin < temp
argv[1]: /dev/stdin f: 3
line: abcd
rewinding
line: abcd
Using process substitution as Christopher Neylan suggested leads to the program above hanging without even reading the input, which also seems a little strange.
[pete@kat tmp]$ ./a.out /dev/stdin <( cat temp )
argv[1]: /dev/stdin f: 3
Looking at the strace output confirms my suspicion that a seek operation is attempted which fails in the pipe version:
_llseek(3, 0, 0xffffffffffd7c7c0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
It succeeds in the redirect version:
_llseek(3, 0, [0], SEEK_CUR) = 0
The moral of the story: don't haphazardly replace a file argument with /dev/stdin and then pipe to it. It might work, but it just as well might not.
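If the program really does need to seek, one possible workaround is to spool the piped data into a temporary file first and hand the program that file instead. The following is only a sketch, assuming a POSIX shell with mktemp available and a program that takes the input file as its last argument; run_with_seekable_input is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper (not part of any existing tool).
# Usage: producer | run_with_seekable_input command [args...]
run_with_seekable_input() {
    # Assumption: mktemp is available (it is on most Linux and BSD systems).
    tmp=$(mktemp) || return 1
    # Drain stdin (the pipe) into a regular file, which supports seeking.
    cat > "$tmp"
    # Run the command with the regular file appended as the final argument.
    "$@" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

With this in place, something like cat fess/structures/168d.pdb | run_with_seekable_input MC-Annotate should give the program a regular file to open, at the cost of one extra copy of the data.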
There should be no functional difference between those two commands. Indeed, I cannot recreate what you're seeing:
#! /usr/bin/perl
# test.pl
# this is a test Perl script that will read from a filename passed on the command line, and print what it reads.
use strict;
use warnings;
print $ARGV[0], " -> ", readlink( $ARGV[0] ), " -> ", readlink( readlink($ARGV[0]) ), "\n";
open( my $fh, "<", $ARGV[0] ) or die "$!";
while ( defined( my $line = <$fh> ) ) {
    print "READ: $line";
}
close( $fh );
Running this three ways:
$ cat input
a
b
c
d
$ ./test.pl /dev/stdin
/dev/stdin -> /proc/self/fd/0 -> /dev/pts/0
this is me typing into the terminal
READ: this is me typing into the terminal
$ cat input | ./test.pl /dev/stdin
/dev/stdin -> /proc/self/fd/0 -> pipe:[1708285]
READ: a
READ: b
READ: c
READ: d
$ ./test.pl /dev/stdin < input
/dev/stdin -> /proc/self/fd/0 -> /tmp/input
READ: a
READ: b
READ: c
READ: d
First, note what /dev/stdin is:
$ ls -l /dev/stdin
lrwxrwxrwx 1 root root 15 Apr 21 15:39 /dev/stdin -> /proc/self/fd/0
$ ls -l /proc/self
lrwxrwxrwx 1 root root 0 May 10 09:44 /proc/self -> 27565
On Linux, it's a symlink to /proc/self/fd/0, and /proc/self is itself a special link to the directory under /proc for the current process. So /dev/stdin will always point to fd 0 of the current process. When you run MC-Annotate (or, in my examples, test.pl), the file /dev/stdin will resolve to /proc/$pid/fd/0, for whatever the process ID of MC-Annotate is. This is just a result of how the symlink for /dev/stdin works.
So, as you can see above in my example, when you use a pipe (|), /proc/self/fd/0 will point to the read end of the pipe from cat set up by the shell. When you use a redirection (<), /proc/self/fd/0 will point directly to the input file, as set up by the shell.
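To check from a script which of these cases applies, a minimal sketch along the following lines could help. It assumes a POSIX shell on a system that provides /dev/stdin (as Linux does); stdin_kind is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: report what standard input is connected to.
stdin_kind() {
    if [ -t 0 ]; then
        # fd 0 is attached to a terminal
        echo "terminal"
    elif [ -p /dev/stdin ]; then
        # fd 0 is a pipe (FIFO), e.g. the read end of "producer | consumer"
        echo "pipe"
    elif [ -f /dev/stdin ]; then
        # fd 0 is a regular file, e.g. "consumer < file"
        echo "regular file"
    else
        echo "other"
    fi
}
```

Calling it as cat input | stdin_kind should report a pipe, while stdin_kind < input should report a regular file, mirroring the readlink output above.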
As to why you're seeing this odd behavior: I'd guess that MC-Annotate is doing some checks on the file type before opening it, seeing that /dev/stdin points to a pipe instead of a regular file, and bailing out. You could confirm this by either reading the source code for MC-Annotate or using the strace command to watch what's happening internally.
Note that both of these methods are a bit roundabout in Bash. The accepted way to get the output of a process into a program that will only open a filename is to use process substitution:
$ MC-Annotate <(cat fess/structures/168d.pdb)
The <(...) construct expands to a filename (such as /dev/fd/63) that refers to the read end of a pipe fed by whatever the ... is:
$ echo <(true | grep example | cat)
/dev/fd/63
The problem lies in the order in which files are opened for reading. /dev/stdin is not a real file; it's a symlink to the file which the current process uses as standard input. In a typical shell, it is linked to the terminal, and inherited by any process started by the shell. Keep in mind that MC-Annotate will only read from the file provided as an argument.
In the pipe example, /dev/stdin is a symlink to the file which MC-Annotate inherits as standard input: the terminal. It probably opens this file on a new descriptor (let's say 3, but it could be any value greater than 2). The pipe connects the output of cat to MC-Annotate's standard input (file descriptor 0), which MC-Annotate continues to ignore in favor of the file it opened directly.
In the redirection example, the shell connects fess/structures/168d.pdb directly to file descriptor 0 before MC-Annotate is run. When MC-Annotate starts up, it again tries to open /dev/stdin, which this time points to fess/structures/168d.pdb instead of the terminal.
So the answer lies in which file /dev/stdin is a link to in the process that executes MC-Annotate; shell redirections are set up before the process starts; pipelines after the process starts.
Does this work?
cat fess/structures/168d.pdb | MC-Annotate <( cat /dev/stdin )
A similar command
echo foo | cat <( cat /dev/stdin )
seems to work, but I won't claim the situations are identical.
[UPDATE: this does not work. /dev/stdin is still a link to the terminal, not the pipeline.]
This might provide a workaround. Now, MC-Annotate inherits its standard input from the subshell, not the current shell, and the subshell has output from cat as its standard input, not the terminal.
cat fess/structures/168d.pdb | ( MC-Annotate /dev/stdin )
I think a simple command group will work as well:
cat fess/structures/168d.pdb | { MC-Annotate /dev/stdin; }