
In Perl, how can I read from multiple filehandles in one loop?

Tags: file-io, perl

I was wondering how I could implement this in Perl:

while ( not the end of the files )
    $var1 = read a line from file 1
    $var2 = read a line from file 2
    # operate on variables
end while

I'm not sure how to read one line at a time from two files in one while loop.

Ryan asked Jul 12 '12 15:07


3 Answers

Seems like you wrote your answer yourself, almost. Just check for eof for both file handles, like so:

while (not eof $fh1 and not eof $fh2) {
    my $var1 = <$fh1>;
    my $var2 = <$fh2>;
    # do stuff
}
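
A minimal, self-contained sketch of the eof-based loop above. It assumes in-memory filehandles (opened on string references) in place of real files; the sample strings and the @pairs array are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for two files of different lengths.
my $text1 = "a\nb\nc\n";
my $text2 = "1\n2\n";

open my $fh1, '<', \$text1 or die "Cannot open in-memory handle: $!";
open my $fh2, '<', \$text2 or die "Cannot open in-memory handle: $!";

my @pairs;
# Stop as soon as either handle has nothing left to read.
while (not eof $fh1 and not eof $fh2) {
    my $var1 = <$fh1>;
    my $var2 = <$fh2>;
    chomp($var1, $var2);
    push @pairs, "$var1-$var2";
}

print "$_\n" for @pairs;
```

With these inputs the loop runs twice and leaves the third line of the longer input unread.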

More reading:

  • perldoc -f open
  • perldoc -f eof
  • perldoc perlopentut
TLP answered Nov 15 '22 10:11


Note: I expanded my answer in response to @zostay and @jm666's comments.

An efficient, clear, and concise answer to this question starts with the idea that related variables go in an aggregate. So, the array @fh will contain the filehandles from which we are reading simultaneously.

Then, we can read a line from each filehandle and store them in an array using the <> operator in conjunction with map. map takes a transformation rule and a list, and returns another list. Hence:

my @lines = map scalar <$_>, @fh;

takes the filehandles in @fh, and reads a single line from each (note scalar), and puts those lines in @lines. This is a one-to-one transformation of @fh.
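
To see why scalar matters here, the following sketch compares the two forms. It assumes in-memory filehandles opened on hypothetical sample strings ($text_a, $text_b) rather than real files.

```perl
use strict;
use warnings;

# Hypothetical sample data; in-memory handles stand in for real files.
my $text_a = "a1\na2\na3\n";
my $text_b = "b1\nb2\nb3\n";

open my $fh_a, '<', \$text_a or die "Cannot open in-memory handle: $!";
open my $fh_b, '<', \$text_b or die "Cannot open in-memory handle: $!";
my @fh = ($fh_a, $fh_b);

# scalar forces <$_> to read exactly one line from each handle.
my @first = map scalar <$_>, @fh;

# Without scalar, <$_> is evaluated in list context and returns
# every remaining line of each handle.
my @rest = map <$_>, @fh;

printf "%d line(s) with scalar, %d without\n", scalar @first, scalar @rest;
```

The first map returns one element per filehandle; the second returns as many elements as there are lines left in all the handles combined.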

As the documentation for <> indicates, <> returns an undefined value if the end-of-file is reached, or there is an error.

Now, one way to check whether we successfully read from all files is to check that the number of defined lines is the same as the number of filehandles. grep selects the elements of a list that satisfy a certain criterion. Hence

@fh == grep defined, my @lines = map scalar <$_>, @fh;

would check whether the number of filehandles in @fh is the same as the number of defined elements in @lines. However, having @fh appear on both sides of this comparison can be confusing, so an alternative way of checking that there are no undefined elements in @lines is:

0 == grep !defined, my @lines = map scalar <$_>, @fh;

If you want to put that condition in a while loop, you have to write:

while (0 == grep !defined, my @lines = map scalar <$_>, @fh) {

whereas if you go with an until, you can simply write:

until (grep !defined, my @lines = map scalar <$_>, @fh) {

This means "until at least one of the readlines returns an undefined value, execute the body of the loop".
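
A non-interactive sketch of that until loop, assuming in-memory filehandles on hypothetical sample strings; @rows is an illustrative name for collecting the results.

```perl
use strict;
use warnings;

# Hypothetical inputs of different lengths.
my $short = "s1\ns2\n";
my $long  = "l1\nl2\nl3\n";

open my $fh_short, '<', \$short or die "Cannot open in-memory handle: $!";
open my $fh_long,  '<', \$long  or die "Cannot open in-memory handle: $!";
my @fh = ($fh_short, $fh_long);

my @rows;
# grep counts the undefined reads; any nonzero count ends the loop.
until (grep !defined, my @lines = map scalar <$_>, @fh) {
    chomp @lines;
    push @rows, join ',', @lines;
}

print "$_\n" for @rows;
```

The loop body runs only for passes in which every handle produced a line, so the third line of the longer input is read but never processed.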

Now, note that Perl's eof is different than C's eof. The documentation for Perl's eof notes that:

Practical hint: you almost never need to use eof in Perl, because the input operators typically return undef when they run out of data or encounter an error.

If you check eof every time through the loop, you add a second read attempt per filehandle on every iteration, because "this function actually reads a character and then ungetcs it."

I almost always give a self-contained, runnable example with my code. Below, I did not want to rely on any specific files existing on your system, so I use the always-available DATA and STDIN handles. Unlike the eof approach, with this method you don't have to worry about where you're reading from: all you care about is whether a readline on any one of the files returned an undefined value. It also works with any number of filehandles. You don't strictly have to put the filehandles in an array but, as I said, related variables belong in an aggregate, so if you find yourself typing stuff like

my $var1 = <$fh1>;
my $var2 = <$fh2>;

realize that you should have used an array to store the filehandles.

#!/usr/bin/env perl

use strict; use warnings;

my @fh = (\*DATA, \*STDIN);

until (grep !defined, my @lines = map scalar <$_>, @fh) {
    print for @lines;
}

__DATA__
one
two
three

This example script will stop asking for your input on STDIN when the lines in DATA are exhausted. If the script has no trailing blank lines, you will have to enter four lines before it terminates: the fourth is read in the same pass that finds DATA exhausted, and is discarded.

Now, if you want to know which filehandles reached the end, you'd switch to using something like:

#!/usr/bin/env perl

use strict; use warnings;

my @fh = (\*DATA, \*STDIN);

while (1) {
    my @lines = map scalar <$_>, @fh;

    if (my @eof = grep !defined($lines[$_]), 0 .. $#fh) {
        warn "Could not read from filehandle(s) '@eof'";
        last;
    }

    print for @lines;
}

__DATA__
one
two
three

Important

The loops above are designed to stop when any one of the files is exhausted. On the other hand, you might want the loops to run until all of the files are exhausted. In that case, you'd use:

 while (grep defined, my @lines = map scalar <$_>, @fh) {
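
A sketch of a complete loop built on that condition. Because exhausted handles keep yielding undef while the others still have data, the body has to handle undefined elements; here they are replaced with a '-' placeholder, which is an arbitrary choice for illustration. The in-memory handles and sample strings are likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical inputs of different lengths.
my $short = "s1\n";
my $long  = "l1\nl2\n";

open my $fh_short, '<', \$short or die "Cannot open in-memory handle: $!";
open my $fh_long,  '<', \$long  or die "Cannot open in-memory handle: $!";
my @fh = ($fh_short, $fh_long);

my @rows;
# Keep going while at least one handle still returns a line.
while (grep defined, my @lines = map scalar <$_>, @fh) {
    # Substitute a placeholder for handles that are already exhausted.
    my @cells = map { defined($_) ? $_ : '-' } @lines;
    chomp @cells;
    push @rows, join ',', @cells;
}

print "$_\n" for @rows;
```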
Sinan Ünür answered Nov 15 '22 10:11


Another easy solution without explicit eof() checking would go like this:

while (defined(my $var1 = <$fh1>) and defined(my $var2 = <$fh2>)) {
    # do stuff
}

This relies on the fact that <> returns undef at end of file (or on a read error), so the loop stops as soon as either read fails.
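
A self-contained sketch of this loop, assuming in-memory filehandles on hypothetical sample strings. One consequence of the short-circuiting condition is worth seeing: when the second file ends first, a line has already been consumed from the first file on that final pass.

```perl
use strict;
use warnings;

# Hypothetical sample data; the first input is one line longer.
my $text1 = "a\nb\nc\n";
my $text2 = "1\n2\n";

open my $fh1, '<', \$text1 or die "Cannot open in-memory handle: $!";
open my $fh2, '<', \$text2 or die "Cannot open in-memory handle: $!";

my @pairs;
# Both lexicals declared in the condition are visible in the loop body.
while (defined(my $var1 = <$fh1>) and defined(my $var2 = <$fh2>)) {
    chomp($var1, $var2);
    push @pairs, "$var1-$var2";
}

print "$_\n" for @pairs;

# The failed final pass already read the last line from $fh1.
print "first handle is at EOF\n" if eof $fh1;
```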

Ken Williams answered Nov 15 '22 10:11