
How can I walk through two files simultaneously in Perl?

Tags: file-io, perl

I have two text files that contain columnar data of the variety position-value, sorted by position.

Here is an example of the first file (file A):

100   1
101   1
102   0
103   2
104   1
...

Here is an example of the second file (B):

20    0
21    0
...
100   2
101   1
192   3
193   1
...

Instead of reading one of the two files into a hash table, which is prohibitive due to memory constraints, what I would like to do is walk through two files simultaneously, in a stepwise fashion.

What this means is that I would like to stream through lines of either A or B and compare position values.

If the two positions are equal, then I perform a calculation on the values associated with that position.

Otherwise, if the positions are not equal, I move through lines of file A or file B until the positions are equal (when I again perform my calculation) or I reach EOF of both files.

Is there a way to do this in Perl?

Alex Reynolds asked Mar 23 '10 10:03


1 Answer

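This is a common problem, for example when matching rows of database table data by key. Because both files are sorted by position, you can walk them in step, much like a merge join, and keep only one line from each file in memory at a time. Here's an implementation of the pseudocode provided by rjp.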

#!/usr/bin/perl

use strict;
use warnings;

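# Read the next line from a filehandle and return [position, value],
# or nothing at end of file.
sub read_file_line {
  my $fh = shift;

  # Use defined() so a final line of "0" with no newline is not mistaken for EOF
  if ($fh and defined(my $line = <$fh>)) {
    chomp $line;
    # Split on any run of whitespace, so tabs and spaces both work
    return [ split(' ', $line) ];
  }
  return;
}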

sub compute {
   # do something with the 2 values
}

open(my $f1, '<', "file1") or die "Cannot open file1: $!";
open(my $f2, '<', "file2") or die "Cannot open file2: $!";

my $pair1 = read_file_line($f1);
my $pair2 = read_file_line($f2);

while ($pair1 and $pair2) {
  if ($pair1->[0] < $pair2->[0]) {
    $pair1 = read_file_line($f1);
  } elsif ($pair2->[0] < $pair1->[0]) {
    $pair2 = read_file_line($f2);
  } else {
    compute($pair1->[1], $pair2->[1]);
    $pair1 = read_file_line($f1);
    $pair2 = read_file_line($f2);
  }
}

close($f1);
close($f2);
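
To see the loop in action without creating files, here is a minimal self-contained sketch that runs the same walk over the sample rows from the question, using in-memory filehandles. The names `next_pair` and `merge_walk`, the callback interface, and the choice to add the two values are all assumptions made for illustration; the question does not say what the calculation should be, and the elided `...` rows are simply left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the next [position, value] pair from a
# filehandle, skipping blank lines, or nothing at end of file.
sub next_pair {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        next unless $line =~ /\S/;       # skip blank lines
        return [ split ' ', $line ];     # split on any whitespace
    }
    return;
}

# Hypothetical helper: walk two inputs sorted by position and call
# $callback->($position, $value_a, $value_b) for each shared position.
sub merge_walk {
    my ($fh_a, $fh_b, $callback) = @_;
    my $pair_a = next_pair($fh_a);
    my $pair_b = next_pair($fh_b);

    # Stop as soon as either input is exhausted: no more matches are possible
    while ($pair_a and $pair_b) {
        my $cmp = $pair_a->[0] <=> $pair_b->[0];
        if ($cmp < 0) {
            $pair_a = next_pair($fh_a);  # A is behind, advance A
        } elsif ($cmp > 0) {
            $pair_b = next_pair($fh_b);  # B is behind, advance B
        } else {
            $callback->($pair_a->[0], $pair_a->[1], $pair_b->[1]);
            $pair_a = next_pair($fh_a);
            $pair_b = next_pair($fh_b);
        }
    }
    return;
}

# Sample rows from the question, held in strings instead of files
my $data_a = "100   1\n101   1\n102   0\n103   2\n104   1\n";
my $data_b = "20    0\n21    0\n100   2\n101   1\n192   3\n193   1\n";

open(my $fh_a, '<', \$data_a) or die "Cannot open in-memory data A: $!";
open(my $fh_b, '<', \$data_b) or die "Cannot open in-memory data B: $!";

# Assumed calculation: add the two values found at each shared position
my @results;
merge_walk($fh_a, $fh_b, sub {
    my ($pos, $val_a, $val_b) = @_;
    push @results, "$pos\t" . ($val_a + $val_b);
});

close($fh_a);
close($fh_b);

print "$_\n" for @results;
```

With this sample data, only positions 100 and 101 appear in both inputs, so the script prints `100` with `3` and `101` with `2`, separated by a tab. The sketch assumes each position occurs at most once per file; repeated positions would need extra handling.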

Hope this helps!

Terence answered Nov 01 '22 06:11