I have two (very large) text files. What is the fastest way - in terms of run time - to create a third file containing all lines of file1 that do not appear in file2?
So if file1 contains:
Sally
Joe
Tom
Suzie
And file2 contains:
Sally
Suzie
Harry
Tom
Then the output file should contain:
Joe
Create a hashmap containing each line of file 2 as a key. Then, for each line in file 1, output it if it is not in the hashmap. This is O(N) in the total number of lines, which is the best efficiency class you can achieve, since you have to read all of the input anyway.
Perl implementation:
#!/usr/bin/env perl
use warnings;
use strict;
use Carp ();

my $file1 = 'file1.txt';
my $file2 = 'file2.txt';

# Build a hash keyed on every line of file 2.
my %map;
{
    open my $in, '<', $file2 or Carp::croak("Can't open $file2: $!");
    while (<$in>) {
        $map{$_} = 1;
    }
    close($in) or Carp::carp("error closing $file2");
}

# Print every line of file 1 that does not appear in the hash.
{
    open my $in, '<', $file1 or Carp::croak("Can't open $file1: $!");
    while (<$in>) {
        print $_ if !$map{$_};
    }
    close($in) or Carp::carp("error closing $file1");
}
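If the script is saved as, say, filter.pl (the name is arbitrary), it reads the hard-coded file1.txt and file2.txt and writes the surviving lines to standard output, so the third file can be produced with a redirect:

perl filter.pl > file3.txt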
If file 2 is so large that the hashmap doesn't fit in memory, we have a different problem. One possible approach is to apply the solution above to chunks of file 2, each small enough to fit in memory, writing the results to temporary files. Provided there are enough matches between file 1 and file 2, the total output should stay a reasonable size. To compute the final result, we then intersect the lines of the temporary files: a line belongs in the final result only if it occurs in every temporary file (i.e. it matched no chunk of file 2). A sketch of this is given below.
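A rough Perl sketch of that chunked approach might look like the following. The file names, the chunk size of one million lines, and the use of File::Temp for the intermediate files are illustrative assumptions, not part of the original answer; it also assumes the distinct lines of a single temporary file fit in memory for the intersection step, and that file 2 is non-empty.

#!/usr/bin/env perl
use warnings;
use strict;
use File::Temp ();

# Hypothetical file names and chunk size -- adjust for the real data.
my $file1      = 'file1.txt';
my $file2      = 'file2.txt';
my $chunk_size = 1_000_000;      # lines of file 2 hashed at a time

my @temp_files;

# Pass 1: read file 2 in chunks. For each chunk, write every line of
# file 1 that is NOT in that chunk to its own temporary file.
open my $f2, '<', $file2 or die "Can't open $file2: $!";
while (!eof($f2)) {
    my %chunk;
    my $lines = 0;
    while ($lines < $chunk_size && defined(my $line = <$f2>)) {
        $chunk{$line} = 1;
        $lines++;
    }

    my $tmp = File::Temp->new(UNLINK => 0);
    open my $f1, '<', $file1 or die "Can't open $file1: $!";
    while (<$f1>) {
        print {$tmp} $_ if !$chunk{$_};
    }
    close $f1;
    close $tmp;
    push @temp_files, $tmp->filename;
}
close $f2;

# Pass 2: intersect the temporary files. A line belongs in the final
# result only if it appears in every temporary file, i.e. it matched
# no chunk of file 2 at all.
my %candidates;
my $first = 1;
for my $tf (@temp_files) {
    my %seen;
    open my $in, '<', $tf or die "Can't open $tf: $!";
    $seen{$_} = 1 while <$in>;
    close $in;
    unlink $tf;

    if ($first) {
        %candidates = %seen;
        $first = 0;
    }
    else {
        # drop any candidate that is missing from this temporary file
        delete @candidates{ grep { !$seen{$_} } keys %candidates };
    }
}

# Final pass over file 1 so the output keeps its original order.
open my $f1, '<', $file1 or die "Can't open $file1: $!";
while (<$f1>) {
    print $_ if $candidates{$_};
}
close $f1;

Keeping only the progressive intersection in memory means the peak memory use is bounded by one chunk of file 2 or one temporary file's distinct lines, whichever is larger, rather than by the whole of either input file.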