How can I use Perl to determine whether the contents of two files are identical?

This question comes from a need to ensure that changes I've made to code don't affect the values it outputs to a text file. Ideally, I'd roll a sub that takes two filenames and returns 1 or 0 depending on whether the contents are identical, whitespace and all.

Given that text processing is Perl's forte, it should be quite easy to compare two files and determine whether they are identical or not (code below untested).

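use strict;
use warnings;

sub files_match {

    my ( $fileA, $fileB ) = @_;
    open my $file1, '<', $fileA or die "Cannot open $fileA: $!";
    open my $file2, '<', $fileB or die "Cannot open $fileB: $!";

    while (my $lineA = <$file1>) {

        my $lineB = <$file2>;

        # $fileB ran out of lines, or the two lines differ
        return 0 unless defined $lineB and $lineA eq $lineB;
    }

    # Caveat: extra trailing lines in $fileB are not detected here
    return 1;
}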

The only way I can think of (sans CPAN modules) is to open the two files in question, and read them in line-by-line until a difference is found. If no difference is found, the files must be identical.

But this approach is limited and clumsy. What if the two files have different numbers of lines? Should I open and close them to determine the line counts, then re-open them to scan the text? Yuck.

I don't see anything in perlfaq5 relating to this. I want to stay away from modules unless they come with the core Perl 5.6.1 distribution.

Zaid asked May 17 '10 09:05


2 Answers

It's in the core: File::Compare ships with Perl.

use File::Compare;

if (compare("file1", "file2") == 0) {
  print "They're equal\n";
}
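
To get the 1/0 return value the question asks for, compare can be wrapped in a small sub. The following is a minimal sketch, not a definitive implementation: the name files_match is borrowed from the question, the helper write_temp and the demo file contents are hypothetical, and turning an error into an exception is an assumption about what the caller wants. It relies on compare returning 0 for identical files, 1 for different files, and -1 on error (for example, when a file cannot be opened).

```perl
use strict;
use warnings;
use File::Compare;
use File::Temp qw(tempfile);

# Returns 1 if the two files have identical contents, 0 if they differ.
# Dies if File::Compare reports an error (-1), e.g. an unreadable file,
# rather than silently reporting a mismatch.
sub files_match {
    my ($fileA, $fileB) = @_;
    my $result = compare($fileA, $fileB);
    die "Cannot compare '$fileA' and '$fileB'\n" if $result < 0;
    return $result == 0 ? 1 : 0;
}

# Hypothetical helper for the demo: write a string to a temporary file
# and return the file's name.
sub write_temp {
    my ($content) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "Cannot close $name: $!";
    return $name;
}

my $old  = write_temp("alpha\nbeta\n");
my $same = write_temp("alpha\nbeta\n");
my $new  = write_temp("alpha\nbeta \n");    # trailing space on line 2

print files_match($old, $same) ? "identical\n" : "different\n";
print files_match($old, $new)  ? "identical\n" : "different\n";
```

Because the comparison covers the whole file contents, the single trailing space in the third file is enough to make it count as different.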
Jonas Elfström answered Oct 21 '22 12:10


There are a couple of O(1) checks you can do first to see if the files are different.

If the files have different sizes, then they are obviously different. The stat function will return the sizes of the files. It also returns another useful piece of data: the inode number. If the two files are really the same file (because the same filename was passed in for both, or because both names are hard links to the same file), the inode number will be the same, and a file is obviously the same as itself. Note that inode numbers are only unique within a single filesystem, so the device number that stat returns should match as well.

Barring those two checks, there is no better way to compare two local files for equivalence than to compare their contents directly. Of course, there is no need to do it line by line; you can read in larger blocks if you so desire.

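#!/usr/bin/perl

use strict;
use warnings;

use File::Compare ();

# Returns 0 if the files match, 1 if they differ, -1 on error
# (the same convention as File::Compare::compare).
sub compare {
    my ($first, $second) = @_;
    my ($first_dev, $first_inode, $first_size)    = (stat $first)[0, 1, 7];
    my ($second_dev, $second_inode, $second_size) = (stat $second)[0, 1, 7];

    #stat failed on at least one file (missing, unreadable, ...)
    return -1 unless defined $first_size and defined $second_size;

    #same device and inode, so same file, so must be the same
    #(skipped when the platform reports inode 0 for every file)
    return 0 if $first_inode
        and $first_dev == $second_dev
        and $first_inode == $second_inode;

    #different sizes, so must be different
    return 1 unless $first_size == $second_size;

    return File::Compare::compare($first, $second);
}

print compare(@ARGV) ? "not the " : "", "same\n";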
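
The remark about reading larger blocks instead of lines can be sketched without any comparison module. This is an illustrative sketch under stated assumptions, not a tuned implementation: the sub name files_match_blockwise, the helper write_temp, the 64 KB default block size, and the demo data are all hypothetical. It returns 1 for identical files and 0 otherwise, as the question requested, and assumes regular local files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Returns 1 if the files are byte-for-byte identical, 0 otherwise.
# Dies if either file cannot be opened or read.
sub files_match_blockwise {
    my ($fileA, $fileB, $block_size) = @_;
    $block_size ||= 65536;    # assumed default; tune as needed

    open my $fhA, '<', $fileA or die "Cannot open '$fileA': $!\n";
    open my $fhB, '<', $fileB or die "Cannot open '$fileB': $!\n";
    binmode $fhA;             # compare raw bytes, no newline translation
    binmode $fhB;

    # Cheap check first: different sizes mean different contents.
    return 0 if (stat $fhA)[7] != (stat $fhB)[7];

    while (1) {
        my $readA = read $fhA, my $bufA, $block_size;
        my $readB = read $fhB, my $bufB, $block_size;
        die "Read error: $!\n" unless defined $readA and defined $readB;

        return 0 if $bufA ne $bufB;    # blocks differ
        return 1 if $readA == 0;       # both files reached end of file
    }
}

# Hypothetical helper for the demo: write a string to a temporary file
# and return the file's name.
sub write_temp {
    my ($content) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $content;
    close $fh or die "Cannot close $name: $!";
    return $name;
}

my $first  = write_temp("0123456789");
my $second = write_temp("0123456789");
my $third  = write_temp("012345678X");    # same size, last byte differs

# A tiny block size of 4 forces several read calls per file.
print files_match_blockwise($first, $second, 4) ? "same\n" : "not the same\n";
print files_match_blockwise($first, $third,  4) ? "same\n" : "not the same\n";
```

The size check makes the loop unnecessary for files of different lengths, so the block-by-block pass only runs when the contents could plausibly match.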
Chas. Owens answered Oct 21 '22 12:10