Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare two files

Tags:

c++

I'm trying to write a function which compares the content of two files.

I want it to return 1 if files are the same, and 0 if different.

ch1 and ch2 works as a buffer, and I used fgets to get the content of my files.

I think there is something wrong with the eof pointer, but I'm not sure. FILE variables are given within the command line.

P.S. It works with small files with size under 64KB, but doesn't work with larger files (700MB movies for example, or 5MB of .mp3 files).

Any ideas, how to work it out?

int compareFile(FILE* file_compared, FILE* file_checked)
{
    bool diff = 0;
    int N = 65536;
    char* b1 = (char*) calloc (1, N+1);
    char* b2 = (char*) calloc (1, N+1);
    size_t s1, s2;

    do {
        s1 = fread(b1, 1, N, file_compared);
        s2 = fread(b2, 1, N, file_checked);

        if (s1 != s2 || memcmp(b1, b2, s1)) {
            diff = 1;
            break;
        }
      } while (!feof(file_compared) || !feof(file_checked));

    free(b1);
    free(b2);

    if (diff) return 0;
    else return 1;
}

EDIT: I've improved this function with the inclusion of your answers. But it's only comparing first buffer only -> but with an exception -> I figured out that it stops reading the file until it reaches 1A character (attached file). How can we make it work?

EDIT2: Task solved (working code attached). Thanks to everyone for the help!

like image 918
Chris Avatar asked May 28 '11 18:05

Chris


2 Answers

If you can give up a little speed, here is a C++ way that requires little code:

#include <fstream>
#include <iterator>
#include <string>
#include <algorithm>

bool compareFiles(const std::string& p1, const std::string& p2) {
  std::ifstream f1(p1, std::ifstream::binary|std::ifstream::ate);
  std::ifstream f2(p2, std::ifstream::binary|std::ifstream::ate);

  if (f1.fail() || f2.fail()) {
    return false; //file problem
  }

  if (f1.tellg() != f2.tellg()) {
    return false; //size mismatch
  }

  //seek back to beginning and use std::equal to compare contents
  f1.seekg(0, std::ifstream::beg);
  f2.seekg(0, std::ifstream::beg);
  return std::equal(std::istreambuf_iterator<char>(f1.rdbuf()),
                    std::istreambuf_iterator<char>(),
                    std::istreambuf_iterator<char>(f2.rdbuf()));
}

By using istreambuf_iterators you push the buffer size choice, actual reading, and tracking of eof into the standard library implementation. std::equal returns when it hits the first mismatch, so this should not run any longer than it needs to.

This is slower than Linux's cmp, but it's very easy to read.

like image 153
mtrw Avatar answered Sep 26 '22 19:09

mtrw


Here's a C++ solution. It seems appropriate since your question is tagged as C++. The program uses ifstream's rather than FILE*'s. It also shows you how to seek on a file stream to determine a file's size. Finally, it reads blocks of 4096 at a time, so large files will be processed as expected.

// g++ -Wall -Wextra equifile.cpp -o equifile.exe

#include <iostream>
using std::cout;
using std::cerr;
using std::endl;

#include <fstream>
using std::ios;
using std::ifstream;

#include <exception>
using std::exception;

#include <cstring>
#include <cstdlib>
using std::exit;
using std::memcmp;

bool equalFiles(ifstream& in1, ifstream& in2);

int main(int argc, char* argv[])
{
    if(argc != 3)
    {
        cerr << "Usage: equifile.exe <file1> <file2>" << endl;
        exit(-1);
    }

    try {
        ifstream in1(argv[1], ios::binary);
        ifstream in2(argv[2], ios::binary);

        if(equalFiles(in1, in2)) {
            cout << "Files are equal" << endl;
            exit(0);
        }
        else
        {
            cout << "Files are not equal" << endl;
            exit(1);
        }

    } catch (const exception& ex) {
        cerr << ex.what() << endl;
        exit(-2);
    }

    return -3;
}

bool equalFiles(ifstream& in1, ifstream& in2)
{
    ifstream::pos_type size1, size2;

    size1 = in1.seekg(0, ifstream::end).tellg();
    in1.seekg(0, ifstream::beg);

    size2 = in2.seekg(0, ifstream::end).tellg();
    in2.seekg(0, ifstream::beg);

    if(size1 != size2)
        return false;

    static const size_t BLOCKSIZE = 4096;
    size_t remaining = size1;

    while(remaining)
    {
        char buffer1[BLOCKSIZE], buffer2[BLOCKSIZE];
        size_t size = std::min(BLOCKSIZE, remaining);

        in1.read(buffer1, size);
        in2.read(buffer2, size);

        if(0 != memcmp(buffer1, buffer2, size))
            return false;

        remaining -= size;
    }

    return true;
}
like image 33
jww Avatar answered Sep 22 '22 19:09

jww