We have a very old, unsupported program which copies files across SMB shares. It has a checksum algorithm to determine whether the file contents have changed before copying. The algorithm seems easily fooled -- we've just found an example where two files, identical except for a single '1' changed to a '2', return the same checksum. Here's the algorithm:
unsigned long GetFileCheckSum(CString PathFilename)
{
    FILE* File;
    unsigned long CheckSum = 0;
    unsigned long Data = 0;
    unsigned long Count = 0;

    if ((File = fopen(PathFilename, "rb")) != NULL)
    {
        while (fread(&Data, 1, sizeof(unsigned long), File) != FALSE)
        {
            CheckSum ^= Data + ++Count;
            Data = 0;
        }
        fclose(File);
    }
    return CheckSum;
}
I'm not much of a programmer (I am a sysadmin) but I know an XOR-based checksum is going to be pretty crude. What're the chances of this algorithm returning the same checksum for two files of the same size with different contents? (I'm not expecting an exact answer, "remote" or "quite likely" is fine.)
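To show how little it takes to collide, here is a small self-contained sketch (CheckSumWords is a helper I made up for illustration; it applies the same per-word step as the loop above to an in-memory buffer):

#include <cstdio>

// Same per-word step as GetFileCheckSum, applied to an in-memory buffer.
unsigned long CheckSumWords(const unsigned long* words, unsigned long count)
{
    unsigned long checkSum = 0;
    for (unsigned long i = 0; i < count; ++i)
        checkSum ^= words[i] + (i + 1);   // equivalent to Data + ++Count
    return checkSum;
}

int main()
{
    // Two "files" of two words each that differ in every word, yet
    // collide: swapping the words and adjusting each by the position
    // offset leaves the XOR of (word + position) unchanged.
    unsigned long fileA[2] = { 0x31, 0x32 };
    unsigned long fileB[2] = { 0x33, 0x30 };
    printf("A: %08lx  B: %08lx\n",
           CheckSumWords(fileA, 2), CheckSumWords(fileB, 2));
    // Both print 00000006.
    return 0;
}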
How could it be improved without a huge performance hit?
Lastly, what's going on with the fread()? I had a quick scan of the documentation but I couldn't figure it out. Is Data being set to each byte of the file in turn?

Edit: okay, so it's reading the file in unsigned long-sized chunks (let's assume a 32-bit OS here). What does each chunk contain? If the contents of the file are abcd, what is the value of Data on the first pass? Is it (in Perl):

(ord('a') << 24) & (ord('b') << 16) & (ord('c') << 8) & ord('d')
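A quick way to check (my own sketch, assuming a little-endian x86 machine where the copy of the first four bytes matches what fread() does into Data):

#include <cstdio>
#include <cstring>

int main()
{
    // Emulate one pass of the fread(): copy the first four file bytes
    // into a word, just as fread(&Data, 1, 4, File) would.
    unsigned long data = 0;
    memcpy(&data, "abcd", 4);

    // On little-endian x86 this prints 64636261: 'a' (0x61) lands in
    // the LOW byte, so the Perl equivalent is
    //   ord('a') | (ord('b') << 8) | (ord('c') << 16) | (ord('d') << 24)
    // i.e. OR rather than AND, with the byte order reversed.
    printf("%08lx\n", data);
    return 0;
}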
MD5 is commonly used to verify the integrity of transferred files, and source code is readily available in C++. It is widely considered to be fast, and it is effectively immune to the kind of accidental collision described above.
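For instance, a sketch of a replacement using OpenSSL's EVP interface (assuming OpenSSL is available to link against with -lcrypto; any MD5 implementation would do) might look like this:

#include <cstdio>
#include <string>
#include <openssl/evp.h>

// Returns the MD5 digest of a file as a hex string, or "" on error.
// Sketch only; error handling on the EVP calls is omitted for brevity.
std::string GetFileMD5(const char* pathFilename)
{
    FILE* file = fopen(pathFilename, "rb");
    if (file == NULL)
        return "";

    EVP_MD_CTX* ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_md5(), NULL);

    // Hash the file in 64 KB blocks.
    unsigned char buffer[65536];
    size_t bytesRead;
    while ((bytesRead = fread(buffer, 1, sizeof buffer, file)) > 0)
        EVP_DigestUpdate(ctx, buffer, bytesRead);
    fclose(file);

    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int digestLen = 0;
    EVP_DigestFinal_ex(ctx, digest, &digestLen);
    EVP_MD_CTX_free(ctx);

    // Render the 16-byte digest as a lowercase hex string.
    char hex[2 * EVP_MAX_MD_SIZE + 1];
    for (unsigned int i = 0; i < digestLen; ++i)
        sprintf(hex + 2 * i, "%02x", digest[i]);
    return std::string(hex, 2 * digestLen);
}

Reading in large blocks keeps the I/O pattern close to the original loop, so the overall cost should still be dominated by the SMB transfer rather than the hashing.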
See also Robust and fast checksum algorithm?