Comparing Huge Files using C++

Tags: c++, file, diff

I have two big text files, each with more than 10 million lines. How can I compare them and get the lines that differ, using C++?

I tried loading one file into memory, sorting it, and using binary tree logic to compare the files. That gave me the result in 20 seconds, but it consumes too much memory (the text file is around 500 MB).

I want to compare the two files with low memory use, good performance, and minimal load on the hard disk.

Manikandaraj Srinivasan asked Aug 06 '12 17:08



1 Answer

You can use a two-pass method.

In the first pass, read the files but store only a hash value and the starting position of each line. You can then compare the files by hash value. In the second pass, re-read lines for a full comparison only when two lines have the same hash value. This reduces memory consumption and CPU time, at the small cost of reading some lines twice.
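
Here is a minimal sketch of that idea, not a tuned implementation. It simplifies slightly: it indexes only one file by (hash, offset) and streams the other, so that only one index is held in memory. The names `LineRef`, `build_index`, and `lines_only_in` are made up for illustration, and `std::hash<std::string>` is used only because it is readily available; any reasonable hash would do. It assumes both files can be opened and treats lines as an unordered set, ignoring line order and duplicate counts.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// One index entry per line: hash of the text and byte offset of the line start.
struct LineRef {
    std::size_t hash;
    std::streamoff offset;
};

// Pass 1: scan the file once, keeping only (hash, offset) per line.
// The index is sorted by hash so it can be binary-searched later.
std::vector<LineRef> build_index(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<LineRef> index;
    std::hash<std::string> hasher;
    std::string line;
    std::streamoff pos = 0;
    while (std::getline(in, line)) {
        index.push_back({hasher(line), pos});
        // Binary mode: each line occupies its characters plus one '\n'.
        pos += static_cast<std::streamoff>(line.size()) + 1;
    }
    std::sort(index.begin(), index.end(),
              [](const LineRef& a, const LineRef& b) { return a.hash < b.hash; });
    return index;
}

// Pass 2: stream file A; for each line, look up its hash in B's index and
// re-read the candidate line from B only when the hashes match.
// Returns the lines of A that do not occur anywhere in B.
std::vector<std::string> lines_only_in(const std::string& a_path,
                                       const std::string& b_path) {
    const std::vector<LineRef> index = build_index(b_path);
    std::ifstream a(a_path, std::ios::binary);
    std::ifstream b(b_path, std::ios::binary);
    std::hash<std::string> hasher;
    std::vector<std::string> result;
    std::string line, candidate;
    while (std::getline(a, line)) {
        const std::size_t h = hasher(line);
        auto it = std::lower_bound(
            index.begin(), index.end(), h,
            [](const LineRef& r, std::size_t value) { return r.hash < value; });
        bool found = false;
        for (; it != index.end() && it->hash == h && !found; ++it) {
            b.clear();             // reset EOF state left by a previous read
            b.seekg(it->offset);   // jump to the candidate line in B
            std::getline(b, candidate);
            found = (candidate == line);  // full compare guards against collisions
        }
        if (!found) result.push_back(line);
    }
    return result;
}
```

Calling `lines_only_in(a, b)` and then `lines_only_in(b, a)` gives the differing lines in both directions. With 10 million lines the index costs roughly 16 bytes per line on a typical 64-bit build instead of the full 500 MB of text. The verification reads are random seeks, so if disk load matters, collecting the candidate offsets and checking them in offset order is one possible refinement.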

FrostNovaZzz answered Sep 29 '22 03:09