I have two big text files, each with more than 10 million lines. How can I compare the files and get the lines that differ, using C++?
I tried loading one file into memory, sorting it, and using binary-tree logic to compare the files. It compared them and gave me the result in 20 seconds, but it consumes too much memory (the text file is around 500 MB).
I want to compare the two files without using that much memory, with good performance and minimal load on the hard disk.
You could try a command-line diff tool or DiffUtils for Windows. TextPad also has an integrated comparison tool if the files are text. If you just need to determine whether the files are different (not what the differences are), use a checksum comparison tool that uses MD5 or SHA-1.
Step 1: Open both files with the read pointer at the start. Step 2: Read data from each file one character at a time. Step 3: Compare the characters. If the characters differ, return the line and position of the mismatched character.
WinMerge is a free, open-source file comparison tool for Windows. It compares both files and folders and presents the differences in a visual text format that is easy to understand and manage.
You can use a two-pass method.
In the first pass, you read the files but store only the hash value and starting position of each line; you can then compare the files based on the hash values. In the second pass, you read lines again for a full comparison only when two lines have the same hash value. This saves memory and CPU time, at the small cost of reading some lines twice.