I have to compare two CSV files of 2-3 GB each on a Windows machine.
I tried loading the first one into a HashMap to compare it against the second, but (as expected) memory consumption is very high.
The goal is to write the differences to another file.
The lines may appear in a different order, and some may be missing.
Any suggestions?
So, how do you open large CSV files in Excel? Essentially, there are two options: Split the CSV file into multiple smaller files that do fit within the 1,048,576 row limit; or, Find an Excel add-in that supports CSV files with a higher number of rows.
Excel limits each cell to 32,767 characters and each sheet to 1,048,576 rows and 16,384 columns. The CSV format itself has no such limits, so a CSV file can hold many more rows than Excel can open.
Beyond Compare's Table Compare can compare a pair of tabular data files. It accepts .xlsx (Excel) and .csv files, and also formats such as PDF and Word documents if they contain tabular data.
Assuming you want to do this programmatically in Java, the answer is different.
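Are both of the files ordered? If so, then you don't need to read in the whole files. You simply start at the beginning of both files and walk them in step: compare the current line of each, advance whichever file has the smaller line, and record any line that appears in only one file.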
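For the ordered case, the comparison can be streamed so that only one line from each file is in memory at a time. Here is a minimal sketch of that idea, not a definitive implementation. The class name SortedDiff, the method diff, and the "< " / "> " prefixes in the output are my own choices for illustration. It assumes both files are sorted in String.compareTo order, are UTF-8 encoded, and have one record per line (no quoted fields containing line breaks).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: streams two already-sorted files and writes the differences.
class SortedDiff {
    /**
     * Writes lines that occur in only one of the two sorted inputs to `out`.
     * Lines only in `left` are prefixed with "< ", lines only in `right` with "> ".
     */
    static void diff(Path left, Path right, Path out) throws IOException {
        try (BufferedReader a = Files.newBufferedReader(left, StandardCharsets.UTF_8);
             BufferedReader b = Files.newBufferedReader(right, StandardCharsets.UTF_8);
             BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String x = a.readLine();
            String y = b.readLine();
            while (x != null || y != null) {
                // An exhausted file compares as "greater", so the other file drains.
                int c = (x == null) ? 1 : (y == null) ? -1 : x.compareTo(y);
                if (c == 0) {            // same line in both files: skip it
                    x = a.readLine();
                    y = b.readLine();
                } else if (c < 0) {      // line exists only in the left file
                    w.write("< " + x);
                    w.newLine();
                    x = a.readLine();
                } else {                 // line exists only in the right file
                    w.write("> " + y);
                    w.newLine();
                    y = b.readLine();
                }
            }
        }
    }
}
```

Memory use stays roughly constant regardless of file size, because nothing is kept beyond the two current lines and the I/O buffers.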
If you don't have ordered files, then perhaps you could sort the files prior to the diff. Again, since you need a low-memory solution, don't read the entire file in to sort it. Chop the file up into manageable chunks, and then sort each chunk. Then merge the sorted chunks into one sorted file, as in the merge step of an external merge sort.
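The chunk-and-combine approach above is usually done as an external merge sort: sort each chunk in memory, write it to a temporary file, then combine the chunk files with a k-way merge. The following is a rough sketch under some assumptions, not a tuned implementation. The names ExternalSort, sort, and maxLines are hypothetical, the combine step uses a priority queue (a merge, which I have swapped in as the combining technique), and the input is assumed to be UTF-8 with one record per line. A header row, if present, would be sorted along with the data, so it would need to be stripped first.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: sorts a large line-based file using bounded memory.
class ExternalSort {
    /** Sorts the lines of `in` into `out`, buffering at most `maxLines` lines at once. */
    static void sort(Path in, Path out, int maxLines) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader r = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
            List<String> buf = new ArrayList<>();
            String line;
            while ((line = r.readLine()) != null) {
                buf.add(line);
                if (buf.size() >= maxLines) {   // chunk is full: sort it and spill to disk
                    chunks.add(writeSortedChunk(buf));
                    buf.clear();
                }
            }
            if (!buf.isEmpty()) chunks.add(writeSortedChunk(buf));
        }
        merge(chunks, out);
        for (Path c : chunks) Files.deleteIfExists(c);
    }

    private static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path tmp = Files.createTempFile("chunk", ".txt");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        return tmp;
    }

    /** One open reader per chunk plus the line it is currently positioned on. */
    private static final class Cursor implements Comparable<Cursor> {
        final BufferedReader reader;
        String line;
        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.line = reader.readLine();
        }
        public int compareTo(Cursor o) { return line.compareTo(o.line); }
    }

    /** K-way merge: repeatedly emit the smallest current line across all chunks. */
    private static void merge(List<Path> chunks, Path out) throws IOException {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Path c : chunks) {
                BufferedReader r = Files.newBufferedReader(c, StandardCharsets.UTF_8);
                readers.add(r);
                Cursor cur = new Cursor(r);
                if (cur.line != null) heap.add(cur);
            }
            while (!heap.isEmpty()) {
                Cursor cur = heap.poll();
                w.write(cur.line);
                w.newLine();
                cur.line = cur.reader.readLine();   // advance this chunk
                if (cur.line != null) heap.add(cur);
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
    }
}
```

Once both CSV files have been sorted this way, they can be compared line by line in a single streaming pass. The value of maxLines would need to be chosen to fit the available heap; for 2-3 GB inputs something on the order of a few hundred thousand lines per chunk is a plausible starting point, but that is a guess to be tuned.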