I'm trying to find a way to efficiently compare the contents of a CSV file with a MySQL database (over 1 million rows to compare). I've done something similar before by loading all the rows into an array, but that only works for a small number of rows because of the memory overhead.
My question is: is there a recommended way of doing this? Any libraries or tools that could help?
I would appreciate your answers.
Using pandas: one way to process a large file is to read the entries in chunks of a reasonable size, so that only one chunk is held in memory at a time and is processed before the next chunk is read. The chunksize parameter of pandas.read_csv specifies the size of a chunk as a number of lines.
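For example, here is a rough sketch of that chunked approach. The connection details, the table name my_table, and the id/name/amount columns are placeholders for illustration, and it assumes the pymysql driver is installed; adjust everything to your schema.

```python
import pandas as pd
import pymysql

# Placeholder connection details, table and column names -- adjust to your setup.
conn = pymysql.connect(host="localhost", user="user", password="secret", database="mydb")

CHUNK_SIZE = 10_000
csv_only_ids = []

with conn.cursor() as cursor:
    # Only CHUNK_SIZE rows of the CSV are held in memory at any time.
    for chunk in pd.read_csv("data.csv", chunksize=CHUNK_SIZE):
        ids = chunk["id"].tolist()
        placeholders = ", ".join(["%s"] * len(ids))
        cursor.execute(f"SELECT id FROM my_table WHERE id IN ({placeholders})", ids)
        found = {row[0] for row in cursor.fetchall()}
        # Keys that are in this CSV chunk but missing from the database.
        csv_only_ids.extend(i for i in ids if i not in found)

print(f"{len(csv_only_ids)} CSV rows have no matching id in the database")
```

The same loop can be extended to compare the non-key columns as well; the point is simply that neither the file nor the query results ever have to fit in memory at once.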
Assuming this is a sanity check and you're aiming to have 0 differences, how about dumping out the database as a CSV file of the same format and then using command line tools (diff or cmp) to check that they match?
You'd need to make sure your CSV dump is ordered & formatted the same as the original file of course.
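If it helps, here is a rough Python sketch of that idea. The connection details, table name, and columns are placeholders, and it assumes the pymysql driver plus a Unix diff on the PATH; the important part is that the ORDER BY and the csv writer settings reproduce the original file's layout.

```python
import csv
import subprocess
import pymysql

# Placeholder connection details and schema -- the SELECT column order and the
# ORDER BY must mirror the original CSV exactly.
conn = pymysql.connect(host="localhost", user="user", password="secret", database="mydb")

with conn.cursor(pymysql.cursors.SSCursor) as cursor, \
        open("db_dump.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["id", "name", "amount"])  # header line identical to the source CSV
    cursor.execute("SELECT id, name, amount FROM my_table ORDER BY id")
    for row in cursor:  # SSCursor streams rows instead of buffering them all
        writer.writerow(row)

# diff exits with status 0 when the files are byte-for-byte identical.
result = subprocess.run(["diff", "data.csv", "db_dump.csv"], capture_output=True, text=True)
print("files match" if result.returncode == 0 else result.stdout)
```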
Besides @therefromhere's excellent answer, you could also calculate a hash of the data, both in MySQL and over the original file, and then compare the two.
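One client-side way to do that is sketched below; the connection details and schema are placeholders, and the row rendering has to match the CSV's separators, quoting, and line endings byte for byte, or the digests will never agree.

```python
import hashlib
import pymysql

def file_digest(path):
    """Stream the CSV file through SHA-256 without loading it into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def db_digest(conn):
    """Hash the table rows rendered the same way the CSV lines are written."""
    h = hashlib.sha256()
    h.update(b"id,name,amount\r\n")  # header line, assumed to match the file
    with conn.cursor(pymysql.cursors.SSCursor) as cursor:
        cursor.execute("SELECT id, name, amount FROM my_table ORDER BY id")
        for row in cursor:
            line = ",".join(str(v) for v in row) + "\r\n"
            h.update(line.encode("utf-8"))
    return h.hexdigest()

# Placeholder connection details -- adjust to your setup.
conn = pymysql.connect(host="localhost", user="user", password="secret", database="mydb")
print("match" if file_digest("data.csv") == db_digest(conn) else "differ")
```

The hashing could also be done on the server with MySQL's MD5() or SHA2() functions, but then the row formatting has to be replicated in SQL, which is usually the fiddly part.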