
Checking whether a large CSV file (1m rows) has the same data as a MySQL table

Tags:

php

mysql

csv

I'm trying to find an efficient way to compare the contents of a CSV file with a MySQL table (over 1 million rows to compare). I've done something similar before by loading all the rows into an array, but that only works for a small number of rows; with this many it runs out of memory.

Is there a recommended way of doing this? Are there any libraries or tools that could help?

I would appreciate your answers.

Kelvin De Moya asked Apr 16 '12 01:04




2 Answers

Assuming this is a sanity check and you're aiming to have 0 differences, how about dumping out the database as a CSV file of the same format and then using command line tools (diff or cmp) to check that they match?

You'd need to make sure your CSV dump is ordered & formatted the same as the original file of course.
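If `diff` or `cmp` aren't available, the same check can be sketched in a few lines that read both files one row at a time, so memory use stays flat regardless of row count. This is only an illustration, in Python rather than the PHP from the question: the function names `first_difference` and `compare_csv_files` are made up for this example, and it assumes the database has already been dumped to a CSV with the same ordering and formatting as the original.

```python
import csv
from itertools import zip_longest


def first_difference(rows_a, rows_b):
    """Return the 1-based number of the first differing row, or None if all rows match.

    Both arguments are iterables of rows; only one row from each is held at a time.
    """
    pairs = zip_longest(rows_a, rows_b)  # pads the shorter input with None
    for number, (row_a, row_b) in enumerate(pairs, start=1):
        if row_a != row_b:
            # Also reached when one input ends early (a row compared with None).
            return number
    return None


def compare_csv_files(path_a, path_b):
    """Stream two CSV files side by side and report the first mismatching row."""
    with open(path_a, newline="") as file_a, open(path_b, newline="") as file_b:
        return first_difference(csv.reader(file_a), csv.reader(file_b))
```

Calling `compare_csv_files("original.csv", "dump.csv")` (hypothetical file names) returns `None` when the files match row for row, or the number of the first row that differs. Unlike `cmp`, this compares parsed fields, so differences in quoting alone are not reported.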

John Carter answered Sep 17 '22 22:09


Besides @therefromhere's excellent answer, you could also calculate a hash of the data in MySQL and a hash of the original file, and then compare the two.
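One way to picture the hash idea is below. Note that this is a variant, not exactly what is described above: instead of hashing inside MySQL, it assumes the rows from both sources (the CSV reader and a database cursor) are fed through the same function as sequences of strings. The function name `rows_digest` and the field separator are choices made for this sketch. Each row is hashed on its own and the hashes are summed modulo 2^128, so the result does not depend on row order and only one row is in memory at a time.

```python
import hashlib


def rows_digest(rows):
    """Return (row_count, hex_digest) for an iterable of rows, independent of row order."""
    total = 0
    count = 0
    for row in rows:
        # Join fields with the ASCII unit separator; this assumes it never occurs in the data.
        encoded = "\x1f".join(str(field) for field in row).encode("utf-8")
        row_hash = int.from_bytes(hashlib.md5(encoded).digest(), "big")
        # Addition is commutative, so the order in which rows arrive does not matter.
        total = (total + row_hash) % (1 << 128)
        count += 1
    return count, format(total, "032x")
```

If `rows_digest(csv_rows) == rows_digest(db_rows)`, the two sources very likely hold the same rows. A mismatch says that something differs but not where, and values such as NULLs, dates and decimals would have to be converted to the same text on both sides for the comparison to be meaningful.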

ypercubeᵀᴹ answered Sep 21 '22 22:09