
What's the fastest way to compare around 700,000 rows from 2 different databases using Perl?

Tags:

database

perl

I'm using Perl and DBI to connect to 2 different databases (MySQL and Sybase). Each holds a table of around 700,000 records, and I need the two tables to stay identical; most likely only a few records will differ each week or so. The first sync is simply a matter of copying the table, but this has to be repeated regularly (at least once a week), and dropping the table and copying everything again every time is not a good solution. So I was wondering: what's the fastest way to compare around 700,000 rows from 2 different databases using Perl?

Note: The tables have 5 fields (all of them character type including the primary key)

DarkAjax asked Feb 22 '12 14:02



1 Answer

Load each table, sorted, into Perl in its entirety, then run Algorithm::Diff on the two lists. In the end you'll get a list of rows to delete and rows to insert. Some rows may be deleted and reinserted, since a changed row shows up in a diff as a delete plus an insert; if you have foreign keys hanging off those rows, you'll need to do an update rather than a delete/insert.
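To make the idea concrete, here is a minimal sketch in plain Perl. It does not use Algorithm::Diff; it swaps in a simple merge-style walk over two lists sorted by primary key, which also reports changed rows as updates rather than as a delete plus an insert. The function name `compare_sorted_rows`, the `$source`/`$target` roles, and the sample rows are all made up for illustration, and the key is assumed to be the first column. Both lists are sorted in Perl rather than with ORDER BY, because MySQL and Sybase may not agree with each other (or with Perl's `cmp`) on collation.

```perl
use strict;
use warnings;

# Compare two row lists that are already sorted by key with Perl's "cmp".
# Each row is an array ref; element 0 is assumed to be the primary key.
sub compare_sorted_rows {
    my ($source, $target) = @_;
    my (@insert, @update, @delete);
    my ($i, $j) = (0, 0);

    while ($i < @$source && $j < @$target) {
        my $cmp = $source->[$i][0] cmp $target->[$j][0];
        if ($cmp < 0) {
            # Key exists only in the source: the target needs an insert.
            push @insert, $source->[$i++];
        }
        elsif ($cmp > 0) {
            # Key exists only in the target: the target needs a delete.
            push @delete, $target->[$j++];
        }
        else {
            # Same key: update only if some column differs.
            # NULL (undef) is treated like an empty string here.
            my $s = join "\0", map { $_ // '' } @{ $source->[$i] };
            my $t = join "\0", map { $_ // '' } @{ $target->[$j] };
            push @update, $source->[$i] if $s ne $t;
            $i++;
            $j++;
        }
    }

    # Anything left over on either side has no counterpart.
    push @insert, @{$source}[ $i .. $#$source ];
    push @delete, @{$target}[ $j .. $#$target ];

    return (\@insert, \@update, \@delete);
}

# Made-up sample data standing in for rows fetched through DBI.
my @source_rows = sort { $a->[0] cmp $b->[0] } (
    [ 'k1', 'a', 'b', 'c', 'd' ],
    [ 'k2', 'a', 'b', 'c', 'd' ],
    [ 'k4', 'a', 'b', 'c', 'd' ],
);
my @target_rows = sort { $a->[0] cmp $b->[0] } (
    [ 'k1', 'a', 'b', 'c', 'd' ],
    [ 'k2', 'a', 'X', 'c', 'd' ],
    [ 'k3', 'a', 'b', 'c', 'd' ],
);

my ($ins, $upd, $del) = compare_sorted_rows(\@source_rows, \@target_rows);
print "insert: ", join(',', map { $_->[0] } @$ins), "\n";
print "update: ", join(',', map { $_->[0] } @$upd), "\n";
print "delete: ", join(',', map { $_->[0] } @$del), "\n";
```

In a real script the two lists would most likely come from something like DBI's `selectall_arrayref` on each handle, and the three result lists would drive the INSERT, UPDATE and DELETE statements against whichever table you treat as the copy.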

700,000 rows is not a lot of data for a modern machine, nor does holding it take a lot of memory.

If you only need to know whether rows exist (i.e. a row either exists or it doesn't, rather than tracking changes within a row), you can just diff the keys, then fetch the rows you need from there.
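A key-only comparison could look roughly like the sketch below. It uses two hashes instead of Algorithm::Diff, so the keys don't even need to be sorted. The function name `diff_keys` and the sample keys are hypothetical; in practice each key list would probably come from DBI's `selectcol_arrayref` with a query that selects just the primary key column.

```perl
use strict;
use warnings;

# Given two lists of primary keys, return the keys that appear in only one.
sub diff_keys {
    my ($source_keys, $target_keys) = @_;

    # Hash lookups make each membership test constant time on average.
    my %in_source = map { $_ => 1 } @$source_keys;
    my %in_target = map { $_ => 1 } @$target_keys;

    my @missing_in_target = grep { !$in_target{$_} } @$source_keys;
    my @extra_in_target   = grep { !$in_source{$_} } @$target_keys;

    return (\@missing_in_target, \@extra_in_target);
}

# Made-up keys standing in for the results of two "select key" queries.
my @source_keys = qw(k1 k2 k4);
my @target_keys = qw(k1 k2 k3);

my ($missing, $extra) = diff_keys(\@source_keys, \@target_keys);
print "fetch and insert: @$missing\n";
print "delete:           @$extra\n";
```

Only the rows behind the "missing" keys then need to be fetched in full, which keeps the weekly transfer small when just a few records have been added or removed.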

Will Hartung answered Sep 28 '22 14:09
