How to perform a SQL-like Join in Perl?

I have to process some data by combining two different files. Both have two columns that together form a primary key I can use to match rows side by side. The files in question are huge (around 5 GB with 20 million rows), so I need efficient code. How would I do this in Perl?

For example, if File A contains the columns

id, name, lastname, dob, school

and File B contains the columns

address, id, postcode, dob, email

I need to join the two files on id and dob to produce an output file with the columns

id, name, lastname, dob, school, address, postcode, email

sfactor asked Jan 03 '12 12:01


2 Answers

I think I would just create a new MySQL/SQLite/whatever database, insert the rows from both files, and let the database do the join. That should be about 20 lines of Perl.

This, of course, requires easy access to a database.
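
A rough sketch of the database route, not a tuned implementation. It assumes comma-separated files with no header line and no quoted or embedded commas, and that the DBI and DBD::SQLite modules from CPAN are installed (SQLite runs in-process, so no server is needed). The sub names `load_table` and `join_with_sqlite` and the table names `a` and `b` are made up for illustration.

```perl
use strict;
use warnings;
use DBI;    # assumption: DBI and DBD::SQLite are installed

# Hypothetical helper: insert every line of $fh into $table,
# which is assumed to have $ncols columns.
sub load_table {
    my ($dbh, $table, $ncols, $fh) = @_;
    my $placeholders = join ',', ('?') x $ncols;
    my $sth = $dbh->prepare("INSERT INTO $table VALUES ($placeholders)");
    while (my $line = <$fh>) {
        chomp $line;
        # naive split: assumes no quoted fields or embedded commas
        $sth->execute(split /,/, $line, -1);
    }
}

# Hypothetical driver: load both files, join on (id, dob), print to $out.
sub join_with_sqlite {
    my ($db_file, $fh_a, $fh_b, $out) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", "", "",
                           { RaiseError => 1, AutoCommit => 1 });

    $dbh->do("CREATE TABLE a (id, name, lastname, dob, school)");
    $dbh->do("CREATE TABLE b (address, id, postcode, dob, email)");

    # One big transaction: committing per row would be far slower
    $dbh->begin_work;
    load_table($dbh, 'a', 5, $fh_a);
    load_table($dbh, 'b', 5, $fh_b);
    $dbh->commit;

    # Index the join key on one side so the join is not a nested full scan
    $dbh->do("CREATE INDEX b_key ON b (id, dob)");

    my $sth = $dbh->prepare(q{
        SELECT a.id, a.name, a.lastname, a.dob, a.school,
               b.address, b.postcode, b.email
        FROM a JOIN b ON a.id = b.id AND a.dob = b.dob
    });
    $sth->execute;
    while (my @row = $sth->fetchrow_array) {
        print {$out} join(',', @row), "\n";
    }
    $dbh->disconnect;
}

# Usage (file names are placeholders):
#   open my $fh_a, '<', 'fileA.csv' or die $!;
#   open my $fh_b, '<', 'fileB.csv' or die $!;
#   join_with_sqlite('join.db', $fh_a, $fh_b, \*STDOUT);
```

The rows come back in whatever order the database chooses, so add an `ORDER BY` if the output order matters. For real CSV with quoting, a proper parser such as Text::CSV would be a safer choice than `split`.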

You could also sort both files by the key fields and then, for each line in file 1, find and print the matching lines in file 2.

Øyvind Skaar answered Nov 15 '22 17:11


The old-fashioned way to do this is to use system utilities to sort both files into key sequence and then match them line by line. Read a line from each file. If the keys match, output the combined data and advance both files. If they don't match, read from the file with the lesser key until they do. When a file hits EOF, set its key infinitely high. When both keys are infinitely high, you're done.
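
The matching step could look roughly like this in Perl. It is a sketch under several assumptions: the files are comma-separated with no header and no quoted fields, both are already sorted by id and then dob in plain byte order (for example with `LC_ALL=C sort -t, -k1,1 -k4,4` for File A and `LC_ALL=C sort -t, -k2,2 -k4,4` for File B), and each (id, dob) pair occurs at most once per file. The names `next_record` and `merge_join` are made up for illustration. Instead of an infinitely high sentinel key, the loop simply stops when either file runs out, which is enough for an inner join.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line and return a hashref with the key
# columns and all fields, or nothing at end of file.
sub next_record {
    my ($fh, $id_col, $dob_col) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    my @f = split /,/, $line, -1;   # naive split: no quoted fields
    return { id => $f[$id_col], dob => $f[$dob_col], fields => \@f };
}

# Hypothetical driver: inner-join two sorted inputs on (id, dob).
sub merge_join {
    my ($in_a, $in_b, $out) = @_;
    my $ra = next_record($in_a, 0, 3);   # File A: id is column 0, dob column 3
    my $rb = next_record($in_b, 1, 3);   # File B: id is column 1, dob column 3

    while ($ra && $rb) {
        # String comparison, matching the byte order the files were sorted in
        my $cmp = $ra->{id} cmp $rb->{id} || $ra->{dob} cmp $rb->{dob};
        if ($cmp < 0) {
            $ra = next_record($in_a, 0, 3);      # A is behind, advance A
        }
        elsif ($cmp > 0) {
            $rb = next_record($in_b, 1, 3);      # B is behind, advance B
        }
        else {
            # Keys match: all of A, then address, postcode, email from B
            print {$out} join(',', @{ $ra->{fields} },
                                   @{ $rb->{fields} }[0, 2, 4]), "\n";
            $ra = next_record($in_a, 0, 3);
            $rb = next_record($in_b, 1, 3);
        }
    }
}

# Usage (file names are placeholders):
#   open my $in_a, '<', 'fileA.sorted' or die $!;
#   open my $in_b, '<', 'fileB.sorted' or die $!;
#   merge_join($in_a, $in_b, \*STDOUT);
```

Only one line from each file is held in memory at a time, so the file size matters for the external sort but not for the matching pass. If the ids are numeric and sorted numerically, the comparison would need `<=>` for that column instead of `cmp`.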

Bill Ruppert answered Nov 15 '22 18:11