Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Loading flatfiles into a normalized MySQL database

What is the fastest way to load data from flatfiles into a MySQL database, and then create the relations between the tables via foreign keys?

For example... I have a flat file in the format:

[INDIVIDUAL]   [POP]  [MARKER]  [GENOTYPE]

"INDIVIDUAL1", "CEU", "rs55555","AA"  
"INDIVIDUAL1", "CEU", "rs535454","GA"  
"INDIVIDUAL1", "CEU", "rs555566","AT"  
"INDIVIDUAL1", "CEU", "rs12345","TT"  
...  
"INDIVIDUAL2", "JPT", "rs55555","AT"  

Which I need to load into four tables:

IND (id,fk_pop,name)  
POP (id,population)  
MARKER (id,rsid)  
GENOTYPE (id,fk_ind,fk_rsid,call)  

Specifically, how does one populate the foreign keys in a way that scales? The figures are in the range of 1000+ individuals, each with 1 million+ genotypes.

like image 724
pufferfish Avatar asked Dec 14 '22 06:12

pufferfish


1 Answers

I would take a multiple step approach to do this.

  1. Load the data into a temporary table, that is matching the file format that you have
  2. Write queries to do the other inserts, starting the the general tables, then doing joins to get the FK values.
like image 78
Mitchel Sellers Avatar answered Dec 17 '22 23:12

Mitchel Sellers