 

MySQL Inserting large data sets from file with Java

Tags:

java

mysql

I need to insert about 1.8 million rows from a CSV file into a MySQL database. (only one table)

I'm currently using Java to parse the file and insert each line.

As you can imagine, this takes quite a few hours to run (roughly 10).

The reason I'm not piping it straight from the file into the db is that the data has to be manipulated before it's added to the database.

This process needs to be run by an IT manager on their end, so I've set it up as a nice batch file for them to run after they drop the new CSV file into the right location. In other words, it needs to work simply by dropping the file into a certain location and running a batch file (Windows environment).

My question is: what would be the fastest way to insert this much data? Large multi-row inserts, loading from a temporary parsed file, or one insert at a time? Some other idea, possibly?

The second question is: how can I optimize my MySQL installation to allow very quick inserts? (At some point a large select of all the data will be required as well.)

Note: the table will eventually be dropped and the whole process run again at a later date.

Some clarification: I'm currently using ...opencsv.CSVReader to parse the file, then doing an insert on each line. I'm concatenating some columns and ignoring others.

More clarification: local DB, MyISAM table.
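For reference, a minimal sketch of that row-by-row approach (assuming the older au.com.bytecode.opencsv package; the table name, column names and column positions are placeholders):

    import au.com.bytecode.opencsv.CSVReader;

    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Sketch of the current one-INSERT-per-line approach; all names are hypothetical.
    public class RowByRowLoader {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                        "jdbc:mysql://localhost/mydb", "user", "pass");
                 PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO mytable (col1, col2) VALUES (?, ?)")) {
                CSVReader reader = new CSVReader(new FileReader("data.csv"));
                String[] line;
                while ((line = reader.readNext()) != null) {
                    // Manipulate the row: concatenate two source columns, ignore the rest.
                    ps.setString(1, line[0] + " " + line[1]);
                    ps.setString(2, line[3]);
                    ps.executeUpdate(); // one round trip per row (and, with auto-commit on, one commit per row)
                }
                reader.close();
            }
        }
    }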

asked Jun 30 '09 by Derek Organ

3 Answers

Tips for fast insertion:

  • Use the LOAD DATA INFILE syntax to let MySQL parse and insert the data, even if you have to mangle it and feed it back after the manipulation (see the sketch after this list).
  • Use this insert syntax:

    insert into table (col1, col2) values (val1, val2), (val3, val4), ...

  • Remove all keys/indexes prior to insertion.

  • Do it on the fastest machine you've got (mainly IO-wise, but RAM and CPU also matter). That goes for both the DB server and the inserting client; remember you'll be paying the IO price twice (once reading, once inserting).
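Following up on the first and third bullets, here's a rough sketch of the "massage first, then bulk-load" idea: write the manipulated rows to a temporary file and let MySQL ingest it with LOAD DATA LOCAL INFILE while the MyISAM keys are disabled. The table name, column names, file names and connection URL are assumptions, and LOCAL INFILE must be allowed on both the server and the Connector/J client:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Sketch: massage the CSV into a clean temp file, then bulk-load it.
    public class BulkLoader {
        public static void main(String[] args) throws Exception {
            // 1) Parse the source CSV, manipulate each row (concatenate/skip columns),
            //    and write one tab-separated line per massaged row (parsing omitted here).
            try (BufferedWriter out = new BufferedWriter(new FileWriter("massaged.tsv"))) {
                out.write("value1\tvalue2\n");
            }

            // 2) Let MySQL bulk-load the temp file in one statement.
            try (Connection con = DriverManager.getConnection(
                        "jdbc:mysql://localhost/mydb?allowLoadLocalInfile=true", "user", "pass");
                 Statement st = con.createStatement()) {
                st.execute("ALTER TABLE mytable DISABLE KEYS");   // skip per-row index maintenance (MyISAM)
                st.execute("LOAD DATA LOCAL INFILE 'massaged.tsv' INTO TABLE mytable "
                         + "FIELDS TERMINATED BY '\\t' (col1, col2)");
                st.execute("ALTER TABLE mytable ENABLE KEYS");    // rebuild the indexes in one pass
            }
        }
    }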
answered by Vinko Vrsalovic


I'd probably pick a large number, like 10k rows, load that many rows from the CSV, massage the data, and do a batch update, then repeat until you've gone through the entire CSV. Depending on the massaging and the amount of data, 1.8 million rows shouldn't take 10 hours; more like 1-2 hours, depending on your hardware.

Edit: whoops, I left out a fairly important part: your con has to have auto-commit set to false. The code I copied this from was doing that as part of the GetConnection() method (a sketch of which follows the code below).

    Connection con = GetConnection();
    con.setAutoCommit(false);
    try {
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO table(col1, col2) VALUES(?, ?)");
        try {
            for (Data d : massagedData) {
                ps.setString(1, d.whatever());
                ps.setString(2, d.whatever2());
                ps.addBatch();
            }
            ps.executeBatch();
            con.commit();   // with auto-commit off, nothing is persisted until you commit
        } finally {
            ps.close();
        }
    } finally {
        con.close();
    }
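If the driver is MySQL Connector/J, it's also worth adding rewriteBatchedStatements=true to the connection URL so that executeBatch() is rewritten into multi-row INSERTs on the wire. A hypothetical GetConnection() along the lines the answer describes (the URL, credentials and extra property are assumptions about your setup):

    // Hypothetical GetConnection(); uses java.sql.DriverManager.
    private static Connection GetConnection() throws SQLException {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb?rewriteBatchedStatements=true",
                "user", "pass");
        con.setAutoCommit(false); // the batch code above relies on auto-commit being off
        return con;
    }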
answered by Hardwareguy


Are you absolutely CERTAIN you have disabled auto-commit in the JDBC driver?

This is the typical performance killer for JDBC clients.
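A quick way to check, sketched against the batching code in the previous answer (con is your JDBC Connection):

    if (con.getAutoCommit()) {
        con.setAutoCommit(false);   // otherwise every statement commits individually
    }
    // ... build and execute the batch as above ...
    con.commit();                   // commit once per chunk instead of once per row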

answered by Thorbjørn Ravn Andersen