
Is a "LOAD DATA" without a file (i.e., in memory) possible for MySQL and Java?

Tags: java, mysql, csv, load

I'm in the process of optimizing an import of ~10TB of data into a MySQL database. Currently, I can import 2.9GB (+0.8GB index) in about 14 minutes on a current laptop. The process includes reading a data file (Oracle ".dat" export), parsing the data, writing the data into a CSV file, and executing the "LOAD DATA LOCAL" SQL command on it.

Is it possible to increase the import speed (without hardware changes)? Is there a way to remove the step of writing a file to the file system and letting MySQL read it again? Is it possible to stream the data in memory directly to MySQL (e.g., via the JDBC driver)?

Many thanks in advance, Joerg.

Jörg Rech asked Sep 02 '10 13:09




2 Answers

It seems that from MySQL Connector/J (the JDBC driver) version 5.1.3 onwards, you can attach an InputStream from within your Java code, using the com.mysql.jdbc.Statement.setLocalInfileInputStream() method, to 'pipe' your in-memory formatted text to the 'LOAD DATA LOCAL INFILE' call. This means you do not have to write out a temporary file and read it back from disk. Please refer to:

http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-implementation-notes.html (bottom of page)

The process is also outlined in this post:

http://jeffrick.com/2010/03/23/bulk-insert-into-a-mysql-database

O'Reilly produced a PDF covering MySQL/JDBC performance gems, which refers to this.

There is also mention of its usage with Hadoop (advanced Java topic).

Hope this all helps.

Cheers

Rich

Big Rich answered Nov 05 '22 15:11


Actual working code for this was hard to come by, so here's some:

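import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;
// Test-framework imports (@Test, Assert) are omitted; "dao" is a helper of the surrounding test class.

@Test
public void bulkInsert() throws SQLException {
    // Cast to the Connector/J 5.1 types to reach the driver-specific methods.
    try (com.mysql.jdbc.Connection conn = (com.mysql.jdbc.Connection) dao.getDataSource().getConnection()) {

        // LOAD DATA LOCAL has to be allowed on the client side.
        conn.setAllowLoadLocalInfile(true);

        try (com.mysql.jdbc.Statement stmt = (com.mysql.jdbc.Statement) conn.createStatement()) {

            stmt.execute("create temporary table BasicDbTest_1 (phone integer)");

            // The stream takes the place of the file named in the statement.
            String data = "8675309\n";
            stmt.setLocalInfileInputStream(new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8)));

            stmt.execute("load data local infile '' into table BasicDbTest_1");

            try (ResultSet rs = stmt.executeQuery("select phone from BasicDbTest_1")) {
                Assert.assertTrue(rs.next());
                Assert.assertEquals(rs.getInt(1), 8675309);
            }
        }
    }
}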
Alex R answered Nov 05 '22 15:11
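For an import on the scale described in the question, holding the whole payload in a ByteArrayInputStream is not practical. One possible approach is an InputStream that encodes rows lazily as the driver reads them, so only one row is buffered at a time. The following is a minimal sketch under stated assumptions, not a tested import pipeline: the class name TsvRowInputStream and its methods are hypothetical, and it assumes the default LOAD DATA format (fields terminated by tab, lines terminated by newline, backslash as escape character, \N for NULL).

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical helper: lazily encodes rows as tab-separated lines, one row buffered at a time. */
class TsvRowInputStream extends InputStream {
    private final Iterator<String[]> rows;
    private byte[] current = new byte[0]; // bytes of the row currently being served
    private int pos = 0;                  // read position inside "current"

    TsvRowInputStream(Iterator<String[]> rows) {
        this.rows = rows;
    }

    /** Loads the next row when the current one is used up; returns false at end of data. */
    private boolean fill() {
        while (pos >= current.length) {
            if (!rows.hasNext()) {
                return false;
            }
            current = encode(rows.next());
            pos = 0;
        }
        return true;
    }

    @Override
    public int read() {
        return fill() ? (current[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (!fill()) {
            return -1;
        }
        // Serve at most the remainder of the current row per call.
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, buf, off, n);
        pos += n;
        return n;
    }

    /** Encodes one row; null fields become \N (assumed default LOAD DATA conventions). */
    static byte[] encode(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(fields[i] == null ? "\\N" : escape(fields[i]));
        }
        return sb.append('\n').toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Escapes backslash, tab and newline so they cannot be mistaken for delimiters. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n");
    }

    public static void main(String[] args) throws Exception {
        Iterator<String[]> rows = Arrays.asList(
                new String[] {"1", "plain"},
                new String[] {"2", null}).iterator();
        try (InputStream in = new TsvRowInputStream(rows)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                System.out.write(buf, 0, n);
            }
            System.out.flush();
        }
    }
}
```

In place of the ByteArrayInputStream in the test above, such a stream could be passed to setLocalInfileInputStream(), with the iterator backed by the parser for the ".dat" export. Whether this is faster than writing the CSV file first would have to be measured.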