
Tips on Speeding up JDBC writes?

I am writing a program that does a lot of writes to a Postgres database. In a typical scenario I would be writing, say, 100,000 rows to a table that's well normalized (three integer foreign keys, the combination of which is the primary key and the index of the table). I am using PreparedStatements and executeBatch(), yet I can only manage to push in 100k rows in about 70 seconds on my laptop, when the embedded database we're replacing (which has the same foreign key constraints and indices) does it in 10.

I am new to JDBC and I don't expect it to beat a custom embedded DB, but I was hoping it would be only 2-3x slower, not 7x. Is there anything obvious that I may be missing? Does the order of the writes matter (i.e. if the rows are not written in index order)? What should I look at to squeeze out a bit more speed?

Max Maximus asked Dec 15 '08 15:12





1 Answer

This is an issue that I have had to deal with often on my current project. For our application, insert speed is a critical bottleneck. For the vast majority of database users, however, select speed is the chief bottleneck, so you will find far more resources dealing with that issue.

So here are a few solutions that we have come up with:

First, all solutions involve using the Postgres COPY command. Using COPY to import data into Postgres is by far the quickest method available. However, older versions of the JDBC driver do not support COPY across the network socket (newer releases of the PostgreSQL driver expose it through a CopyManager API, so check your driver version first). If your driver lacks it, you will need one of two workarounds:

  1. A JDBC driver patched to support COPY, such as this one.
  2. If the data you are inserting and the database are on the same physical machine, you can write the data out to a file on the filesystem and then use the COPY command to import the data in bulk.
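
The second workaround can be sketched roughly as follows. This is a minimal illustration, not a tested loader: the class name `CopyFileSketch`, the table name `link_table`, and the three-integer row layout are assumptions made for the example. It writes rows in COPY's default text format (tab-separated, one row per line) and builds a server-side `COPY ... FROM` statement. Note that server-side COPY reads the file as the database server process, so the file must be readable by that process and the database role needs permission to read server files.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

final class CopyFileSketch {

    // Writes one tab-separated line per row, which is COPY's default text format.
    // Each row is assumed to hold exactly three integer keys.
    static void writeRows(Path file, List<int[]> rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (int[] row : rows) {
                out.write(row[0] + "\t" + row[1] + "\t" + row[2] + "\n");
            }
        }
    }

    // Builds the server-side COPY statement. Single quotes in the path are doubled;
    // the table name is assumed to come from trusted code, not user input.
    static String copySql(String table, Path file) {
        String path = file.toAbsolutePath().toString().replace("'", "''");
        return "COPY " + table + " FROM '" + path + "'";
    }

    // Runs the COPY over an existing connection. The file must live on the
    // database host and be readable by the server process.
    static void load(Connection conn, String table, Path file) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(copySql(table, file));
        }
    }

    // Demo without a database: writes two rows and prints the statement
    // that load() would execute. "link_table" is a hypothetical table name.
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("rows", ".tsv");
        writeRows(file, List.of(new int[] {1, 2, 3}, new int[] {4, 5, 6}));
        System.out.println(copySql("link_table", file));
    }
}
```

In a real program you would call `load(conn, "link_table", file)` after `writeRows`, using a connection to a database on the same machine. A temp file created this way is typically private to the current user, so you would likely need to write to a location the server can read instead.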

Other options for increasing speed include using JNI to call the Postgres API directly so you can talk over the Unix socket, dropping indexes before the load, and the pg_bulkload project. In the end, though, if you don't use COPY you will always find performance disappointing.
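
The "dropping indexes" option might look something like the sketch below. It is only an illustration under stated assumptions: the class and method names are made up for the example, and the three SQL strings are supplied by the caller. It relies on Postgres treating DDL as transactional, so the drop, the load, and the rebuild either all take effect or none do.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class IndexRebuildSketch {

    // Drops the index, runs the load, and rebuilds the index inside one
    // transaction, so a failure part-way through rolls everything back.
    // All three SQL strings are assumed to come from trusted code.
    static void loadWithoutIndex(Connection conn, String dropSql, String loadSql,
                                 String rebuildSql) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            st.execute(dropSql);     // e.g. drop the index or primary key constraint
            st.execute(loadSql);     // e.g. a COPY or a bulk INSERT
            st.execute(rebuildSql);  // recreate the index over the loaded data
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

For a table like the one in the question, the statements could be along the lines of `ALTER TABLE link_table DROP CONSTRAINT link_table_pkey`, a `COPY link_table FROM ...`, and `ALTER TABLE link_table ADD PRIMARY KEY (a_id, b_id, c_id)`; all of those names are hypothetical. Keep in mind that uniqueness is only checked when the key is rebuilt, and that the table is locked against other sessions while the transaction runs.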

Elijah answered Oct 13 '22 21:10
