Batch Processing Using Statement. With JDBC, the simplest way to execute queries on a database is via the Statement object. First, using addBatch() we can add all SQL queries to a batch and then execute those SQL queries using executeBatch().
The batch size can be anything but needs to be decided carefully. JDBC specification supports up to 100 but individual databases e.g. Oracle, MySQL, Sybase, or SQL Server has their own limit on maximum batch size,, normal jdbc batch size ranges from 50 to 100.
A JDBC batch update is a batch of updates grouped together, and sent to the database in one batch, rather than sending the updates one by one. Sending a batch of updates to the database in one go, is faster than sending them one by one, waiting for each one to finish.
Batch Processing allows you to group related SQL statements into a batch and submit them with one call to the database. When you send several SQL statements to the database at once, you reduce the amount of communication overhead, thereby improving performance.
I had a similar performance issue with mysql and solved it by setting the useServerPrepStmts and the rewriteBatchedStatements properties in the connection url.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true
is the important parameter. useServerPrepStmts
is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now I think is the time to write how rewriteBatchedStatements=true
improves the performance so dramatically. It does so by rewriting of prepared statements for INSERT into multi-value inserts when executeBatch()
(Source). That means that instead of sending the following n
INSERT statements to the mysql server each time executeBatch()
is called :
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement :
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe it by toggling on the mysql logging (by SET global general_log = 1
) which would log into a file each statement sent to the mysql server.
You can insert multiple rows with one insert statement, doing a few thousands at a time can greatly speed things up, that is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);
, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);
(It might be JDBC .addBatch() does similar optimization now - though the mysql addBatch used to be entierly un-optimized and just issuing individual queries anyhow - I don't know if that's still the case with recent drivers)
If you really need speed, load your data from a comma separated file with LOAD DATA INFILE , we get around 7-8 times speedup doing that vs doing tens of millions of inserts.
If:
Then ALTER TABLE tbl_name DISABLE KEYS
can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS
to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With