 

Most efficient way to insert multiple rows in JPA

I have a parent/child unidirectional relationship. When I examine the logs I see that there is a separate insert query for each child row, the equivalent of let's say:

insert into childTable(col1, col2) values(val1, val2);
insert into childTable(col1, col2) values(val3, val4);

Wouldn't it be more efficient to insert all the rows in a single query? Something along the lines of:

insert into childTable(col1, col2) values(val1, val2), (val3, val4);

Is there a way to force JPA to generate multi-row inserts instead of single-row inserts?

Edit: I'm currently using cascading inserts, so I insert the parent and the inserts for the children are generated automatically. I would prefer to keep using that method rather than, say, manually building an enormous SQL query, as I think the cascading inserts produce cleaner code.

I already flush the session regularly to control the size of the L1 cache, so running out of memory is not a problem.

ventsyv asked Nov 30 '15 20:11





1 Answer

It is actually less efficient to insert all rows in a single query.

First, a couple of observations:

  1. The amount of data passed from client to server is the same whether you use one insert statement or many, where "amount of data" means the actual values you are storing.
  2. Hibernate supports batching of requests, so the number of round-trips between client and server can be approximately the same whether you use one insert statement or many.
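To make the batching point concrete, here is a minimal sketch of the settings involved. The property names (`hibernate.jdbc.batch_size`, `hibernate.order_inserts`, `hibernate.order_updates`) are standard Hibernate configuration keys; the class name, the helper method, and the batch size of 50 are assumptions made up for illustration, so tune the value for your own workload.

```java
import java.util.Properties;

public class BatchSettings {
    // Hypothetical helper: collects the Hibernate properties that control JDBC batching.
    public static Properties batchingProperties(int batchSize) {
        Properties props = new Properties();
        // Number of statements Hibernate groups into one JDBC batch.
        props.setProperty("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        // Order statements by entity so consecutive ones share the same SQL and can be batched.
        props.setProperty("hibernate.order_inserts", "true");
        props.setProperty("hibernate.order_updates", "true");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings; 50 is an arbitrary example value.
        batchingProperties(50).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same keys can go in persistence.xml, or a map like this can be passed as the second argument to `Persistence.createEntityManagerFactory`.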

Under the covers, Hibernate uses a PreparedStatement for each query it executes on your behalf, and these are cached and reused. MySQL also caches "compiled" SQL statements. Without getting mired in details, the underlying technologies are highly optimized to run a relatively small number of queries many times.

If you do the insert as a single statement, then each time the number of rows to insert is different, new SQL has to be compiled and cached (possibly pushing another query out of the cache), which adds overhead. This overhead is avoided when you use the same SQL every time.
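A small sketch of why the SQL text matters: the multi-row form produces a different statement string for every row count, while the single-row form never changes. The class and method names here are hypothetical, and the `Set` only stands in for a statement cache keyed by SQL text; it is not how any particular driver or server implements its cache.

```java
import java.util.HashSet;
import java.util.Set;

public class StatementShapes {
    // Single-row form: the SQL text is identical no matter how many rows are inserted.
    public static final String SINGLE_ROW = "insert into childTable(col1, col2) values(?, ?)";

    // Multi-row form: the SQL text grows with the number of rows.
    public static String multiRowInsert(int rows) {
        if (rows < 1) {
            throw new IllegalArgumentException("rows must be >= 1");
        }
        StringBuilder sql = new StringBuilder("insert into childTable(col1, col2) values");
        for (int i = 0; i < rows; i++) {
            sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
        }
        return sql.toString();
    }

    // Counts the distinct SQL strings a cache keyed by SQL text would see.
    public static int distinctMultiRowStatements(int... rowCounts) {
        Set<String> distinct = new HashSet<>();
        for (int n : rowCounts) {
            distinct.add(multiRowInsert(n));
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        System.out.println(multiRowInsert(2));
        System.out.println(distinctMultiRowStatements(1, 2, 3, 2, 1));
    }
}
```

With single-row inserts the cache sees one statement regardless of how many children each parent has.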

For many reasons, you must use bind variables in your SQL, and Hibernate will do that for you automatically. If you do some custom queries to test the all-at-once insert method, you definitely should use bind variables as well.
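If you want to experiment outside Hibernate, a plain JDBC sketch of the same idea looks roughly like this: one parameterized statement, reused for every row, sent as a batch. The table and column names come from the question; the class name, the method name, and the assumption that both columns are strings are mine.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ChildBatchInsert {
    // One fixed SQL string with bind variables, reused for every row.
    static final String INSERT_SQL = "insert into childTable(col1, col2) values(?, ?)";

    // Queues each row on the same PreparedStatement and sends them as one JDBC batch.
    // Returns the number of update counts reported by the driver.
    public static int insertAll(Connection conn, List<String[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]); // bind variable for col1 (assumed to be a string column)
                ps.setString(2, row[1]); // bind variable for col2 (assumed to be a string column)
                ps.addBatch();
            }
            return ps.executeBatch().length;
        }
    }
}
```

The caller supplies an open `Connection` and handles the transaction; for very large lists you would probably call `executeBatch` every few hundred rows rather than once at the end.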

Another consideration is how you generate identifiers. If they come from an identity column in the database, then Hibernate needs to receive the generated ID back for each row, which is generally only possible when rows are inserted one at a time. For this reason, a sequence-based identifier generator is preferred for efficiency, with client-side caching of sequence values.
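To illustrate what "client-side caching of sequence values" buys you, here is a toy simulation, not Hibernate's actual implementation: each simulated database call reserves a whole block of identifiers, so most IDs are handed out without a round trip. All names are hypothetical, and the `LongSupplier` merely stands in for fetching the next sequence value. In JPA the comparable knob is `allocationSize` on `@SequenceGenerator`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class BlockIdAllocator {
    private final LongSupplier sequence; // stands in for a call that fetches the next sequence value
    private final int blockSize;         // must match the increment of the simulated sequence
    private long next;                   // next identifier to hand out
    private long limit;                  // exclusive upper bound of the current block
    private int roundTrips;              // how many times the "database" was asked

    public BlockIdAllocator(LongSupplier sequence, int blockSize) {
        this.sequence = sequence;
        this.blockSize = blockSize;
    }

    public long nextId() {
        if (next == limit) {
            // Current block is used up (or none fetched yet): reserve a new block in one round trip.
            next = sequence.getAsLong();
            limit = next + blockSize;
            roundTrips++;
        }
        return next++;
    }

    public int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        AtomicLong fakeSequence = new AtomicLong(1); // simulated sequence: starts at 1, increments by 50
        BlockIdAllocator ids = new BlockIdAllocator(() -> fakeSequence.getAndAdd(50), 50);
        for (int i = 0; i < 100; i++) {
            ids.nextId();
        }
        System.out.println("round trips for 100 ids: " + ids.roundTrips());
    }
}
```

With an identity column, by contrast, every one of those rows would need its own insert before its ID is known.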

I just noticed your edit: my experience has been that Hibernate does "extra" updates when inserting parent-child data. I managed to get "pure" inserts by changing the mapping to use a "join" table (like you would see for a many-to-many relationship), even though I only had a many-to-one relationship. In my case, it was significantly faster to do many more inserts into three tables than fewer inserts plus updates into two tables. If you are concerned about performance, you should definitely set aside some time to tune the Hibernate configuration.

Rob answered Sep 18 '22 18:09
