
Spring Data JPA: Batch insert for nested entities

Tags: java, hibernate, spring-data-jpa

I have a test case where I need to persist 100'000 entity instances into the database. The code I'm currently using does this, but it takes up to 40 seconds until all the data is persisted in the database. The data is read from a JSON file which is about 15 MB in size.

I had already implemented a batch insert method in a custom repository for another project. However, in that case I had a lot of top-level entities to persist, with only a few nested entities.

In my current case I have 5 Job entities, each containing a List of about 30 JobDetail entities. One JobDetail contains between 850 and 1100 JobEnvelope entities.

When writing to the database I save the List of Job entities with the default save(Iterable<Job> jobs) interface method. All nested entities have the CascadeType PERSIST. Each entity has its own table.

The usual way to enable batch inserts would be to implement a custom method like saveBatch that flushes every once in a while. But my problem in this case is the JobEnvelope entities. I don't persist them with a JobEnvelope repository; instead I let the repository of the Job entity handle it. I'm using MariaDB as the database server.

So my question boils down to the following: How can I make the JobRepository insert its nested entities in batches?

These are my 3 entities in question:

Job

@Entity
public class Job {
  @Id
  @GeneratedValue
  private int jobId;

  @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "job")
  @JsonManagedReference
  private Collection<JobDetail> jobDetails;
}

JobDetail

@Entity
public class JobDetail {
  @Id
  @GeneratedValue
  private int jobDetailId;

  @ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
  @JoinColumn(name = "jobId")
  @JsonBackReference
  private Job job;

  @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "jobDetail")
  @JsonManagedReference
  private List<JobEnvelope> jobEnvelopes;
}

JobEnvelope

@Entity
public class JobEnvelope {
  @Id
  @GeneratedValue
  private int jobEnvelopeId;

  @ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
  @JoinColumn(name = "jobDetailId")
  private JobDetail jobDetail;

  private double weight;
}


asked Mar 04 '16 08:03

Ahatius

1 Answer

Make sure to configure Hibernate batch-related properties properly:

<property name="hibernate.jdbc.batch_size">100</property>
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>
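If the persistence unit is configured in Java rather than XML, the same three settings can be collected in a java.util.Properties object. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes the properties are later handed to whatever bootstraps Hibernate (for example Spring's LocalContainerEntityManagerFactoryBean.setJpaProperties, or the spring.jpa.properties.* prefix in Spring Boot).

```java
import java.util.Properties;

// Hypothetical helper; only the property keys come from the answer above.
final class HibernateBatchSettings {

    // Builds the three batch-related Hibernate settings as JPA properties.
    static Properties batchProperties(int batchSize) {
        Properties props = new Properties();
        props.setProperty("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        props.setProperty("hibernate.order_inserts", "true");
        props.setProperty("hibernate.order_updates", "true");
        return props;
    }

    public static void main(String[] args) {
        // Prints the key=value pairs (iteration order is not guaranteed).
        batchProperties(100).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```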

The point is that successive statements can be batched only if they target the same table. When a statement inserting into another table comes up, the batch under construction must be closed and executed before that statement. With the hibernate.order_inserts property you give Hibernate permission to reorder inserts before constructing batch statements (hibernate.order_updates has the same effect for update statements).
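To see why ordering matters for cascaded children, here is a small stand-alone simulation. It is not Hibernate code: it models a batch as a run of consecutive statements for the same table, capped at the batch size, and it assumes the cascade queues each JobDetail insert followed by the inserts of its own JobEnvelope children. The numbers (30 details with 10 envelopes each) are chosen only to keep the example small.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of JDBC batching; all names here are illustrative.
final class InsertOrderingDemo {

    // Counts the batches needed when a batch may only contain consecutive
    // statements for the same table, with at most batchSize statements each.
    static int countBatches(List<String> tablesInExecutionOrder, int batchSize) {
        int batches = 0;
        int currentSize = 0;
        String currentTable = null;
        for (String table : tablesInExecutionOrder) {
            if (!table.equals(currentTable) || currentSize == batchSize) {
                batches++;            // table changed or batch is full: start a new one
                currentSize = 0;
                currentTable = table;
            }
            currentSize++;
        }
        return batches;
    }

    // Assumed cascade order: every detail is followed by its own envelopes.
    static List<String> cascadeOrder(int details, int envelopesPerDetail) {
        List<String> order = new ArrayList<>();
        for (int d = 0; d < details; d++) {
            order.add("JobDetail");
            for (int e = 0; e < envelopesPerDetail; e++) {
                order.add("JobEnvelope");
            }
        }
        return order;
    }

    // Stand-in for the effect of order_inserts: group statements by table.
    // Sorting by name happens to put the parent table "JobDetail" first.
    static List<String> groupedByTable(List<String> order) {
        List<String> grouped = new ArrayList<>(order);
        grouped.sort(Comparator.naturalOrder());
        return grouped;
    }

    public static void main(String[] args) {
        List<String> interleaved = cascadeOrder(30, 10);
        System.out.println(countBatches(interleaved, 100));                 // prints 60
        System.out.println(countBatches(groupedByTable(interleaved), 100)); // prints 4
    }
}
```

In this model the interleaved order needs 60 batches for 330 statements, while the grouped order needs 4, which is the effect the ordering properties are meant to achieve.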

hibernate.jdbc.batch_size is the maximum number of statements Hibernate will put into one batch. Try different values and pick the one that performs best for your use case.

Note that batching of insert statements is disabled if the IDENTITY id generator is used, because Hibernate has to execute each insert immediately to obtain the generated id.

Specific to MySQL, you have to specify rewriteBatchedStatements=true as part of the connection URL. To make sure that batching is working as expected, add profileSQL=true to inspect the SQL the driver sends to the database. More details here.
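As a small illustration of where these two options go, the sketch below appends them to a JDBC URL. The helper name, host, port and schema are placeholders, and the URL uses the MySQL Connector/J form; if you connect through the MariaDB driver instead, check its documentation for which of these options it supports.

```java
// Hypothetical helper; only the two option names come from the answer above.
final class BatchJdbcUrl {

    // Appends the batching-related driver options to an existing JDBC URL.
    static String withBatchOptions(String baseUrl) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "rewriteBatchedStatements=true&profileSQL=true";
    }

    public static void main(String[] args) {
        // Placeholder host and schema name.
        System.out.println(withBatchOptions("jdbc:mysql://localhost:3306/jobs"));
    }
}
```

profileSQL is meant for verification only and is best removed again once batching is confirmed.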

If your entities are versioned (for optimistic locking purposes), then in order to use batch updates (this doesn't affect inserts) you also have to turn on:

<property name="hibernate.jdbc.batch_versioned_data">true</property>

With this property you tell Hibernate that the JDBC driver is capable of returning the correct count of affected rows when executing a batch update (needed to perform the version check). You have to check whether this works properly for your database/JDBC driver. For example, it does not work in Oracle 11 and older Oracle versions.

You may also want to flush and clear the persistence context after each batch to release memory; otherwise all of the managed objects remain in the persistence context until it is closed.
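The flush-and-clear pattern can be sketched as follows. To keep the example compilable without a JPA dependency, it uses a hypothetical PersistenceContext interface that mirrors the persist, flush and clear methods of javax.persistence.EntityManager; in a real repository you would call the injected EntityManager inside a transaction instead. The saveInBatches name and the batch size are assumptions as well.

```java
import java.util.Collections;
import java.util.List;

final class FlushAndClearSketch {

    // Hypothetical stand-in for the three EntityManager methods used below.
    interface PersistenceContext {
        void persist(Object entity);
        void flush();
        void clear();
    }

    // Persists all entities, flushing and clearing after every batchSize items.
    static <T> void saveInBatches(PersistenceContext em, List<T> entities, int batchSize) {
        int pending = 0;
        for (T entity : entities) {
            em.persist(entity);
            pending++;
            if (pending == batchSize) {
                em.flush();   // send the queued inserts to the database
                em.clear();   // detach the managed instances to free memory
                pending = 0;
            }
        }
        if (pending > 0) {    // flush the last, partially filled batch
            em.flush();
            em.clear();
        }
    }

    public static void main(String[] args) {
        int[] calls = new int[3]; // persist, flush, clear counters
        PersistenceContext counting = new PersistenceContext() {
            public void persist(Object entity) { calls[0]++; }
            public void flush() { calls[1]++; }
            public void clear() { calls[2]++; }
        };
        saveInBatches(counting, Collections.nCopies(250, "entity"), 100);
        System.out.println(calls[0] + " " + calls[1] + " " + calls[2]); // prints 250 3 3
    }
}
```

Note that with cascading, a single persist call on a Job queues all of its JobDetail and JobEnvelope children at once, so a loop over the top-level Job list flushes at most once per Job. Flushing more often would mean iterating at the JobDetail level, which is a design choice this sketch does not make for you. Entities are detached after clear(), so keep no references that you expect to remain managed.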

Also, you may find this blog useful, as it nicely explains the details of the Hibernate batching mechanism.


answered Sep 24 '22 11:09

Dragan Bozanovic
