Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

No matter what, I can't batch MySQL INSERT statements in Hibernate

I'm currently facing the well-known and common Hibernate insert batch problem.

I need to save batches 5 millions of rows long. I'm first trying with a much lighter payload. Since I have to insert entities of only 2 types (first all records of type A, then all records of type B, all pointing to common type C ManyToOne parent), I would like to take the most advantage from JDBC batch insert.

I have already read lots of documentation, but none that I have tried worked.

  • I know that in order to use batch inserts I must not use an entity generator. So I removed the AUTO_INCREMENT ID and I'm setting the ID with a trick: SELECT MAX(ID) FROM ENTITIES and increment every time.
  • I know that I must flush the session regularly. I'll post code ahead, but anyway I perform a transaction every 500 elements.
  • I know that I have to set hibernate.jdbc.batch_size consistent with my application's bulk size, so I set it in the LocalSessionFactoryBean (Spring ORM integration)
  • I know I have to enable rewriting batched statements in connection URL.

Here are my entities

Common parent entity. This gets inserted first in a single transaction. I don't care about auto increment column here. Only one record per batch job

@Entity @Table(...) @SequenceGenerator(...) public class Deal {      @Id     @Column(             name = "DEAL_ID",             nullable = false)     @GeneratedValue(             strategy = GenerationType.AUTO)     protected Long id;      ................ } 

One of the children (let's say 2.5M records per batch)

@Entity @Table(         name = "TA_LOANS") public class Loan {      @Id     @Column(             name = "LOAN_ID",             nullable = false)     protected Long id;      @ManyToOne(             optional = false,             targetEntity = Deal.class,             fetch = FetchType.LAZY)     @JoinColumn(             name = "DEAL_ID",             nullable = false)     protected Deal deal;       ............. } 

The other children type. Let's say the other 2.5M records

@Entity @Table(         name = "TA_BONDS") public class Bond {      @Id     @Column(             name = "BOND_ID")      @ManyToOne(             fetch = FetchType.LAZY,             optional = false,             targetEntity = Deal.class)     @JoinColumn(             name = "DEAL_ID",             nullable = false,             updatable = false)     protected Deal deal;  } 

Simplified code that inserts records

    long loanIdCounter = loanDao.getMaxId(), bondIdCounter = bondDao.getMaxId(); //Perform SELECT MAX(ID)      Deal deal = null;      List<Bond> bondList = new ArrayList<Bond>(COMMIT_BATCH_SIZE); //500 constant value     List<Loan> loanList = new ArrayList<Loan>(COMMIT_BATCH_SIZE);      for (String msg: inputStreamReader)     {         log.debug(msg.toString());          if (this is a deal)         {             Deal deal = parseDeal(msg.getMessage());              deal = dealManager.persist(holder.deal); //Called in a separate transaction using Spring annotation @Transaction(REQUIRES_NEW)          }         else if (this is a loan)         {              Loan loan = parseLoan(msg.getMessage());             loan.setId(++loanIdCounter);             loan.setDeal(deal);              loanList.add(loan);              if (loanList.size() == COMMIT_BATCH_SIZE)             {                 loanManager.bulkInsert(loanList); //Perform a bulk insert in a single transaction, not annotated but handled manually this time                 loanList.clear();             }         }         else if (this is a bond)         {             Bond bond = parseBond(msg.getMessage());             bond.setId(++bondIdCounter);             bond.setDeal(deal);              bondList.add(bond);                if (bondList.size() == COMMIT_BATCH_SIZE) //As above             {                 bondManager.bulkInsert(bondList);                 bondList.clear();              }         }     }      if (!bondList.isEmpty())         bondManager.bulkInsert(bondList);     if (!loanList.isEmpty())         loanManager.bulkInsert(loanList);     //Flush remaining items, not important 

Implementation of bulkInsert:

@Override public void bulkInsert(Collection<Bond> bonds) {     // StatelessSession session = sessionFactory.openStatelessSession();     Session session = sessionFactory.openSession();     try     {         Transaction t = session.beginTransaction();         try         {             for (Bond bond : bonds)                 // session.persist(bond);                 // session.insert(bond);                 session.save(bond);         }         catch (RuntimeException ex)         {             t.rollback();         }         finally         {             t.commit();         }     }     finally     {         session.close();     }  } 

As you can see from comments, I have tried several combinations of stateful/stateless session. None worked.

My dataSource is a ComboPooledDataSource with following URL

<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&amp;rewriteBatchedStatements=true" /> 

My SessionFactory

<b:bean id="sessionFactory" class="class.that.extends.org.springframework.orm.hibernate3.LocalSessionFactoryBean" lazy-init="false" depends-on="dataSource">         <b:property name="dataSource" ref="phoenixDataSource" />         <b:property name="hibernateProperties">             <b:props>                 <b:prop key="hibernate.dialect">${hibernate.dialect}</b:prop> <!-- MySQL5InnoDb-->                 <b:prop key="hibernate.show_sql">${hibernate.showSQL}</b:prop>                 <b:prop key="hibernate.jdbc.batch_size">500</b:prop>                 <b:prop key="hibernate.jdbc.use_scrollable_resultset">false</b:prop>                 <b:prop key="hibernate.cache.use_second_level_cache">false</b:prop>                 <b:prop key="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</b:prop>                 <b:prop key="hibernate.cache.use_query_cache">false</b:prop>                 <b:prop key="hibernate.validator.apply_to_ddl">false</b:prop>                 <b:prop key="hibernate.validator.autoregister_listeners">false</b:prop>                 <b:prop key="hibernate.order_inserts">true</b:prop>                 <b:prop key="hibernate.order_updates">true</b:prop>             </b:props>         </b:property> </b:bean> 

Even if my project-wide class extends LocalSessionFactoryBean, it does not override its methods (only adds few project-wide methods)

I'm getting mad since a few days. I read a few articles and none helped me enable batch inserts. I run all of my code from JUnit tests instrumented with Spring context (so I can @Autowire my classes). All of my attempts only produce a lots of separate INSERT statements

  • https://stackoverflow.com/questions/12011343/how-do-you-enable-batch-inserts-in-hibernate
  • https://stackoverflow.com/questions/3469364/faster-way-to-batch-saves-with-hibernate
  • https://forum.hibernate.org/viewtopic.php?p=2374413
  • https://stackoverflow.com/questions/3026968/high-performance-hibernate-insert

What am I missing?

like image 300
usr-local-ΕΨΗΕΛΩΝ Avatar asked Feb 03 '14 14:02

usr-local-ΕΨΗΕΛΩΝ


1 Answers

It's likely your queries are being rewritten but you wouldn't know if by looking at the Hibernate SQL logs. Hibernate does not rewrite the insert statements - the MySQL driver rewrites them. In other words, Hibernate will send multiple insert statements to the driver, and then the driver will rewrite them. So the Hibernate logs only show you what SQL Hibernate sent to the driver, not what SQL the driver sent to the database.

You can verify this by enabling MySQL's profileSQL parameter in connection url:

<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&amp;rewriteBatchedStatements=true&amp;profileSQL=true" /> 

Using an example similar to yours, this is what my output looks like:

insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) Wed Feb 05 13:29:52 MST 2014 INFO: Profiler Event: [QUERY]  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) duration: 1 ms, connection-id: 81, statement-id: 33, resultset-id: 0, message: insert into Person (firstName, lastName, id) values ('person1', 'Name', 1),('person2', 'Name', 2),('person3', 'Name', 3),('person4', 'Name', 4),('person5', 'Name', 5),('person6', 'Name', 6),('person7', 'Name', 7),('person8', 'Name', 8),('person9', 'Name', 9),('person10', 'Name', 10) 

The first 10 lines are being logged by Hibernate though this not what is actually being sent to MySQL database. The last line is coming from MySQL driver and it clearly shows a single batch insert with multiple values and that is what is actually being sent to the MySQL database.

like image 135
neildo Avatar answered Sep 21 '22 09:09

neildo