I'm currently facing the well-known and common Hibernate insert batch problem.
I need to save batches 5 millions of rows long. I'm first trying with a much lighter payload. Since I have to insert entities of only 2 types (first all records of type A, then all records of type B, all pointing to common type C ManyToOne
parent), I would like to take the most advantage from JDBC batch insert.
I have already read lots of documentation, but none that I have tried worked.
AUTO_INCREMENT
ID and I'm setting the ID with a trick: SELECT MAX(ID) FROM ENTITIES
and increment every time.hibernate.jdbc.batch_size
consistent with my application's bulk size, so I set it in the LocalSessionFactoryBean
(Spring ORM integration)Here are my entities
Common parent entity. This gets inserted first in a single transaction. I don't care about auto increment column here. Only one record per batch job
@Entity @Table(...) @SequenceGenerator(...) public class Deal { @Id @Column( name = "DEAL_ID", nullable = false) @GeneratedValue( strategy = GenerationType.AUTO) protected Long id; ................ }
One of the children (let's say 2.5M records per batch)
@Entity @Table( name = "TA_LOANS") public class Loan { @Id @Column( name = "LOAN_ID", nullable = false) protected Long id; @ManyToOne( optional = false, targetEntity = Deal.class, fetch = FetchType.LAZY) @JoinColumn( name = "DEAL_ID", nullable = false) protected Deal deal; ............. }
The other children type. Let's say the other 2.5M records
@Entity @Table( name = "TA_BONDS") public class Bond { @Id @Column( name = "BOND_ID") @ManyToOne( fetch = FetchType.LAZY, optional = false, targetEntity = Deal.class) @JoinColumn( name = "DEAL_ID", nullable = false, updatable = false) protected Deal deal; }
Simplified code that inserts records
long loanIdCounter = loanDao.getMaxId(), bondIdCounter = bondDao.getMaxId(); //Perform SELECT MAX(ID) Deal deal = null; List<Bond> bondList = new ArrayList<Bond>(COMMIT_BATCH_SIZE); //500 constant value List<Loan> loanList = new ArrayList<Loan>(COMMIT_BATCH_SIZE); for (String msg: inputStreamReader) { log.debug(msg.toString()); if (this is a deal) { Deal deal = parseDeal(msg.getMessage()); deal = dealManager.persist(holder.deal); //Called in a separate transaction using Spring annotation @Transaction(REQUIRES_NEW) } else if (this is a loan) { Loan loan = parseLoan(msg.getMessage()); loan.setId(++loanIdCounter); loan.setDeal(deal); loanList.add(loan); if (loanList.size() == COMMIT_BATCH_SIZE) { loanManager.bulkInsert(loanList); //Perform a bulk insert in a single transaction, not annotated but handled manually this time loanList.clear(); } } else if (this is a bond) { Bond bond = parseBond(msg.getMessage()); bond.setId(++bondIdCounter); bond.setDeal(deal); bondList.add(bond); if (bondList.size() == COMMIT_BATCH_SIZE) //As above { bondManager.bulkInsert(bondList); bondList.clear(); } } } if (!bondList.isEmpty()) bondManager.bulkInsert(bondList); if (!loanList.isEmpty()) loanManager.bulkInsert(loanList); //Flush remaining items, not important
Implementation of bulkInsert
:
@Override public void bulkInsert(Collection<Bond> bonds) { // StatelessSession session = sessionFactory.openStatelessSession(); Session session = sessionFactory.openSession(); try { Transaction t = session.beginTransaction(); try { for (Bond bond : bonds) // session.persist(bond); // session.insert(bond); session.save(bond); } catch (RuntimeException ex) { t.rollback(); } finally { t.commit(); } } finally { session.close(); } }
As you can see from comments, I have tried several combinations of stateful/stateless session
. None worked.
My dataSource
is a ComboPooledDataSource
with following URL
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&rewriteBatchedStatements=true" />
My SessionFactory
<b:bean id="sessionFactory" class="class.that.extends.org.springframework.orm.hibernate3.LocalSessionFactoryBean" lazy-init="false" depends-on="dataSource"> <b:property name="dataSource" ref="phoenixDataSource" /> <b:property name="hibernateProperties"> <b:props> <b:prop key="hibernate.dialect">${hibernate.dialect}</b:prop> <!-- MySQL5InnoDb--> <b:prop key="hibernate.show_sql">${hibernate.showSQL}</b:prop> <b:prop key="hibernate.jdbc.batch_size">500</b:prop> <b:prop key="hibernate.jdbc.use_scrollable_resultset">false</b:prop> <b:prop key="hibernate.cache.use_second_level_cache">false</b:prop> <b:prop key="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</b:prop> <b:prop key="hibernate.cache.use_query_cache">false</b:prop> <b:prop key="hibernate.validator.apply_to_ddl">false</b:prop> <b:prop key="hibernate.validator.autoregister_listeners">false</b:prop> <b:prop key="hibernate.order_inserts">true</b:prop> <b:prop key="hibernate.order_updates">true</b:prop> </b:props> </b:property> </b:bean>
Even if my project-wide class extends LocalSessionFactoryBean
, it does not override its methods (only adds few project-wide methods)
I'm getting mad since a few days. I read a few articles and none helped me enable batch inserts. I run all of my code from JUnit tests instrumented with Spring context (so I can @Autowire
my classes). All of my attempts only produce a lots of separate INSERT
statements
What am I missing?
It's likely your queries are being rewritten but you wouldn't know if by looking at the Hibernate SQL logs. Hibernate does not rewrite the insert statements - the MySQL driver rewrites them. In other words, Hibernate will send multiple insert statements to the driver, and then the driver will rewrite them. So the Hibernate logs only show you what SQL Hibernate sent to the driver, not what SQL the driver sent to the database.
You can verify this by enabling MySQL's profileSQL parameter in connection url:
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&rewriteBatchedStatements=true&profileSQL=true" />
Using an example similar to yours, this is what my output looks like:
insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) Wed Feb 05 13:29:52 MST 2014 INFO: Profiler Event: [QUERY] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) duration: 1 ms, connection-id: 81, statement-id: 33, resultset-id: 0, message: insert into Person (firstName, lastName, id) values ('person1', 'Name', 1),('person2', 'Name', 2),('person3', 'Name', 3),('person4', 'Name', 4),('person5', 'Name', 5),('person6', 'Name', 6),('person7', 'Name', 7),('person8', 'Name', 8),('person9', 'Name', 9),('person10', 'Name', 10)
The first 10 lines are being logged by Hibernate though this not what is actually being sent to MySQL database. The last line is coming from MySQL driver and it clearly shows a single batch insert with multiple values and that is what is actually being sent to the MySQL database.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With