We will migrate large amounts of data (a single type of entity) from Amazon's DynamoDB into a MySQL DB. We are using Hibernate to map this class into a mysql entity. There are around 3 million entities (excluding rows of list property). Here is our class mapping summary:
@Entity
@Table(name = "CUSTOMER")
public class Customer {
@Id
@Column(name = "id")
private String id;
//Other properties in which all of them are primitive types/String
@ElementCollection
@CollectionTable(name = "CUSTOMER_USER", joinColumns = @JoinColumn(name = "customer_id"))
@Column(name = "userId")
private List<String> users;
// CONSTRUCTORS, GETTERS, SETTERS, etc.
}
users is a list of String. We have created two mysql tables like following:
CREATE TABLE CUSTOMER(id VARCHAR(100), PRIMARY KEY(id));
CREATE TABLE CUSTOMER_USER(customer_id VARCHAR(100), userId VARCHAR(100), PRIMARY KEY(customer_id, userId), FOREIGN KEY (customer_id) REFERENCES CUSTOMER(id));
Note: We do not make hibernate generate any id value, we are assigning our IDs to Customer entities which are guaranteed to be unique.
Here is our hibernate.cfg.xml:
<hibernate-configuration>
<session-factory>
<property name="hibernate.dialect"> org.hibernate.dialect.MySQLDialect </property>
<property name="hibernate.connection.driver_class"> com.mysql.jdbc.Driver </property>
<property name="hibernate.connection.url"> jdbc:mysql://localhost/xxx </property>
<property name="hibernate.connection.username"> xxx </property>
<property name="hibernate.connection.password"> xxx </property>
<property name="hibernate.connection.provider_class">org.hibernate.c3p0.internal.C3P0ConnectionProvider</property>
<property name="hibernate.jdbc.batch_size"> 50 </property>
<property name="hibernate.cache.use_second_level_cache">false</property>
<property name="c3p0.min_size">30</property>
<property name="c3p0.max_size">70</property>
</session-factory>
</hibernate-configuration>
We are creating some number of threads each reading data from Dynamo and inserting them to our MySQl DB via Hibernate. Here is what each thread does:
// Each single thread brings resultItems from DynamoDB
Session session = factory.openSession();
Transaction tx = session.beginTransaction();
for(int i = 0; i < resultItems.size(); i++) {
Customer cust = new Customer(resultItems.get(i));
session.save(cust);
if(i % BATCH_SIZE == 0) {
session.flush();
session.clear();
}
}
tx.commit();
session.close();
We have our own performance monitoring functions and we are continuously logging the overall read/write performance. The problem is, migration starts with reading/writing 1500 items/sec (on average), but keeps getting slowed as long as number of rows in CUSTOMER and CUSTOMER_USER tables increases (after a few minutes, r/w speed was around 500 items/sec). I am not experienced on Hibernate and here are my questions:
NOTE 1 I did not write all of the properties, because the remaining ones other than list of users are all int, boolean, String, etc.
NOTE 2 All of points are tested and have no negative effect on performance. When we dont insert anything into mysql db, read speed stays stable for hours.
NOTE 3 Any recommendation/guidance about the structure of mysql tables, configuration settings, sessions/transactions, number of connection pools, batch sizes, etc. will be really helpful!
Assuming you are not doing anything else in the hibernate transaction than just inserting the data into these two tables, you can use StatelessSession session = sessionFactory.openStatelessSession(); instead of normal session which reduces the overhead of maintaining the caches. But then you will have to save the nested collection objects separately.
Refer https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html
So it could be something like -
// Each single thread brings resultItems from DynamoDB
StatelessSession session = factory.openStatelessSession();
Transaction tx = session.beginTransaction();
for(int i = 0; i < resultItems.size(); i++) {
Customer cust = new Customer(resultItems.get(i));
Long id = session.save(cust); // get the generated id
// TODO: Create a list of related customer users and assign the id to all of them and then save those customer user objects in the same transaction.
if(i % BATCH_SIZE == 0) {
session.flush();
session.clear();
}
}
tx.commit();
session.close();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With