
OutOfMemoryError when reading large amounts of data using Hibernate

I need to export a large amount of data from a database. Here are the classes that represent my data:

public class Product{
...

    @OneToMany
    @JoinColumn(name = "product_id")
    @Cascade({SAVE_UPDATE, DELETE_ORPHAN})
    List<ProductHtmlSource> htmlSources = new ArrayList<ProductHtmlSource>();

... }

ProductHtmlSource contains a large string, which is what I actually need to export.

Since the exported data is larger than the JVM's memory, I'm reading it in chunks, like this:

final int batchSize = 1000;      
for (int i = 0; i < 50; i++) {
  ScrollableResults iterator = getProductIterator(batchSize * i, batchSize * (i + 1));
  while (iterator.getScrollableResults().next()) {
     Product product = (Product) iterator.getScrollableResults().get(0); 
     List<String> htmls = product.getHtmlSources();
     <some processing>
  }

}

The code of getProductIterator:

public ScrollableResults getProductIterator(int offset, int limit) {
    Session session = getSession(true);
    session.setCacheMode(CacheMode.IGNORE);
    ScrollableResults iterator = session
            .createCriteria(Product.class)
            .add(Restrictions.eq("status", Product.Status.DONE))
            .setFirstResult(offset)
            .setMaxResults(limit)
            .scroll(ScrollMode.FORWARD_ONLY);
    session.flush();
    session.clear();

    return iterator;
}

The problem is that even though I clear the session after reading each chunk, Product objects accumulate somewhere and I get an OutOfMemoryError. The processing block is not the cause: I get the memory error even without it. The batch size is not the problem either, since 1000 objects easily fit into memory.

The profiler showed that the objects accumulate in the org.hibernate.engine.StatefulPersistenceContext class.

The stack trace:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
    at java.lang.StringBuffer.append(StringBuffer.java:307)
    at org.hibernate.type.TextType.get(TextType.java:41)
    at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:163)
    at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:154)
    at org.hibernate.type.AbstractType.hydrate(AbstractType.java:81)
    at org.hibernate.persister.entity.AbstractEntityPersister.hydrate(AbstractEntityPersister.java:2101)
    at org.hibernate.loader.Loader.loadFromResultSet(Loader.java:1380)
    at org.hibernate.loader.Loader.instanceNotYetLoaded(Loader.java:1308)
    at org.hibernate.loader.Loader.getRow(Loader.java:1206)
    at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:580)
    at org.hibernate.loader.Loader.doQuery(Loader.java:701)
    at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
    at org.hibernate.loader.Loader.loadCollection(Loader.java:1994)
    at org.hibernate.loader.collection.CollectionLoader.initialize(CollectionLoader.java:36)
    at org.hibernate.persister.collection.AbstractCollectionPersister.initialize(AbstractCollectionPersister.java:565)
    at org.hibernate.event.def.DefaultInitializeCollectionEventListener.onInitializeCollection(DefaultInitializeCollectionEventListener.java:63)
    at org.hibernate.impl.SessionImpl.initializeCollection(SessionImpl.java:1716)
    at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:344)
    at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
    at org.hibernate.collection.AbstractPersistentCollection.readSize(AbstractPersistentCollection.java:109)
    at org.hibernate.collection.PersistentBag.size(PersistentBag.java:225)
    at com.rivalwatch.plum.model.Product.getHtmlSource(Product.java:76)
    at com.rivalwatch.plum.model.Product.getHtmlSourceText(Product.java:80)
    at com.rivalwatch.plum.readers.AbstractDataReader.getData(AbstractDataReader.java:64)
Vladimir asked Feb 11 '10 at 07:02



2 Answers

It looks like you are calling getProductIterator() with the starting and ending row numbers, while getProductIterator() is expecting the starting row and a row count. As your "upper limit" gets higher you are reading data in bigger chunks. I think you mean to pass batchSize as the second argument to getProductIterator().
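
To make the arithmetic concrete, here is a small plain-Java sketch (no Hibernate involved) comparing the value that reaches setMaxResults in the posted loop with the value it would get if batchSize were passed instead. The class and method names are made up for illustration and are not part of the original code.

```java
// Sketch only: plain Java, no Hibernate. BatchWindows and its methods are hypothetical names.
final class BatchWindows {

    // Posted call: getProductIterator(batchSize * i, batchSize * (i + 1)).
    // The second argument ends up in setMaxResults, so it is a row count, not an end row.
    static int limitAsPosted(int batchSize, int i) {
        return batchSize * (i + 1);
    }

    // Suggested call: getProductIterator(batchSize * i, batchSize).
    static int limitCorrected(int batchSize, int i) {
        return batchSize;
    }

    public static void main(String[] args) {
        int batchSize = 1000;
        for (int i : new int[] {0, 1, 49}) {
            System.out.printf("i=%d offset=%d posted limit=%d corrected limit=%d%n",
                    i, batchSize * i, limitAsPosted(batchSize, i), limitCorrected(batchSize, i));
        }
    }
}
```

With batchSize = 1000, the last iteration (i = 49) of the posted loop asks for up to 50000 rows starting at row 49000, while the corrected call never asks for more than 1000.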

KeithL answered Sep 29 '22 at 23:09



Not a direct answer but for this kind of data manipulation, I would use the StatelessSession interface.

Pascal Thivent answered Sep 29 '22 at 23:09
