
How to stream data to database BLOB using Hibernate (no in-memory storing in byte[])

I'm looking for a way to stream binary data to/from a database. If possible, I'd like it done with Hibernate (in a database-agnostic way). All solutions I've found involve explicit or implicit loading of the binary data into memory as byte[]. I need to avoid that. Let's say I want my code to write a 2 GB video from the database (stored in a BLOB column) to a local file, or the other way around, using no more than 256 MB of memory. It's clearly achievable, and involves no voodoo. But I can't find a way, and for now I'm trying to avoid debugging Hibernate.

Let's look at sample code (keeping in mind -Xmx256m).

Entity class:

public class SimpleBean {
    private Long id;
    private Blob data;
    // ... skipping getters, setters and constructors.
}

Hibernate mapping fragment:

<class name="SimpleBean" table="SIMPLE_BEANS">
    <id name="id" column="SIMPLE_BEAN_ID">
        <generator class="increment" />
    </id>
    <property name="data" type="blob" column="DATA" />
</class>

Test code fragment:

Configuration cfg = new Configuration().configure("hibernate.cfg.xml");
ServiceRegistry serviceRegistry = new ServiceRegistryBuilder()
                                      .applySettings(cfg.getProperties())
                                      .buildServiceRegistry();

SessionFactory sessionFactory = cfg.buildSessionFactory(serviceRegistry);
Session session = sessionFactory.openSession();
session.beginTransaction();

File dataFile = new File("movie_1gb.avi");
long dataSize = dataFile.length();
InputStream dataStream = new FileInputStream(dataFile);

LobHelper lobHelper = session.getLobHelper();
Blob dataBlob = lobHelper.createBlob(dataStream, dataSize);

session.save( new SimpleBean(dataBlob) );
session.getTransaction().commit(); // Throws java.lang.OutOfMemoryError
session.close();

dataStream.close();
sessionFactory.close();
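On the read side the same principle applies: pull the blob's InputStream and copy it out in fixed-size chunks, never holding more than one buffer in memory. A minimal sketch of that copy loop, in plain Java so it stands alone (in real code the InputStream would come from the entity's Blob via getBinaryStream(), and the OutputStream would be a FileOutputStream for the target file):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copies input to output in fixed-size chunks, so peak heap usage for
    // the copy is just the buffer, regardless of how large the stream is.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a blob stream; real code would use
        // blob.getBinaryStream() and a FileOutputStream.
        byte[] data = new byte[100_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink);
        System.out.println(copied); // prints 100000
    }
}
```

With an 8 KB buffer, the copy's memory footprint stays constant no matter how large the blob is, which is exactly the property the question is after.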

When running that snippet I get an OutOfMemoryError. The stack trace shows that Hibernate tries to load the stream into memory and runs out (as it should, given the heap limit). Here's the stack trace:

java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at org.hibernate.type.descriptor.java.DataHelper.extractBytes(DataHelper.java:183)
at org.hibernate.type.descriptor.java.BlobTypeDescriptor.unwrap(BlobTypeDescriptor.java:121)
at org.hibernate.type.descriptor.java.BlobTypeDescriptor.unwrap(BlobTypeDescriptor.java:45)
at org.hibernate.type.descriptor.sql.BlobTypeDescriptor$4$1.doBind(BlobTypeDescriptor.java:105)
at org.hibernate.type.descriptor.sql.BasicBinder.bind(BasicBinder.java:92)
at org.hibernate.type.AbstractStandardBasicType.nullSafeSet(AbstractStandardBasicType.java:305)
at org.hibernate.type.AbstractStandardBasicType.nullSafeSet(AbstractStandardBasicType.java:300)
at org.hibernate.type.AbstractSingleColumnStandardBasicType.nullSafeSet(AbstractSingleColumnStandardBasicType.java:57)
at org.hibernate.persister.entity.AbstractEntityPersister.dehydrate(AbstractEntityPersister.java:2603)
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2857)
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:3301)
at org.hibernate.action.internal.EntityInsertAction.execute(EntityInsertAction.java:88)
at org.hibernate.engine.spi.ActionQueue.execute(ActionQueue.java:362)
at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:354)
at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:275)
at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:326)
at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:52)
at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1214)
at org.hibernate.internal.SessionImpl.managedFlush(SessionImpl.java:403)
at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.beforeTransactionCommit(JdbcTransaction.java:101)
at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.commit(AbstractTransactionImpl.java:175)
at ru.swemel.msgcenter.domain.SimpleBeanTest.testBasicUsage(SimpleBeanTest.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)

Used Hibernate 4.1.5.SP1. The exact question is: how do I avoid loading the stream into memory when storing a blob in the database with Hibernate, streaming it directly instead? I'd like to avoid off-topic discussion of why one stores video in a database column instead of in a content repository with a link. Please consider it a model that is irrelevant to the question.

It seems that capabilities may differ between dialects, and Hibernate might load everything into memory because the underlying database doesn't support streaming blobs, or something like that. If that's the case, I'd like to see some kind of comparative table of how the different dialects handle blobs.

Thank you very much for your help!

asked Aug 12 '12 by IceGlow


3 Answers

For those looking for the same thing:

My bad, the code works as intended (streaming without copying into memory) for PostgreSQL (and probably lots of others). Hibernate's inner workings depend on the selected dialect. The dialect I used in the first place overrides direct use of streams in favor of a BinaryStream backed by byte[].

There are also no performance problems, since in the PostgreSQL case it loads only the OID (a number), and with other dialects (including the byte[] implementation) the data is probably lazy-loaded. I ran some quick-and-dirty tests: no visible difference across 10,000 loads of an entity with and without a binary data field.

Storing data in the database does seem slower than just saving it on disk as an external file, though. But it saves you a lot of headache with backups, file-system limitations, concurrent updates, and so on. That's off-topic, however.

answered Nov 13 '22 by IceGlow


Your solution using Hibernate's LobHelper should work, but you may need to make sure the use of streams is enforced. Set the property hibernate.jdbc.use_streams_for_binary = true. This is a system-level property, so it has to be set at startup (I defined it on the command line during testing):

java -Dhibernate.jdbc.use_streams_for_binary=true blobTest
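If launching with a -D flag isn't convenient, the same system property can be set programmatically; a sketch, with the caveat (an assumption based on the property being system-level) that it must run before any Hibernate configuration is built, since the value is read once at startup:

```java
public class StreamPropertySetup {
    public static void enableBinaryStreaming() {
        // Equivalent to -Dhibernate.jdbc.use_streams_for_binary=true;
        // call this before building the Hibernate Configuration.
        System.setProperty("hibernate.jdbc.use_streams_for_binary", "true");
    }

    public static void main(String[] args) {
        enableBinaryStreaming();
        System.out.println(System.getProperty("hibernate.jdbc.use_streams_for_binary")); // prints true
    }
}
```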

You can prove it's changed in your code:

Properties props = cfg.getProperties(); // from the Configuration built earlier
Object prop = props.get("hibernate.jdbc.use_streams_for_binary");
System.out.println("hibernate.jdbc.use_streams_for_binary" + "/" + prop);
answered Nov 13 '22 by Phil Horder


You are storing the Blob in your POJO SimpleBean. This means that if the blob is larger than your heap space, any time you work with this object or access the data field you're going to get an OutOfMemoryError, because the entire thing is loaded into memory.

I don't think there's a way to set or get a database field using a Stream in Hibernate, and HQL supports INSERT only in the INSERT ... SELECT form, so you can't bind a stream parameter that way.

What you may have to do is remove the data field from the SimpleBean object so that it isn't held in memory when you load or save. When you need to save a blob, use Hibernate's save() to create the row, then a JDBC PreparedStatement and its setBinaryStream() method. When you need to read the stream back, use Hibernate's load() to get the SimpleBean, then a JDBC SELECT to get a ResultSet and its getBinaryStream() method to read the blob. The docs for setBinaryStream() say:

The data will be read from the stream as needed until end-of-file is reached.

So the data won't be held entirely in memory.
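The read side of this approach can be sketched with the standard java.sql.Blob API. The snippet below uses javax.sql.rowset.serial.SerialBlob purely as a self-contained stand-in so it can run without a database; SerialBlob itself is in-memory, so this only illustrates the API shape, whereas a real Blob obtained from resultSet.getBlob("DATA") lets the driver fetch data incrementally:

```java
import java.io.IOException;
import java.io.InputStream;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

public class BlobReadSketch {
    // Reads a Blob in fixed-size chunks; with a server-side Blob the
    // driver can fetch data as needed instead of materializing it all.
    public static long drain(Blob blob) throws SQLException, IOException {
        long total = 0;
        try (InputStream in = blob.getBinaryStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n; // real code would write each chunk to a FileOutputStream
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in only; a real Blob would come from a JDBC ResultSet.
        Blob blob = new SerialBlob(new byte[50_000]);
        System.out.println(drain(blob)); // prints 50000
    }
}
```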

answered Nov 13 '22 by Jon Lin