I have an application that uses custom graph (tree-like) structures. They are not real trees; pretty much everything is connected to everything else. The amount of data is also large (millions of nodes can exist). To make it more interesting, the nodes vary in type (inheritance). I don't want to alter the data structures to accommodate persistent storage.
I want to persist this data without too much extra work. I've googled some options, but couldn't find anything that exactly fits my needs. Possible options: serialization, databases with ORM (Hibernate?), JCR (JackRabbit?), anything else?
Performance is important, because it's a GUI-based "real-time" application (no batch processing), and there could be millions of graph nodes that need to be read and written between memory and the persistent data store.
Does anybody have experience with, or ideas about, storing this kind of data?
As your data uses a graph data structure (basically: nodes and edges/relationships), a graph database would be a very good match. See my answer on The Next-gen Databases for some links. I'm part of the Neo4j open source graph database project, see this thread for some discussion of it. A big advantage of using Neo4j in a case like yours is that there's no trouble keeping track of persisting/activating objects or activation depth and the like. You probably wouldn't need to change the data structures in your application, but of course some extra code would be needed. The Design guide gives one example of how your code could interact with the database.
Since you indicate that there is a large quantity of data, you probably want a mechanism that lets you easily bring the data in as needed. Serialization is probably not very easy to handle with large quantities of data: in order to break it up into manageable pieces, you would need to either use separate files on disk or store the pieces elsewhere. JCR (JackRabbit) is more of a content management system. Those work well for 'document'-type objects. It sounds like the individual pieces of the tree you want to store may be small, but together they can be large. That is not an ideal fit for a CMS.
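To illustrate what breaking serialized data into separate files could involve, here is a minimal sketch using plain Java serialization. `GraphNode`, `ChunkedStore`, the `chunk-N.ser` file naming and the grouping by `id / chunkSize` are all hypothetical names and choices made up for this example; it also assumes non-negative ids.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical node type; the application's real node classes are not shown in the question.
class GraphNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final long id;
    final String label;
    // Neighbours are kept as ids rather than object references, so that
    // serializing one node does not pull in the whole connected graph.
    final List<Long> neighbourIds = new ArrayList<>();

    GraphNode(long id, String label) { this.id = id; this.label = label; }
}

// Hypothetical store: one file per group of chunkSize consecutive ids.
class ChunkedStore {
    private final Path dir;
    private final int chunkSize;

    ChunkedStore(Path dir, int chunkSize) { this.dir = dir; this.chunkSize = chunkSize; }

    private Path fileFor(long chunkIndex) { return dir.resolve("chunk-" + chunkIndex + ".ser"); }

    // Groups nodes by id / chunkSize and writes each group to its own file.
    void save(Collection<GraphNode> nodes) throws IOException {
        Map<Long, List<GraphNode>> chunks = new TreeMap<>();
        for (GraphNode n : nodes) {
            chunks.computeIfAbsent(n.id / chunkSize, k -> new ArrayList<>()).add(n);
        }
        Files.createDirectories(dir);
        for (Map.Entry<Long, List<GraphNode>> e : chunks.entrySet()) {
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(fileFor(e.getKey()))))) {
                out.writeObject(new ArrayList<>(e.getValue()));
            }
        }
    }

    // Reads only the one file that contains the requested id; returns null if the id is absent.
    @SuppressWarnings("unchecked")
    GraphNode load(long id) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(fileFor(id / chunkSize))))) {
            for (GraphNode n : (List<GraphNode>) in.readObject()) {
                if (n.id == id) return n;
            }
        }
        return null;
    }
}
```

Note that the sketch only works because neighbours are stored as ids. With direct object references, default Java serialization would follow every reference and write the entire connected graph into one stream, which is the kind of change to the data structures the question wanted to avoid.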
The other option you mention, ORM, is probably your best option here. The JPA (Java Persistence API) is great for doing ORM in Java. You can write to the JPA spec and use Hibernate, EclipseLink or any other flavor-of-the-month provider. Those will work with whatever database you want. http://java.sun.com/javaee/5/docs/api/index.html?javax/persistence/package-summary.html
The other benefit of JPA is that you can use the lazy FetchType for loading tree dependencies. This way your application only needs to load the set of pieces it is currently working on. As other things are needed, the JPA layer can retrieve them from the database.
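The idea behind lazy fetching can be sketched in plain Java, without a JPA provider on the classpath. This is not JPA's API: in JPA you would typically annotate the relationship, for example with `@OneToMany(fetch = FetchType.LAZY)`, and let the provider do the loading. `LazyNode`, `LazyDemo` and the simulated `loadChildren` query are hypothetical names invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical node whose children are fetched on first access, not at construction.
class LazyNode {
    final long id;
    private final LongFunction<List<LazyNode>> childLoader; // stands in for a database query
    private List<LazyNode> children;                        // null until first requested

    LazyNode(long id, LongFunction<List<LazyNode>> childLoader) {
        this.id = id;
        this.childLoader = childLoader;
    }

    List<LazyNode> getChildren() {
        if (children == null) {
            children = childLoader.apply(id); // load once, then reuse the cached list
        }
        return children;
    }
}

class LazyDemo {
    static int loads = 0; // counts simulated database round trips

    // Simulated store: every node has two children with ids 2*id+1 and 2*id+2.
    static List<LazyNode> loadChildren(long parentId) {
        loads++;
        return Arrays.asList(
                new LazyNode(2 * parentId + 1, LazyDemo::loadChildren),
                new LazyNode(2 * parentId + 2, LazyDemo::loadChildren));
    }

    public static void main(String[] args) {
        LazyNode root = new LazyNode(0, LazyDemo::loadChildren);
        System.out.println(loads);                      // 0: nothing loaded yet
        System.out.println(root.getChildren().size());  // 2: children fetched on demand
        root.getChildren();                             // cached, no second fetch
        System.out.println(loads);                      // 1
    }
}
```

Only the parts of the graph that are actually traversed cause a load, which is what keeps memory use bounded when the full graph has millions of nodes.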