I am putting together a regular Java EE application on JBoss AS 7 that will use JPA in the data tier. I would like this application to scale with load. It is pretty clear how to scale the web tier: create more machines and put them behind a load balancer. Scaling the data tier is less obvious.
I can probably cluster my database (MySQL). Still, that leaves the JPA layer unclustered. Ideally, JPA would scale by using a clustered in-memory cache backed by MySQL.
When I look around, all information around JPA scaling seems to be 3-4 years old. People talk about ehcache, memcached and infinispan. I am not sure if this is still current.
Can someone tell me the state of the art in Java EE clustering and scaling, especially in the data tier?
Various caching strategies are still the way to scale JPA/Hibernate (you basically named the most popular options in your question). As far as I know, nothing extraordinary has happened in this field in the last 4-5 years. One more option you haven't mentioned is JBoss Cache. So the second-level cache for JPA/Hibernate still rules in this area.
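To make the second-level cache a bit more concrete, here is a minimal sketch of the settings that usually switch it on. The class and method names are made up for illustration. The property keys are the standard JPA 2.0 shared-cache setting and Hibernate's cache switches. I am assuming Hibernate as the JPA provider and deliberately leaving out the choice of cache implementation (Ehcache, Infinispan, ...), since that depends on your setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; only the property keys are real JPA/Hibernate settings.
class SecondLevelCacheConfig {

    // Builds property overrides that could be passed to
    // Persistence.createEntityManagerFactory(unitName, properties)
    // or written as <property> entries in persistence.xml.
    static Map<String, String> properties() {
        Map<String, String> props = new LinkedHashMap<String, String>();
        // JPA 2.0 standard: cache only entities annotated with @Cacheable.
        props.put("javax.persistence.sharedCache.mode", "ENABLE_SELECTIVE");
        // Hibernate-specific switches for the second-level and query caches.
        props.put("hibernate.cache.use_second_level_cache", "true");
        props.put("hibernate.cache.use_query_cache", "true");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be copied into persistence.xml.
        for (Map.Entry<String, String> e : properties().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

With `ENABLE_SELECTIVE`, nothing is cached until you mark individual entities with `@Cacheable`, which is usually the safer starting point in a cluster because you choose which data may be served from a cache.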
Why no progress here? My wild guess is that, first of all, people who need scalable applications tend to avoid JPA and Hibernate in areas where high performance is needed. They usually go with plain SQL wrapped in Spring Framework's JdbcTemplate helpers and transaction management. Scalability then becomes a matter of what the database itself can do.
The other trend is to use NoSQL databases. There are plenty of solutions: MongoDB, CouchDB, Cassandra, Redis, to name a few. These are usually key-value stores in the style of Google Bigtable (this is an oversimplification, but it is more or less the idea behind the approach), and they scale extremely well if you accept their limitations (relations are no longer managed easily, etc.).
There are many solutions. The two main categories are scaling the database itself (partitioning or clustering) and clustering the cache:
EclipseLink supports data partitioning for sharding data across a set of database instances,
see: http://java-persistence-performance.blogspot.com/2011/05/data-partitioning-scaling-database.html
You can also use MySQL Cluster,
see: http://www.mysql.com/products/cluster/
Oracle TopLink Grid provides EclipseLink JPA support for integration with Oracle Coherence as a distributed cache,
see: http://www.oracle.com/technetwork/middleware/ias/tl-grid-097210.html
EclipseLink's cache supports clustering through cache coordination,
see: http://wiki.eclipse.org/EclipseLink/Examples/JPA/CacheCoordination
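To illustrate the idea behind data partitioning mentioned above, here is a rough sketch of hash-based routing in plain Java. This is not EclipseLink's API. The class name, method name, and node names are all hypothetical, and a real setup would route to connection pools rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical router, not EclipseLink's API: it only illustrates mapping
// a partition key to one of several database nodes.
class HashPartitionRouter {
    private final List<String> nodes;

    HashPartitionRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        // Defensive copy so later changes to the caller's list do not move data.
        this.nodes = new ArrayList<String>(nodes);
    }

    // Math.floorMod keeps the index non-negative even for negative hash codes.
    String nodeFor(Object partitionKey) {
        int index = Math.floorMod(partitionKey.hashCode(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        HashPartitionRouter router = new HashPartitionRouter(
                Arrays.asList("db-node-0", "db-node-1", "db-node-2"));
        for (long customerId = 1; customerId <= 5; customerId++) {
            System.out.println("customer " + customerId + " -> " + router.nodeFor(customerId));
        }
    }
}
```

The same key always lands on the same node, so reads and writes for one customer stay on one database. The catch with plain modulo hashing is that changing the number of nodes remaps most keys, so adding a node usually means moving data.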