Libraries for hash partitioning/Sharding with JPA

My department has decided to move to hash partitioning/sharding for some of our large Oracle databases. We will be splitting our entities across different schemas. I've been tasked with doing a spike to evaluate how suitable different JPA implementations are for this.

The two that I've been told to focus on are EclipseLink and Apache OpenJPA/Slice. We have used Hibernate exclusively in the past, but Hibernate Shards is still in beta and appears to no longer be actively developed (its last release was in 2007), so we are not considering it.

I will be doing my own evaluation and trial implementations, but I doubt I will get a good feel for the overall quality of these implementations in the time I've been given. If you are using OpenJPA and/or EclipseLink in a production environment, especially with a sharded database, I would like to hear about your experiences (positive and negative), your opinion of their overall quality, and whether you would make the same choice again if given the opportunity.

Kaypro II asked Jun 08 '11 20:06


2 Answers

OpenJPA Slice could be one option for JPA applications in a sharded database environment.

OpenJPA Slice has been available since version 1.2 and also ships with WebSphere 7.0 and later. Slice's basic contract is that the exact same JPA application code keeps working against horizontally partitioned database shards, without any change to the database schema. The shards can even be databases from different vendors.

Slice follows a policy-based design that lets the application control which shard (slice) persists each new instance, which subset of slices a query is targeted at, and so on.
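To illustrate the policy idea, here is a minimal, self-contained sketch of a hash-based placement policy. The `ShardChooser` interface, the `Customer` entity and the slice names are hypothetical stand-ins, not the OpenJPA API; OpenJPA's actual extension point is its `DistributionPolicy` interface, whose exact signature should be checked in the user manual.

```java
import java.util.List;

// Hypothetical policy interface, loosely modelled on the idea of a Slice
// distribution policy. NOT the OpenJPA API.
interface ShardChooser {
    String chooseSlice(Object entity, List<String> slices);
}

// Hypothetical entity used only for this example.
class Customer {
    final long id;
    Customer(long id) { this.id = id; }
}

// Places each new Customer on a slice chosen by hashing its id.
class HashByIdChooser implements ShardChooser {
    @Override
    public String chooseSlice(Object entity, List<String> slices) {
        long id = ((Customer) entity).id;
        // floorMod keeps the index non-negative even if the hash is negative
        int index = Math.floorMod(Long.hashCode(id), slices.size());
        return slices.get(index);
    }
}

class DistributionDemo {
    public static void main(String[] args) {
        List<String> slices = List.of("One", "Two", "Three");
        ShardChooser chooser = new HashByIdChooser();
        for (long id = 1; id <= 4; id++) {
            System.out.println(id + " -> " + chooser.chooseSlice(new Customer(id), slices));
        }
    }
}
```

The point of the policy approach is that this routing decision lives in one pluggable class, while the entity classes and the JPA calls that persist them stay unchanged.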

The basic limitation (typical of any sharded environment) is that the persistent domain model must adhere to a tree-constrained schema. Essentially, given an instance x stored in shard A, the persistent closure of x, i.e. the set of instances directly or indirectly reachable from x, must also be stored in shard A. Slice computes this closure automatically when you persist x.
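The closure idea can be made concrete with a small graph traversal. This is a sketch under simplifying assumptions: `Node` is a hypothetical stand-in for an entity with references, and shard assignments are held in a plain map. It is not how Slice computes the closure internally.

```java
import java.util.*;

// Hypothetical minimal object graph: each node holds references to other nodes.
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class Closure {
    // Breadth-first traversal collecting every node reachable from root
    // (root included). The 'seen' set makes cycles safe.
    static Set<Node> of(Node root) {
        Set<Node> seen = new LinkedHashSet<>();
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.poll();
            if (seen.add(n)) {
                queue.addAll(n.refs);
            }
        }
        return seen;
    }

    // Tree-constraint check: every node in root's closure must be assigned
    // to the same shard as root itself.
    static boolean sameShard(Node root, Map<Node, String> shardOf) {
        String shard = shardOf.get(root);
        for (Node n : of(root)) {
            if (!Objects.equals(shardOf.get(n), shard)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with `Order -> LineItem -> Product`, the closure of the order contains all three nodes, and the constraint is violated as soon as any one of them is assigned to a different shard.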

If an application can live with such a constraint, Slice could be a good fit.

At times, certain instances may be shared across closures e.g. Country Code or Currency Code. Slice does have a provision for replicating such 'master data'-like instances across multiple shards.
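A minimal sketch of the replication idea: master-data types go to every slice, while ordinary entities are routed to exactly one. The `ShardKeyed` interface, the `CurrencyCode` and `Invoice` entities and the `ReplicatingRouter` class are all hypothetical; Slice has its own configuration for replicated types, documented in the user manual.

```java
import java.util.*;

// Hypothetical: entities that carry a shard key implement this.
interface ShardKeyed {
    long shardKey();
}

// Hypothetical master-data entity, replicated to every slice.
class CurrencyCode {
    final String code;
    CurrencyCode(String code) { this.code = code; }
}

// Hypothetical ordinary entity, stored in exactly one slice.
class Invoice implements ShardKeyed {
    final long id;
    Invoice(long id) { this.id = id; }
    public long shardKey() { return id; }
}

class ReplicatingRouter {
    private final Set<Class<?>> replicatedTypes;

    ReplicatingRouter(Set<Class<?>> replicatedTypes) {
        this.replicatedTypes = replicatedTypes;
    }

    // Returns every slice that should hold a copy of the given entity.
    List<String> targets(Object entity, List<String> slices) {
        if (replicatedTypes.contains(entity.getClass())) {
            return slices; // master data: a copy on every slice
        }
        if (entity instanceof ShardKeyed) {
            long key = ((ShardKeyed) entity).shardKey();
            int index = Math.floorMod(Long.hashCode(key), slices.size());
            return List.of(slices.get(index)); // ordinary data: one slice only
        }
        throw new IllegalArgumentException("no routing rule for " + entity.getClass());
    }
}
```

Replicating small, rarely changing lookup data this way keeps each closure local to one shard, at the cost of writing every update to all shards.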

Aggregate operations whose per-shard results can be combined into the correct overall result (MAX, MIN, SUM) are supported. Aggregates that cannot be combined this way, such as AVG, are not supported. Sorted and Top-N queries are supported as well.
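Why MAX/SUM combine across shards and AVG does not can be shown with plain numbers. In this sketch each inner list is a hypothetical stand-in for one column's values on one shard; it does not show Slice's actual merge logic.

```java
import java.util.*;
import java.util.stream.*;

class ShardAggregates {
    // MAX of the per-shard MAXes equals the global MAX.
    static int max(List<List<Integer>> shards) {
        return shards.stream()
                .mapToInt(s -> Collections.max(s))
                .max().orElseThrow();
    }

    // SUM of the per-shard SUMs equals the global SUM.
    static long sum(List<List<Integer>> shards) {
        return shards.stream()
                .mapToLong(s -> s.stream().mapToLong(Integer::longValue).sum())
                .sum();
    }

    // Naively averaging the per-shard AVGs: wrong when shard sizes differ.
    static double averageOfAverages(List<List<Integer>> shards) {
        return shards.stream()
                .mapToDouble(s -> s.stream().mapToInt(Integer::intValue).average().orElseThrow())
                .average().orElseThrow();
    }

    // A correct AVG needs each shard to report SUM and COUNT instead of AVG.
    static double trueAverage(List<List<Integer>> shards) {
        long count = shards.stream().mapToLong(List::size).sum();
        return (double) sum(shards) / count;
    }

    // Top-N: each shard returns its own top n; the global top n is among them.
    static List<Integer> topN(List<List<Integer>> shards, int n) {
        return shards.stream()
                .flatMap(s -> s.stream().sorted(Comparator.reverseOrder()).limit(n))
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = List.of(List.of(10, 20, 30), List.of(40));
        System.out.println(max(shards));               // 40
        System.out.println(sum(shards));               // 100
        System.out.println(averageOfAverages(shards)); // 30.0 (wrong)
        System.out.println(trueAverage(shards));       // 25.0
        System.out.println(topN(shards, 2));           // [40, 30]
    }
}
```

The AVG case fails because the shard with one row gets the same weight as the shard with three.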

More information about Slice can be found in the following references:

[1] OpenJPA User Manual: http://openjpa.apache.org/builds/latest/docs/manual/manual.html#ref_guide_slice

[2] IBM developerWorks article: http://www.ibm.com/developerworks/java/library/os-openjpa/?ca=drs-

Pinaki Poddar answered Sep 19 '22 17:09


EclipseLink's data partitioning support was released as part of the base product in the 2.2 release.

It supports hash partitioning, several other types of partitioning (value, range), and user-defined policies. The 2.3 release also includes integrated support for Oracle RAC, UCP and WebLogic GridLink.
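EclipseLink configures these policies declaratively (for example through annotations such as `@HashPartitioning`, `@ValuePartitioning` and `@RangePartitioning`; check the EclipseLink documentation for their exact attributes). The plain-Java sketch below only illustrates how each scheme maps a key to a connection pool. `PartitionRouter` and pool names like `node1` are hypothetical, and this is not EclipseLink's implementation.

```java
import java.util.*;

// Hypothetical router illustrating three partitioning schemes.
class PartitionRouter {
    // Hash partitioning: pool chosen by hashing the key.
    static String byHash(Object key, List<String> pools) {
        return pools.get(Math.floorMod(key.hashCode(), pools.size()));
    }

    // Value partitioning: explicit key -> pool mapping, with a default pool.
    static String byValue(String key, Map<String, String> valueToPool, String defaultPool) {
        return valueToPool.getOrDefault(key, defaultPool);
    }

    // Range partitioning: the pool whose lower bound is the greatest bound <= key.
    static String byRange(long key, NavigableMap<Long, String> lowerBoundToPool) {
        Map.Entry<Long, String> entry = lowerBoundToPool.floorEntry(key);
        if (entry == null) {
            throw new IllegalArgumentException("no range covers key " + key);
        }
        return entry.getValue();
    }
}
```

One trade-off worth testing in a spike: with simple modulo hashing like this, adding a pool reassigns most existing keys, whereas value and range partitioning let you move data between pools more selectively.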

See: http://java-persistence-performance.blogspot.com/2011/05/data-partitioning-scaling-database.html

James answered Sep 21 '22 17:09