Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient representation of Hierarchies in Hibernate

I'm having some trouble representing an object hierarchy in Hibernate. I've searched around, and haven't managed to find any examples doing this or similar - you have my apologies if this is a common question.

I have two types which I'd like to persist using Hibernate: Groups and Items.
* Groups are identified uniquely by a combination of their name and their parent.
* The groups are arranged in a number of trees, such that every Group has zero or one parent Group.
* Each Item can be a member of zero or more Groups.

Ideally, I'd like a bi-directional relationship allowing me to get:
* all Groups that an Item is a member of
* all Items that are a member of a particular Group or its descendants.
I also need to be able to traverse the Group tree from the top in order to display it on the UI.

The basic object structure would ideally look like this:

class Group {
    ...
    /** @return all items in this group and its descendants */
    Set<Item> getAllItems() { ... }

    /** @return all direct children of this group */
    Set<Group> getChildren() { ... }
    ...
}

class Item {
    ...
    /** @return all groups that this Item is a direct member of */
    Set<Group> getGroups() { ... }  
    ...
}

Originally, I had just made a simple bi-directional many-to-many relationship between Items and Groups, such that fetching all items in a group hierarchy required recursion down the tree, and fetching groups for an Item was a simple getter, i.e.:

class Group {
    ...
    private Set<Item> items;
    private Set<Group> children;
    ...
    /** @return all items in this group and its descendants */
    Set<Item> getAllItems() {
        Set<Item> allItems = new HashSet<Item>();
        allItems.addAll(this.items);
        for(Group child : this.getChildren()) {
            allItems.addAll(child.getAllItems());
        }
        return allItems;
    }

    /** @return all direct children of this group */
    Set<Group> getChildren() {
        return this.children;
    }
    ...
}

class Item {
    ...
    private Set<Group> groups;
    /** @return all groups that this Item is a direct member of */
    Set<Group> getGroups() {
        return this.groups;
    }
    ...
}

However, this resulted in multiple database requests to fetch the Items in a Group with many descendants, or for retrieving the entire Group tree to display in the UI. This seems very inefficient, especially with deeper, larger group trees. Is there a better or standard way of representing this relationship in Hibernate?

Am I doing anything obviously wrong or stupid?


My only other thought so far was this: Replace the group's id, parent and name fields with a unique "path" String which specifies the whole ancestry of a group, e.g.:
/rootGroup
/rootGroup/aChild
/rootGroup/aChild/aGrandChild

The join table between Groups and Items would then contain group_path and item_id.

This immediately solves the two issues I was suffering previously:
1. The entire group hierarchy can be fetched from the database in a single query and reconstructed in-memory.
2. To retrieve all Items in a group or its descendants, we can select from group_item where group_path='N' or group_path like 'N/%'

However, this seems to defeat the point of using Hibernate. All thoughts welcome!

like image 950
Armand Avatar asked Jan 14 '10 20:01

Armand


2 Answers

I think the real question is where do you want the performance hit. If you declare all your relationships lazy, and pull what you need, then you spread the hit over time. In many cases this means that your user (person or silicon) perceives a faster reaction time. Even if your processing the whole graph at once, you don't actually need the entire graph in memory at the same time.

On the other hand, pulling the entire graph across puts the performance hit up front while making manipulation faster.

The biggest difference between the two is your specific use case. If this is a webserver, pulling the entire graph into memory will kill the server. Most people can't handle more then 7-10 options anyway, and the entire graph can easily overwhelm the user with choices making the UI difficult to navigate.

Also remember these rules of optimization:

  1. Don't
  2. Seriously, Don't. It's cheaper to throw hardware at a performance problem then it is optimize your code to the point it's not maintainable anymore.
  3. If you think you must, find the 1 bottleneck, using a profiling tool, and fix it.
like image 140
Jim Barrows Avatar answered Nov 09 '22 02:11

Jim Barrows


In my past projects i've implemented two different but comparable solutions to this problem.

The most important (from a performance standpoint) solution was based on the fact, that i had a lot of different trees of moderate size. Instead of mapping all tree leaves using bi-directional relationships, each leave got a relationship to the trees root and a second one to their direct parent.

Using this strategy it was quite easy to identify all root nodes (FROM tree t WHERE t.parent IS NULL) and it was really easy to fetch the complete tree based on the root element (FROM tree t WHERE t.root = :root).

The relationship "Set children" was declared a @Transient attribute. Using a simple utility operation, it was quite easy to reconstruct the original tree structure at any time.

The other solution dealt with a small number of really large tree structures. But fortunately these structures had a limited depth. Instead of using just one root, i've mapped several (6 if i remember correctly) levels of "parents". This way it was easy to fetch and reconstruct complete subtrees off the database.

(Those level1 to level6 relations were mapped as "lazy many-to-one" relationships and the application was heavily dependend on the 2nd level caches.)

like image 26
Ralf Edmund Avatar answered Nov 09 '22 02:11

Ralf Edmund