Managing hierarchies in SQL: MPTT/nested sets vs adjacency lists vs storing paths

Question

For a while now I've been wrestling with how best to handle hierarchies in SQL. Frustrated by the limitations of adjacency lists and the complexity of MPTT/nested sets, I began thinking about simply storing key paths instead, as a simple node_key/node_key/... string. I decided to compile the pros and cons of the three techniques:

Number of calls required to create/delete/move a node:

Adjacency = 1
MPTT = 3
Path = 1 (Replace old node path with new node path across all nodes that contain that path)
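
To illustrate the single-statement move described for stored paths, here is a minimal sketch using Python's built-in sqlite3 module. The table name `node`, its columns, and the helper `move_subtree` are hypothetical, not from the question. It compares prefixes with `substr` rather than `LIKE`, so that `%` or `_` characters in a key are not treated as wildcards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each row stores its full key path as a string.
conn.execute("CREATE TABLE node (node_key TEXT PRIMARY KEY, path TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [("a", "a"), ("b", "a/b"), ("c", "a/b/c"), ("d", "a/d")],
)


def move_subtree(conn, old_path, new_path):
    """Rewrite the path prefix of a node and all its descendants in one UPDATE."""
    conn.execute(
        """
        UPDATE node
        SET path = ? || substr(path, length(?) + 1)
        WHERE path = ?
           OR substr(path, 1, length(?) + 1) = ? || '/'
        """,
        (new_path, old_path, old_path, old_path, old_path),
    )


# Move the subtree rooted at a/b so that it hangs under a/d.
move_subtree(conn, "a/b", "a/d/b")
paths = [row[0] for row in conn.execute("SELECT path FROM node ORDER BY path")]
print(paths)  # ['a', 'a/d', 'a/d/b', 'a/d/b/c']
```

Note that although this is one call, the statement touches every row in the moved subtree, so its cost grows with the subtree size.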

Number of calls required to get a tree:

Adjacency = [number of sub-levels]
MPTT = 1
Path = 1
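
The adjacency figure above assumes one query per level. On databases that support recursive common table expressions, a subtree can instead be fetched in one call. The following is a sketch under that assumption, using SQLite through Python's sqlite3 module; the `node` table, its columns, and `subtree_ids` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical adjacency list: each row points at its parent (NULL for roots).
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES node(id))"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (4, 1), (5, None)],
)


def subtree_ids(conn, root_id):
    """Return (id, depth) for root_id and all its descendants in a single query."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id, depth) AS (
            SELECT id, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, s.depth + 1
            FROM node AS n
            JOIN subtree AS s ON n.parent_id = s.id
        )
        SELECT id, depth FROM subtree ORDER BY depth, id
        """,
        (root_id,),
    )
    return rows.fetchall()


result = subtree_ids(conn, 1)
print(result)  # [(1, 0), (2, 1), (4, 1), (3, 2)]
```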

Number of calls required to get path to a node / ancestry:

Adjacency = [number of super-levels]
MPTT = 1
Path = 0

Number of calls required to get number of subnodes:

Adjacency = [number of sub-levels]
MPTT = 0 (Can be calculated from right/left values)
Path = 1
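
The MPTT entry relies on the fact that, in a gap-free nested set numbering, a node's descendant count is (right - left - 1) / 2. Here is a small sketch of that arithmetic, assuming a hypothetical `node` table with `lft` and `rgt` columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical nested set table; lft/rgt come from a depth-first numbering.
conn.execute(
    "CREATE TABLE node (name TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
)
# Tree: a has children b and d; b has child c.
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [("a", 1, 8), ("b", 2, 5), ("c", 3, 4), ("d", 6, 7)],
)

# Each descendant consumes two numbers between lft and rgt,
# so the count follows from the row's own values alone.
counts = dict(conn.execute("SELECT name, (rgt - lft - 1) / 2 FROM node"))
print(counts)  # {'a': 3, 'b': 1, 'c': 0, 'd': 0}
```

The formula only holds while the numbering has no gaps, which is one reason inserts and moves have to renumber other rows.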

Number of calls required to get depth of node:

Adjacency = [number of super-levels]
MPTT = 1
Path = 0
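
The "Path = 0" entries for ancestry and depth follow from the path string itself: once a row is loaded, both can be derived without another query. A minimal sketch, assuming `/` as the separator and keys that never contain it (the function names are hypothetical):

```python
def ancestry(path):
    """Return the paths of all ancestors of the node, root first."""
    keys = path.split("/")
    return ["/".join(keys[:i]) for i in range(1, len(keys))]


def depth(path):
    """Return the node's depth, counting a root node as depth 0."""
    return path.count("/")


print(ancestry("a/b/c"))  # ['a', 'a/b']
print(depth("a/b/c"))     # 2
```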

DB fields required:

Adjacency = 1 (parent)
MPTT = 3 (parent,right,left)
Path = 1 (path)

Conclusion

The stored path technique uses the same number of calls as the other techniques, or fewer, in every use case except one. By this analysis, storing paths is a clear winner. Not to mention that it's a lot simpler to implement, human readable, etc.

So the question is, shouldn't stored paths be considered a stronger technique than MPTT? Why are stored paths not a more commonly used technique, and why would you not use them over MPTT in a given instance?

Also, if you think this analysis is incomplete please let me know.

UPDATE:

Here are at least 2 things MPTT can do out of the box that a stored path solution won't:

Allows calculation of subnode count for each node without any additional queries (mentioned above).
Imposes an order on nodes at a given level. The other solutions are unordered.

Bill Karwin · Accepted Answer

You might also consider the Closure Table design I describe in my answer to What is the most efficient/elegant way to parse a flat table into a tree?

Calls required to create/delete/move a node:

Closure = 1

Calls required to get a tree:

Closure = 1

Calls required to get path to a node / ancestry:

Closure = 1

Calls required to get number of subnodes:

Closure = 1

Calls required to get depth of node:

Closure = 1

DB fields required:

Adjacency = 1 more field / row
Path = 1 more field / row
MPTT = 2 or 3 more fields / row
Closure = 2 or 3 fields in extra table. This table has O(n^2) rows worst case but far fewer than that in most practical cases.
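
As a rough illustration of the extra table, here is a sketch of one possible closure table layout using SQLite through Python's sqlite3 module. The table names, the optional `depth` column, and the `add_node` helper are assumptions made for this example rather than a definitive implementation of the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per (ancestor, descendant) pair, including each node paired with itself.
    CREATE TABLE closure (
        ancestor   INTEGER NOT NULL REFERENCES node(id),
        descendant INTEGER NOT NULL REFERENCES node(id),
        depth      INTEGER NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    );
    """
)


def add_node(conn, node_id, name, parent_id=None):
    """Insert a node, then link it to itself and to every ancestor of its parent."""
    conn.execute("INSERT INTO node VALUES (?, ?)", (node_id, name))
    conn.execute(
        """
        INSERT INTO closure (ancestor, descendant, depth)
        SELECT ancestor, ?, depth + 1 FROM closure WHERE descendant = ?
        UNION ALL
        SELECT ?, ?, 0
        """,
        (node_id, parent_id, node_id, node_id),
    )


add_node(conn, 1, "root")
add_node(conn, 2, "child", parent_id=1)
add_node(conn, 3, "grandchild", parent_id=2)
add_node(conn, 4, "sibling", parent_id=1)

# All descendants of node 1, in one query.
descendants = [r[0] for r in conn.execute(
    "SELECT descendant FROM closure WHERE ancestor = 1 AND depth > 0 ORDER BY descendant")]
# All ancestors of node 3, root first, in one query.
ancestors = [r[0] for r in conn.execute(
    "SELECT ancestor FROM closure WHERE descendant = 3 AND depth > 0 ORDER BY depth DESC")]
# Depth of node 3 is its distance from its farthest ancestor.
node_depth = conn.execute(
    "SELECT MAX(depth) FROM closure WHERE descendant = 3").fetchone()[0]
print(descendants, ancestors, node_depth)  # [2, 3, 4] [1, 2] 2
```

Because both closure columns are ordinary foreign keys to `node`, the database can enforce that every recorded relationship points at existing rows.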

There are a couple of other considerations:

Supports unlimited depth:

Adjacency = yes
MPTT = yes
Path = no
Closure = yes

Supports referential integrity:

Adjacency = yes
MPTT = no
Path = no
Closure = yes

I also cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP, and my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

Hogan · Answer

The problem with your conclusion is that it ignores most of the issues involved in working with trees.

By reducing the validity of a technique to the "number of calls", you effectively ignore all of the issues that well-understood data structures and algorithms attempt to solve: fast execution and a low memory and resource footprint.

The "number of calls" to an SQL server may seem like a good metric to use ("look ma, less code"), but if the result is a program that never finishes, runs slowly, or takes up too much space, it is in fact a useless metric.

By storing the path with every node you are not creating a tree data structure. Instead you are creating a list. Any operation which a tree is designed to optimize is lost.

This might be hard to see with small data sets (and for many small trees a list is in fact better). Try some examples on data sets of size 500, 1000, and 10k, and you will quickly see why storing the whole path is not a good idea.
