Materialized Path is a method for representing hierarchy in SQL. Each node contains the path itself and all its ancestors (grandparent/parent/self
).
The django-treebeard
implementation of MP (docs):
Each step of the path is a fixed length for consistent performance.
Each node contains depth
and numchild
fields (fast reads at minimal cost to writes).
The path field is indexed (with a standard b-tree index):
The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.
Implementation of get_ancestors
(link):
Match nodes with a path that contains a subset of the current path (steplen
is the fixed length of a step).
paths = [
self.path[0:pos]
for pos in range(0, len(self.path), self.steplen)[1:]
]
return get_result_class(self.__class__).objects.filter(
path__in=paths).order_by('depth')
Implementation of get_descendants
(link):
Match nodes with a depth greater than self and a path which starts with current path.
return cls.objects.filter(
path__startswith=parent.path,
depth__gte=parent.depth
).order_by(
'path'
)
Potential downsides to this approach:
Postgres includes the ltree
extension which provides a custom GiST index (docs).
I am not clear which benefits ltree
provides over django-treebeard
's implementation. This article argues that only ltree
can answer the get_ancestors
question, but as demonstrated earlier, figuring out the ancestors (or descendants) of a node is trivial.
[As an aside, if found this Django ltree
library - https://github.com/mariocesar/django-ltree].
Both approaches use an index (django-treebeard
uses b-tree, ltree
uses a custom GiST). I am interested in understanding the implementation of the ltree
GiST and why it might be a more efficient index than a standard b-tree for this particular use case (materialized path).
Additional links
What are the options for storing hierarchical data in a relational database?
https://news.ycombinator.com/item?id=709970
ltree stores a label path. lquery represents a regular-expression-like pattern for matching ltree values. A simple word matches that label within a path. A star symbol ( * ) matches zero or more labels. These can be joined with dots to form a pattern that must match the whole label path.
The Materialized Paths pattern stores each tree node in a document; in addition to the tree node, document stores as a string the id(s) of the node's ancestors or path.
When you create a materialized view in PostgreSQL, it uses a regular database table underneath. You can create database indexes on the materialized view directly and improve performance of queries that access the materialized view.
TL;DR Reusable labels, complex search patterns, and ancestry searches against multiple descendant nodes (or a single node whose path hasn't yet been retrieved) can't be accomplished using a materialized path index.
For those interested in the gory details...
Firstly, your question is only relevant if you are not reusing any labels in your node description. If you were, the l-tree is really the only option of the two. But materialized path implementations don't typically need this, so let's put that aside.
One obvious difference will be in the flexibility in the types of searches that l-tree gives you. Consider these examples (from the ltree
docs linked in your question):
foo Match the exact label path foo
*.foo.* Match any label path containing the label foo
*.foo Match any label path whose last label is foo
The first query is obviously achievable with materialized path. The last is also achievable, where you'd adjust the query as a sibling lookup. The middle case, however, isn't directly achievable with a single index lookup. You'd either have to break this up into two queries (all descendants + all ancestors), or resort to a table scan.
And then there are really complex queries like this one (also from the docs):
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
A materialized path index would be useless here, and a full table scan would be required to handle this. l-tree is the only option if you want to perform this as a SARGable query.
But for the standard hierarchical operations, finding any of:
materialized path will work just as well as l-tree. Contrary to the article linked above, searching for all descendants of a common ancestor is very doable using a b-tree. The query format WHERE path LIKE 'A.%'
is SARGable provided your index is prepared properly (I had to explicitly tag my path index with varchar_pattern_ops
to get this to work).
What is missing from this list is finding all ancestors for a descendant. The query format WHERE 'A.B.C.D' LIKE path || '.%'
is unfortunately not going to use the index. One workaround that some libraries implement is to parse out the ancestor nodes from the path, and query them directly: WHERE id IN ('A', 'B', 'C')
. However, this will only work if you're targeting ancestors of a specific node whose path you have already retrieved. l-tree is going to win on this one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With