Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django treebeard what are differences between AL, NS, MP

I'm trying to make a model for categorizing some objects.

I already tried using django-mptt to easily retrieve related categories, and now I'm searching different solutions to find the best one.

I can't find out though what are main differences between Materialized Path, Adjacency List and Nested Set. Wikipedia didn't give me a short answer, all I know is mptt is probably Nested Set...

Can anyone explain it to me in few words ?

like image 258
aherok Avatar asked Nov 05 '09 17:11

aherok


1 Answers

It's easier to explain with examples than with a few words. Consider the sample tree, storing names:

William     Jones     Blake             Adams             Tyler     Joseph     Miller     Flint 

Materialized Path means each node stores its full path encoded. For instance, you can store its index using a dot as a delimiter

Name        Path William     1 Jones       1.1 Blake       1.2 Adams       1.2.1 Tyler       1.3 Joseph      2 Miller      2.1 Flint       2.2 

So, Adams knows its full name is Wiliam Blake Adams, because it has the 1.2.1 path, pointing to the 1 node at first level — William — to the 1.2 node at level 2 — Blake — and 1.2.1 node at level 3 — Adams.

Adjacency List means the tree is stored by keeping a link to some adjacent nodes. For instance, you can store who is the parent and who is the next sibling.

Name        Parent     Next William     null       Joseph Jones       William    Blake Blake       William    Tyler Adams       Blake      null Tyler       William    null Joseph      null       null     Miller      Joseph     Flint Flint       Joseph     null 

Notice that it could be as simple as just storing the parent, if we don't need to keep the children of a node ordered. Now Adams can get all his ancestors recursively until null to find his full name.

Nested sets means you store each node with some index, usually a left and right value, assigned to each one as you traverse the tree in DFS order, so you know its descendants are within those values. Here's how the numbers would be assigned to the example tree:

  1 William 10     2 Jones 3     4 Blake 7            5 Adams 6     8 Tyler 9 11 Joseph 16     12 Miller 13      14 Flint 15 

And it would be stored as:

Name        left   right William     1      10 Jones       2      3 Blake       4      7 Adams       5      6 Tyler       8      9   Joseph      11     16 Miller      12     13 Flint       14     15 

So, now Adams can find its ancestors by querying who has a lower left AND a higher right value, and sort them by the left value.

Each model has its strengths and weaknesses. Choosing which one to use really depends on your application, the database you're using and what kind of operations you'll be doing most often. You can find a good comparison here.

The comparison mentions a fourth model that isn't very common (I don't know of any other implementation but my own) and very complicated to explain in a few words: Nested Interval via Matrix Encoding.

When you insert a new node in a nested set you have to re-enumerate everyone who is ahead of it in the traversal. The idea behind nested intervals is that there's an infinite number of rational numbers between any two integers, so you could store the new node as a fraction of its previous and next nodes. Storing and querying fractions can be troublesome, and this leads to the matrix encoding technique, which transforms those fractions in a 2x2 matrix and most operations can be done by some matrix algebra that isn't obvious at first sight but can be very efficient.

Treebeard has very straightforward implementations of each one of Materialized Path, Nested Sets and Adjacency Lists, with no redundancy. django-mptt actually uses a mix of Nested Sets and Adjacency Lists, since it also keeps a link to the parent and can use it to both query children more efficiently, and to rebuild the tree in case it gets messed up by someone.

like image 151
Pedro Werneck Avatar answered Sep 22 '22 20:09

Pedro Werneck