Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL - Best method to handle this hierarchical data?

This is a followup to:
MySQL - Is it possible to get all sub-items in a hierarchy?

I have an arbitrary-depth adjacency list model table (I am at the point that I can convert it into a nested set model.

I read the MySQL data on how to use a nested set model, though it seemed to get increasingly complex and very complex to do basic functions such as inserting, updating and deleting.

Another blog showing how to use a trigger system with the adjacency list model to keep a table of ancestors that relates each object to its ancestors.


Right now I need to be able to return a list of all children of a given node, to change or delete them. This hierarchical structure won't be changing all the time once created, but there will be a mass amount of the hierarchical structures.

The three methods I see are:

  1. Created a Stored Procedure which would do a recursive query that returns all children.

  2. Convert to Nested Set Model which would require to get into the complexities and possibly create a stored procedure to add, edit and delete in that.

  3. Create the Ancestor Table described above on insert/delete triggers to handle all of the data.

If there are other methods I'm not exploring, please let me know and I'll update this list.

like image 434
Kerry Jones Avatar asked Jun 29 '10 03:06

Kerry Jones


People also ask

What is the best database to store hierarchical data?

Document based database like MongoDB, and Redis are great for small scale, hierarchical data with a relatively small amount of children for each entry.

What type of SQL query handles hierarchical data?

A hierarchical query is a type of SQL query that handles hierarchical model data. They are special cases of more general recursive fixpoint queries, which compute transitive closures. In standard SQL:1999 hierarchical queries are implemented by way of recursive common table expressions (CTEs).

What is hierarchical data in MySQL?

Hierarchical data has a parent-child relationship that is not naturally represented in a relational database table. For our purposes, hierarchical data is a collection of data where each item has a single parent and zero or more children (with the exception of the root item, which has no parent).


4 Answers

Quassnoi has run some performance tests on the nested sets model and the adjacency list model and documented the results and recommendations in his blog post Adjacency list vs. nested sets: MySQL. The executive summary is:

  • Nested sets is faster for fetching all child nodes or all parent nodes.
  • Nested sets is a bad idea if you frequently need to update the table.

Here is the conclusion from his article:

In MySQL, the nested sets model should be preferred if the updates to the hierarhical structure are infrequent and it is affordable to lock the table for the duration of an update (which can take minutes on a long table).

This implies creating the table using MyISAM storage engine, creating the bounding box of a GEOMETRY type as described above, indexing it with a SPATIAL index and persisting the level in the table.

If the updates to the table are frequent or it is inaffordable to lock the table for a long period of time implied by an update, then the adjacency list model should be used to store the hierarchical data.

This requires creating a function to query the table.

The rest of the article shows how to define the table, implement the queries and gives performance measurements. The use of the spatial index is a clever idea to improve the performance of the nested set model that might be new to you.


If you're also considering approaches without MySQL then you might want to look at PostgreSQL which is another free and open-source database. PostgreSQL supports recursive queries in the form of recursive common table expressions which make querying heirarchical data easier than in MySQL and also give better performance. Quassnoi has also written an article Adjacency list vs. nested sets: PostgreSQL that shows the details.

While we are talking about looking at other approaches, Oracle's database is also worth a mention. Oracle also have a custom extension CONNECT BY which make querying heirarchical data very easy and fast. Quassnoi's article Adjacency list vs. nested sets: Oracle again covers the performance details. The query you need to get all children is extremely simple in this case:

SELECT *
FROM yourtable
START WITH id = 42
CONNECT BY parent = PRIOR id
like image 146
Mark Byers Avatar answered Sep 26 '22 01:09

Mark Byers


I would always go with the Nested Set for shear simplicity and convienience. I always suggest this article. It shows excelent the queries that are needed for the work with such hierachrchical data. The only disadvantage I see here is that it can get slower with inserting/updateing new records when the hierachry reached a certain level of complexity, but the reading is faster than many other solutions I hae seen.

Just to give you an example from the article above:

SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';

+-------------+----------------------+--------------+-------+
| lev1        | lev2                 | lev3         | lev4  |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS          | TUBE         | NULL  |
| ELECTRONICS | TELEVISIONS          | LCD          | NULL  |
| ELECTRONICS | TELEVISIONS          | PLASMA       | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS  | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS   | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL  |
+-------------+----------------------+--------------+-------+
6 rows in set (0.00 sec)

SQL wise, I don't think it can get any prettier and simpler ;)

I have no idea to the Stored Procedure way. But since it involces recursion (in your case), I don't know if it will be fast with many levels in the hierarchy. I assume you can give it a try.

like image 21
DrColossos Avatar answered Sep 25 '22 01:09

DrColossos


Maybe you should consider using document-oriented database like MongoDB. It could make your life a lot easier.

like image 37
quantumSoup Avatar answered Sep 27 '22 01:09

quantumSoup


When dealing with hierarchical data sets I find it best to approach it with caching in mind. One of the main benefits to this way of dealing with this issue this way is it doesn't require de-normalizing you database into something that might be harder to mutate.

Since memory heaps' (memcache,redis,etc) lookups are much faster than SQL for simple id -> data resolutions, I would use them to cache a list of the ids of direct children for each node. This way you can get decent performance via a recursive algorithm to build a complete list for any node.

To add/delete a new node, you will only need to invalidate its' direct parent cache O(1).

If that's not fast enough, you can add another layer of cache to a list of all child of a node at each node. In order for this to work with a decently mutable dataset, you should record the cache performance (ratio of fresh/cached hits) of each node and set a tolerance level for when to store the cache. This also can be stored in a memory heap since it's non-vital data.

If you use this more advanced caching model, you will need to note these complete children node lists will need to be invalidated when any of it's children are changed O(log n).

Once you have your list of children id's you can use SQL's WHERE id IN( id1, id2, .... ) syntax to query for what you want.

like image 37
Kendall Hopkins Avatar answered Sep 26 '22 01:09

Kendall Hopkins