What are the options for storing hierarchical data in a relational database?

People also ask

What is the best database to store hierarchical data?

Document based database like MongoDB, and Redis are great for small scale, hierarchical data with a relatively small amount of children for each entry.

What are the types of hierarchical databases?

The most popular hierarchical databases are IBM Information Management System (IMS) and RDM Mobile. Windows Registry is another example of a real-world use cases of a hierarchical database system. XML and XAML are two more popular and most widely use data storages that are based on hierarchical data model.

How data are stored in hierarchical model?

A hierarchical database model is a data model in which the data are organized into a tree-like structure. The data are stored as records which are connected to one another through links. A record is a collection of fields, with each field containing only one value.

My favorite answer is as what the first sentence in this thread suggested. Use an Adjacency List to maintain the hierarchy and use Nested Sets to query the hierarchy.

The problem up until now has been that the coversion method from an Adjacecy List to Nested Sets has been frightfully slow because most people use the extreme RBAR method known as a "Push Stack" to do the conversion and has been considered to be way to expensive to reach the Nirvana of the simplicity of maintenance by the Adjacency List and the awesome performance of Nested Sets. As a result, most people end up having to settle for one or the other especially if there are more than, say, a lousy 100,000 nodes or so. Using the push stack method can take a whole day to do the conversion on what MLM'ers would consider to be a small million node hierarchy.

I thought I'd give Celko a bit of competition by coming up with a method to convert an Adjacency List to Nested sets at speeds that just seem impossible. Here's the performance of the push stack method on my i5 laptop.

Duration for     1,000 Nodes = 00:00:00:870 
Duration for    10,000 Nodes = 00:01:01:783 (70 times slower instead of just 10)
Duration for   100,000 Nodes = 00:49:59:730 (3,446 times slower instead of just 100) 
Duration for 1,000,000 Nodes = 'Didn't even try this'

And here's the duration for the new method (with the push stack method in parenthesis).

Duration for     1,000 Nodes = 00:00:00:053 (compared to 00:00:00:870)
Duration for    10,000 Nodes = 00:00:00:323 (compared to 00:01:01:783)
Duration for   100,000 Nodes = 00:00:03:867 (compared to 00:49:59:730)
Duration for 1,000,000 Nodes = 00:00:54:283 (compared to something like 2 days!!!)

Yes, that's correct. 1 million nodes converted in less than a minute and 100,000 nodes in under 4 seconds.

You can read about the new method and get a copy of the code at the following URL. http://www.sqlservercentral.com/articles/Hierarchy/94040/

I also developed a "pre-aggregated" hierarchy using similar methods. MLM'ers and people making bills of materials will be particularly interested in this article. http://www.sqlservercentral.com/articles/T-SQL/94570/

If you do stop by to take a look at either article, jump into the "Join the discussion" link and let me know what you think.

Adjacency Model + Nested Sets Model

I went for it because I could insert new items to the tree easily (you just need a branch's id to insert a new item to it) and also query it quite fast.

+-------------+----------------------+--------+-----+-----+
| category_id | name                 | parent | lft | rgt |
+-------------+----------------------+--------+-----+-----+
|           1 | ELECTRONICS          |   NULL |   1 |  20 |
|           2 | TELEVISIONS          |      1 |   2 |   9 |
|           3 | TUBE                 |      2 |   3 |   4 |
|           4 | LCD                  |      2 |   5 |   6 |
|           5 | PLASMA               |      2 |   7 |   8 |
|           6 | PORTABLE ELECTRONICS |      1 |  10 |  19 |
|           7 | MP3 PLAYERS          |      6 |  11 |  14 |
|           8 | FLASH                |      7 |  12 |  13 |
|           9 | CD PLAYERS           |      6 |  15 |  16 |
|          10 | 2 WAY RADIOS         |      6 |  17 |  18 |
+-------------+----------------------+--------+-----+-----+

Every time you need all children of any parent you just query the parent column.
If you needed all descendants of any parent you query for items which have their lft between lft and rgt of parent.
If you needed all parents of any node up to the root of the tree, you query for items having lft lower than the node's lft and rgt bigger than the node's rgt and sort the by parent.

I needed to make accessing and querying the tree faster than inserts, that's why I chose this

The only problem is to fix the left and right columns when inserting new items. well I created a stored procedure for it and called it every time I inserted a new item which was rare in my case but it is really fast. I got the idea from the Joe Celko's book, and the stored procedure and how I came up with it is explained here in DBA SE https://dba.stackexchange.com/q/89051/41481

This is a very partial answer to your question, but I hope still useful.

Microsoft SQL Server 2008 implements two features that are extremely useful for managing hierarchical data:

the HierarchyId data type.
common table expressions, using the with keyword.

Have a look at "Model Your Data Hierarchies With SQL Server 2008" by Kent Tegels on MSDN for starts. See also my own question: Recursive same-table query in SQL Server 2008

This design was not mentioned yet:

Multiple lineage columns

Though it has limitations, if you can bear them, it's very simple and very efficient. Features:

Columns: one for each lineage level, refers to all the parents up to the root, levels below the current items' level are set to 0 (or NULL)
There is a fixed limit to how deep the hierarchy can be
Cheap ancestors, descendants, level
Cheap insert, delete, move of the leaves
Expensive insert, delete, move of the internal nodes

Here follows an example - taxonomic tree of birds so the hierarchy is Class/Order/Family/Genus/Species - species is the lowest level, 1 row = 1 taxon (which corresponds to species in the case of the leaf nodes):

CREATE TABLE `taxons` (
  `TaxonId` smallint(6) NOT NULL default '0',
  `ClassId` smallint(6) default NULL,
  `OrderId` smallint(6) default NULL,
  `FamilyId` smallint(6) default NULL,
  `GenusId` smallint(6) default NULL,
  `Name` varchar(150) NOT NULL default ''
);

and the example of the data:

+---------+---------+---------+----------+---------+-------------------------------+
| TaxonId | ClassId | OrderId | FamilyId | GenusId | Name                          |
+---------+---------+---------+----------+---------+-------------------------------+
|     254 |       0 |       0 |        0 |       0 | Aves                          |
|     255 |     254 |       0 |        0 |       0 | Gaviiformes                   |
|     256 |     254 |     255 |        0 |       0 | Gaviidae                      |
|     257 |     254 |     255 |      256 |       0 | Gavia                         |
|     258 |     254 |     255 |      256 |     257 | Gavia stellata                |
|     259 |     254 |     255 |      256 |     257 | Gavia arctica                 |
|     260 |     254 |     255 |      256 |     257 | Gavia immer                   |
|     261 |     254 |     255 |      256 |     257 | Gavia adamsii                 |
|     262 |     254 |       0 |        0 |       0 | Podicipediformes              |
|     263 |     254 |     262 |        0 |       0 | Podicipedidae                 |
|     264 |     254 |     262 |      263 |       0 | Tachybaptus                   |

This is great because this way you accomplish all the needed operations in a very easy way, as long as the internal categories don't change their level in the tree.

If your database supports arrays, you can also implement a lineage column or materialized path as an array of parent ids.

Specifically with Postgres you can then use the set operators to query the hierarchy, and get excellent performance with GIN indices. This makes finding parents, children, and depth pretty trivial in a single query. Updates are pretty manageable as well.

I have a full write up of using arrays for materialized paths if you're curious.

This is really a square peg, round hole question.

If relational databases and SQL are the only hammer you have or are willing to use, then the answers that have been posted thus far are adequate. However, why not use a tool designed to handle hierarchical data? Graph database are ideal for complex hierarchical data.

The inefficiencies of the relational model along with the complexities of any code/query solution to map a graph/hierarchical model onto a relational model is just not worth the effort when compared to the ease with which a graph database solution can solve the same problem.

Consider a Bill of Materials as a common hierarchical data structure.

class Component extends Vertex {
    long assetId;
    long partNumber;
    long material;
    long amount;
};

class PartOf extends Edge {
};

class AdjacentTo extends Edge {
};

Shortest path between two sub-assemblies: Simple graph traversal algorithm. Acceptable paths can be qualified based on criteria.

Similarity: What is the degree of similarity between two assemblies? Perform a traversal on both sub-trees computing the intersection and union of the two sub-trees. The percent similar is the intersection divided by the union.

Transitive Closure: Walk the sub-tree and sum up the field(s) of interest, e.g. "How much aluminum is in a sub-assembly?"

Yes, you can solve the problem with SQL and a relational database. However, there are much better approaches if you are willing to use the right tool for the job.

Related questions
                            
                                Function vs. Stored Procedure in SQL Server
                            
                                Search text in stored procedure in SQL Server
                            
                                Join vs. sub-query
                            
                                How can I do 'insert if not exists' in MySQL?
                            
                                Insert into a MySQL table or update if exists
                            
                                Save PL/pgSQL output from PostgreSQL to a CSV file
                            
                                When should I use CROSS APPLY over INNER JOIN?
                            
                                INNER JOIN ON vs WHERE clause
                            
                                SQL update from one Table to another based on a ID match
                            
                                Parameterize an SQL IN clause
                            
                                Get list of all tables in Oracle?
                            
                                Retrieving the last record in each group - MySQL
                            
                                How do I escape a single quote in SQL Server?
                            
                                How do I limit the number of rows returned by an Oracle query after ordering?
                            
                                Difference between JOIN and INNER JOIN
                            
                                SQL Server - Best way to get identity of inserted row?
                            
                                How can I list the tables in a SQLite database file that was opened with ATTACH?
                            
                                Using group by on multiple columns
                            
                                Can I concatenate multiple MySQL rows into one field?
                            
                                How to reset AUTO_INCREMENT in MySQL

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What are the options for storing hierarchical data in a relational database?

Tags:

sql

database

tree

relational-database

hierarchical-data

People also ask

Adjacency Model + Nested Sets Model

Multiple lineage columns

Recent Activity

Donate For Us