Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimized SQL for tree structures

How would you get tree-structured data from a database with the best performance? For example, say you have a folder-hierarchy in a database. Where the folder-database-row has ID, Name and ParentID columns.

Would you use a special algorithm to get all the data at once, minimizing the amount of database-calls and process it in code?

Or would you use do many calls to the database and sort of get the structure done from the database directly?

Maybe there are different answers based on x amount of database-rows, hierarchy-depth or whatever?

Edit: I use Microsoft SQL Server, but answers out of other perspectives are interesting too.

like image 886
Seb Nilsson Avatar asked Nov 25 '08 13:11

Seb Nilsson


People also ask

Which database is best for tree structure?

MongoDB allows various ways to use tree data structures to model large hierarchical or nested data relationships. Presents a data model that organizes documents in a tree-like structure by storing references to "parent" nodes in "child" nodes.

How SQL query can be optimized?

SQL Query Optimization Techniques There are some useful practices to reduce the cost. But, the process of optimization is iterative. One needs to write the query, check query performance using io statistics or execution plan, and then optimize it. This cycle needs to be followed iteratively for query optimization.

What are the SQL performance tuning or optimization?

What is SQL Performance Tuning? SQL tuning is the process of improving SQL queries to accelerate your servers performance. It's general purpose is to reduce the amount of time it takes a user to receive a result after issuing a query, and to reduce the amount of resources used to process a query.


2 Answers

Out of all the ways to store a tree in a RDMS the most common are adjacency lists and nested sets. Nested sets are optimized for reads and can retrieve an entire tree in a single query. Adjacency lists are optimized for writes and can added to with in a simple query.

With adjacency lists each node a has column that refers to the parent node or the child node (other links are possible). Using that you can build the hierarchy based on parent child relationships. Unfortunately unless you restrict your tree's depth you cannot pull the whole thing in one query and reading it is usually slower than updating it.

With the nested set model the inverse is true, reading is fast and easy but updates get complex because you must maintain the numbering system. The nested set model encodes both parentage and sort order by enumerating all of the nodes using a preorder based numbering system.

I've used the nested set model and while it is complex for read optimizing a large hierarchy it is worth it. Once you do a few exercises in drawing out the tree and numbering the nodes you should get the hang of it.

My research on this method started at this article: Managing Hierarchical Data in MySQL.

like image 32
Bernard Igiri Avatar answered Sep 20 '22 02:09

Bernard Igiri


It really depends on how you are going to access the tree.

One clever technique is to give every node a string id, where the parent's id is a predictable substring of the child. For example, the parent could be '01', and the children would be '0100', '0101', '0102', etc. This way you can select an entire subtree from the database at once with:

SELECT * FROM treedata WHERE id LIKE '0101%'; 

Because the criterion is an initial substring, an index on the ID column would speed the query.

like image 81
Ned Batchelder Avatar answered Sep 19 '22 02:09

Ned Batchelder