Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast Relational method of storing tree data (for instance threaded comments on articles)

I have a cms which stores comments against articles. These comments can be both threaded and non threaded. Although technically they are the same just with the reply column left blank when it's not threaded. My application works on sqlLite, MySQL and pgsql so I need fairly standard SQL.

I currently have a comment table

comment_id
article_id
user_id
comment
timestamp
thread (this is the reply column)

My question is to figure out how to best represent the threaded comments in the database. Perhaps in a separate table that supports the tree set without the content and a simple table to hold the text? Perhaps in the way it already is? Perhaps another way?

If the comments are un-threaded I can easily just order by the timestamp.

If they are threaded I sort like this

ORDER BY SUBSTRING(c.thread, 1, (LENGTH(c.thread) - 1))

As you can see from the ORDER BY, the commenting queries will not ever use an index as function based indexes only really live in Oracle. Help me have lightening fast comment pages.

like image 877
Stewart Robinson Avatar asked May 10 '09 21:05

Stewart Robinson


2 Answers

I really like how Drupal solves this problem. It assigns a thread id to each comment. This id starts at 1 for the first comment. If a reply is added to this comment, the id 1.1 is assigned to it. A reply to comment 1.1 is given the thread id 1.1.1. A sibling of comment 1.1 is given the thread id 1.2. You get the idea. Calculating these thread ids can be done easily with one query when a comment is added.

When the thread is rendered, all of the comments that belong to the thread are fetched in a single query, sorted by the thread id. This gives you the threads in the ascending order. Furthermore, using the thread id, you can find the nesting level of each comment, and indent it accordingly.

1
1.1
1.1.1
1.2
1.2.1

There are a few issues to sort out:

  • If one component of the thread id grows to 2 digits, sorting by thread id will not produce the expected order. An easy solution is ensuring that all components of a thread id are padded by zeros to have the same width.
  • Sorting by descending thread id does not produce the expected descending order.

Drupal solves the first issue in a more complicated way using a numbering system called vancode. As for the second issue, it is solved by appending a backslash (whose ASCII code is higher than digits) to thread ids when sorting by descending order. You can find more details about this implementation by checking the source code of the comments module (see the big comment before the function comment_get_thread).

like image 97
Ayman Hourieh Avatar answered Oct 12 '22 09:10

Ayman Hourieh


I know the answer is a bit late, but for tree data use a closure table, it is the proper relational way. http://www.slideshare.net/billkarwin/models-for-hierarchical-data

It describes 4 methods:

  • Adjcency list (the simple parent foreign key)
  • Path enumeration (the Drupal strategy mentioned in the accepted answer)
  • Nested sets
  • Closure table (storing ancestor/descendant facts in a separate relation [table], with a possible distance column)

The last option has advantages of easy CRUD operations compared to the rest. The cost is space, which is O(n^2) size in the number tree nodes in the worst case, but probably not so bad in practice.

like image 34
Timo Huovinen Avatar answered Oct 12 '22 09:10

Timo Huovinen