Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CTE vs. T-SQL loop for determining depth of object hierarchy

I have a table consisting of about 70,000 rows and two columns (both VARCHAR(16)): id and parent_id.

I'd like to populate a 'depth' column that shows how far a particular record is from the 'root' node.

e.g.

id,parent_id,depth
A,NULL,0
B,A,1
C,A,1
D,B,2
E,D,3

etc.

I started by writing a query based on this answer to a similar question:

WITH myCTE(id, depth) AS
(
    SELECT id, 0 FROM objects where id = 'A'
    UNION ALL
    SELECT objects.id, depth + 1 FROM myCTE JOIN objects ON objects.parent_id = myCTE.id
)
SELECT id, depth FROM myCTE

With my dataset (~80,000 rows) the above takes almost two hours to execute!

I then wrote my query as a loop and got far better performance:

ALTER TABLE objects ADD depth INT NULL
DECLARE @counter int
DECLARE @total int
SET @counter = 0
UPDATE objects SET depth = 0 WHERE id = 'A'

SELECT @total = COUNT(*) FROM objects WHERE depth IS NULL

WHILE (@total > 0)
BEGIN
    UPDATE objects SET depth = @counter + 1 WHERE parent_id IN (
        SELECT id FROM objects WHERE depth = @counter
    )
    SELECT @total = COUNT(*) FROM objects WHERE depth IS NULL
    SET @counter = @counter + 1
END

The above code only takes a couple of minutes (and it has the benefit of adding the results to the existing table)

My question is whether my results are typical of using a CTE for this problem or whether there is something I have overlooked that might explain it? Indexes, maybe? (I have none on the table right now)

like image 634
Catchwa Avatar asked Feb 05 '13 11:02

Catchwa


1 Answers

You would need an index on parent_id. The recursive part of a CTE will always use a nested loops join and is impervious to join hints (Results are added to a stack spool and the rows are processed one by one in LIFO order)

Without an index on parent_id it will need to scan the table multiple times on the inner side of the nested loops. Performance will degrade exponentially with number of rows.

Your query without recursion will be able to use different join types (hash or merge) that only scan the table twice for each level of recursion. Most likely hash join in this case as you have no useful indexes that would avoid a sort.

like image 69
Martin Smith Avatar answered Sep 30 '22 14:09

Martin Smith