Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Nested Set Query to retrieve all ancestors of each node

I have a MySQL query that I thought was working fine to retrieve all the ancestors of each node, starting from the top node, down to its immediate node. However when I added a 5th level to the nested set, it broke.

Below are example tables, queries and SQL Fiddles:

Four Level Nested Set:

CREATE TABLE Tree
(title varchar(20) PRIMARY KEY,
 `tree` int,
 `left` int,
 `right` int);

INSERT Tree
VALUES
("Food", 1, 1, 18),
('Fruit', 1, 2, 11),
('Red', 1, 3, 6),
('Cherry', 1, 4, 5),
('Yellow', 1, 7, 10),
('Banana', 1, 8, 9),
('Meat', 1, 12, 17),
('Beef', 1, 13, 14),
('Pork', 1, 15, 16);

The Query:

SELECT t0.title node
      ,(SELECT GROUP_CONCAT(t2.title)
                    FROM Tree t2
                    WHERE t2.left<t0.left AND t2.right>t0.right
                    ORDER BY t2.left) ancestors
FROM Tree t0
GROUP BY t0.title;

The returned result for node Banana is Food,Fruit,Yellow - Perfect. You can see this here SQL Fiddle - 4 Levels

When I run the same query on the 5 level table below, the 5th level nodes come back in the wrong order:

CREATE TABLE Tree
(title varchar(20) PRIMARY KEY,
 `tree` int,
 `left` int,
 `right` int);

INSERT Tree
VALUES
("Food", 1, 1, 24),
('Fruit', 1, 2, 13),
('Red', 1, 3, 8),
('Cherry', 1, 4, 7),
('Cherry_pie', 1, 5, 6),
('Yellow', 1, 9, 12),
('Banana', 1, 10, 11),
('Meat', 1, 14, 23),
('Beef', 1, 15, 16),
('Pork', 1, 17, 22),
('Bacon', 1, 18, 21),
('Bacon_Sandwich', 1, 19, 20);

The returned result for Bacon_Sandwich is Bacon,Food,Meat,Pork which is not the right order, it should be Food,Meat,Pork,Bacon - You can see this here SQL Fiddle - 5 Levels

I am not sure what is happening because I don't really understand subqueries well enough. Can anyone shed any light on this?

EDIT AFTER INVESTIGATION:

Woah!! Looks like writing all this out and reading up about ordering with GROUP_CONCAT gave me some inspiration.

Adding ORDER BY to the actual GROUP_CONCAT function and removing from the end of the subquery solved the issue. I now receive Food,Meat,Pork,Bacon for the node Bacon_Sandwich

SELECT t0.title node
      ,(SELECT GROUP_CONCAT(t2.title ORDER BY t2.left)
                    FROM Tree t2
                    WHERE t2.left<t0.left AND t2.right>t0.right
                    ) ancestors
FROM Tree t0
GROUP BY t0.title;

I still have no idea why though. Having ORDER BY at the end of the subquery works for 4 levels but not for 5?!?!

If someone could explain what the issue is and why moving the ORDER BY fixes it, I'd be most grateful.

like image 283
superphonic Avatar asked Jan 08 '14 10:01

superphonic


2 Answers

First it's important to understand that you have an implicit GROUP BY

If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.

To make the point more understandable I'll leave out subqueries and reduce the problem to the banana. Banana is the set [10, 11]. The correct sorted ancestors are those:

SELECT "banana" as node, GROUP_CONCAT(title ORDER by `left`)
  FROM Tree WHERE `left` < 10 AND `right` > 11
  GROUP BY node;

The ORDER BY must be in GROUP_CONCAT() as you want the aggregation function to sort. ORDER BY outside sorts by the aggregated results (i.e. the result of GROUP_CONCAT()). The fact that it worked until level 4 is just luck. ORDER BY has no effect on an aggregate function. You would get the same results with or without the ORDER BY:

SELECT GROUP_CONCAT(title)
  FROM Tree WHERE `left` < 10 AND `right` > 11
  /* ORDER BY `left` */

It might help to understand what SELECT GROUP_CONCAT(title ORDER BY left) FROM Tree WHERE … ORDER BY left does:

  1. Get a selection (WHERE) which results in three rows in an undefined order:

    ("Food")
    ("Yellow")
    ("Fruit")
    
  2. Aggregate the result into one row (implicit GROUP BY) in order to be able to use an aggregate function:

    (("Food","Yellow", "Fruit"))
    
  3. Fire the aggregate function (GROUP_CONCAT(title, ORDER BY link)) on it. I.e. order by link and then concatenate:

    ("Food,Fruit,Yellow")
    
  4. And now finally it sorts that result (ORDER BY). As it's only one row, sorting changes nothing.

    ("Food,Fruit,Yellow")
    
like image 129
16 revs Avatar answered Nov 18 '22 05:11

16 revs


You can get the result using JOIN or SUB-QUERY.

Using JOIN:

SELECT t0.title node, GROUP_CONCAT(t2.title ORDER BY t2.left) ancestors 
FROM Tree t0
LEFT JOIN Tree t2 ON t2.left < t0.left AND t2.right > t0.right
GROUP BY t0.title; 

Check this SQL FIDDLE DEMO

Using SUB-QUERY:

SELECT t0.title node, 
      (SELECT GROUP_CONCAT(t2.title ORDER BY t2.left)
       FROM Tree t2 WHERE t2.left<t0.left AND t2.right>t0.right) ancestors
FROM Tree t0
GROUP BY t0.title;

Check this SQL FIDDLE DEMO

OUTPUT

|           NODE |             ANCESTORS |
|----------------|-----------------------|
|          Bacon |        Food,Meat,Pork |
| Bacon_Sandwich |  Food,Meat,Pork,Bacon |
|         Banana |     Food,Fruit,Yellow |
|           Beef |             Food,Meat |
|         Cherry |        Food,Fruit,Red |
|     Cherry_pie | Food,Fruit,Red,Cherry |
|           Food |                (null) |
|          Fruit |                  Food |
|           Meat |                  Food |
|           Pork |             Food,Meat |
|            Red |            Food,Fruit |
|         Yellow |            Food,Fruit |

In your sub query you had used ORDER BY after WHERE clause which won't affect the output. By default GROUP_CONCAT() function will orders the output string in ascending order of column value. It won't consider you explicit ORDER BY clause.

If you check your output of first query which returns the data in ascending order of title column. So the returned result for node Banana is Food,Fruit,Yellow.

But in your second result for Bacon_Sandwich is Bacon,Food,Meat,Pork because in ascending order Bacon comes first than Food will come.

If you want to order the result based on left column than you have to specify ORDER BY inside the GROUP_CONCAT() function as above. Check my both queries.

I prefer that you use JOIN instead of SUB-QUERY for improving performance.

like image 28
Saharsh Shah Avatar answered Nov 18 '22 05:11

Saharsh Shah