Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I create a closure table using data from an adjacency list?

I have a database containing a hierarchy of categories stored using the adjacency list model.

The hierarchy is 3 levels deep (not including an imaginary root node) and contains approx 1700 nodes. Nodes in the 2nd and 3rd levels can have multiple parents. A additional table is used for the many-to-many relationship as below:

CREATE TABLE dbo.Category(
    id int IDENTITY(1,1) NOT NULL,
    name varchar(255) NOT NULL,
)

CREATE TABLE dbo.CategoryHierarchy(
    relId int IDENTITY(1,1) NOT NULL,
    catId int NOT NULL,
    parentId int NOT NULL,
)

If I move to using the transitive closure table method (for the sake of data integrity etc) is there a relatively easy query I can execute that would generate the values for the closure table? (using SQL Server 2005)

I've look through articles and presentations such as Bill Karwin's Models for hierarchical data but that only has insertion queries for a single node and it would take forever for me to create my tree like that.

Thanks.

EDIT:
RelID in the CategoryHierarchy table is purely for the sake of a primary key, it has no bearing on the node ids of the Category table.

Also by closure table, I mean a table like this:

CREATE TABLE ClosureTable (
    ancestor int NOT NULL,
    descendant int NOT NULL,
    [length] int NOT NULL,
)

Where the first two columns are a compound primary key, and are individually foreign keys to Category.id.

like image 1000
Adnan Avatar asked Sep 27 '12 12:09

Adnan


3 Answers

I was trying to figure out the same thing, but wanted it in a recursive CTE. This wouldn't have worked for you (SQL Server 2008+), but here's what I ended up with for anyone else looking.

The key is that the anchors aren't your root nodes (where parent_id IS NULL), but instead all your zero depth rows-to-be in the closure table.

Table

CREATE TABLE dbo.category (
    id         INT IDENTITY(1, 1) NOT NULL,
    parent_id  INT                    NULL
)

Data

INSERT INTO dbo.category (id, parent_id)
VALUES
    (1, NULL),
    (2, 1),
    (3, 1),
    (4, 2)

CTE

WITH category_cte AS
(
    SELECT
        id AS ancestor,
        id AS descendant,
        0  AS depth
    FROM dbo.category

    UNION ALL

    SELECT
        CTE.ancestor  AS ancestor,
        C.id          AS descendant,
        CTE.depth + 1 AS depth
    FROM dbo.category AS C
    JOIN category_cte AS CTE
        ON C.parent_id = CTE.descendant
)
SELECT * FROM category_cte

Result

ancestor descendant depth
-------- ---------- -----
1        1          0     <- anchor query
2        2          0
3        3          0
4        4          0
2        4          1     <- first recursive query
1        2          1
1        3          1
1        4          2     <- second recursive query
like image 150
Alex Guerra Avatar answered Nov 13 '22 09:11

Alex Guerra


I think I've been able to work out the solution myself.

If anyone has a better way of doing this, please comment.

IF OBJECT_ID('dbo.ClosureTable', 'U') IS NOT NULL
    DROP TABLE dbo.ClosureTable
GO

CREATE TABLE dbo.ClosureTable (
    ancestor int NOT NULL,
    descendant int NOT NULL,
    distance int NULL
)
GO

DECLARE @depth INT
SET @depth = 1

INSERT INTO dbo.ClosureTable (ancestor, descendant, distance)
SELECT catid, catid, 0 FROM dbo.Category -- insert all the self-referencing nodes

WHILE (@depth < 4) -- my tree is only 4 levels deep, i.e 0 - 3
BEGIN
    INSERT INTO dbo.ClosureTable (ancestor, descendant, distance)
    SELECT ct.ancestor, h.catid, @depth
    FROM dbo.ClosureTable ct INNER JOIN dbo.CategoryHierarchy h ON ct.descendant = h.parentid
    WHERE ct.distance = @depth - 1

    SET @depth = @depth + 1
END

Cheers :)

like image 5
Adnan Avatar answered Nov 13 '22 09:11

Adnan


  WITH Children (id, name, iteration) AS (
    SELECT id, name, 0
    FROM Category 
    --WHERE id = @folderid -- if you want a startpoint
    UNION ALL 
    SELECT b.id, b.name, a.iteration + 1
    FROM Children AS a, Category AS b, CategoryHierarchy c
    WHERE a.id = c.parentId
      AND b.id = c.catId 
  )
  SELECT id, name, iteration FROM Children

Not tested, but should work like this. At least a start how you do this fast without loops.

EDIT: I missred, not relId, it is parentId that has to link to Childrens table. But the result should be a table with all tables. So this is not what your originally wanted?

like image 3
YvesR Avatar answered Nov 13 '22 09:11

YvesR