Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Categories and subcategories from one mysql table

So I can't wrap my head around how to get categories and subcategories out from my sql table in the simplest way possible.

This is what my table looks like:

enter image description here

and this is what I would like to output using mysql:

Would be great if someone could help me out!


1 Answers

You're using a design that the book SQL Antipatterns calls "Adjacency list". It's how most programmers choose to represent hierarchical data in a relational database because it's fairly robust, economical with storage, simple to understand and it can use foreign key constraints to prevent invalid parent child relationships from being created. However its big drawback is that you can't retrieve an entire tree or a subtree in a single query. You can only get a node's immediate children, it's immediate parent or its siblings, unless you're using a database that allows recursive querying, something that isn't standard. Recursion is only implemented by some databases and because there's no standard they all implement it differently.

The book provides a number of alternative schema for handling a hierarchical structure, the one I like the best is the Closure Table. In that approach, instead of storing the parent's row in the category table, you'd store the relationship data in a completely independent table. Your category table would have a unique category ID, category name, etc, and then your closure table would store the following information:

  • Ancestor ID: The ID of one of a given node's parents
  • Descendant ID: The ID of one of a given node's children
  • Depth: How many levels down the descendant node is from the ancestor node.

So your category table would look something like this:

CREATE TABLE categories (
  cat_id INT(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  cat_name VARCHAR(64),
  # Other columns...
) ENGINE=InnoDB;

and your associated closure table would look something like...

CREATE TABLE category_relations (
  ancestor_id INT(11) UNSIGNED,
  descendant_id INT(11) UNSIGNED,
  depth INT(11) UNSIGNED
  PRIMARY KEY (ancestor_id, descendant_id),
  FOREIGN KEY (ancestor_id) REFERENCES categories.cat_id,
  FOREIGN KEY (descendant_id) REFERENCES categories.cat_id
) ENGINE=InnoDB;

The combination of ancestor and descendant ID is the primary key for the closure table. The depth field will be used later for optimising certain classes of query, but we're getting ahead of ourselves.

Whenever you insert new values into your category table, you will now also always insert at least one node into the category relations table.

INSERT INTO categories (
  cat_name
) VALUES (
  'animal'
);

After getting the ID of the newly inserted node, the following query should be run too (we'll assume the new row got an ID of 1):

INSERT INTO category_relations (
  ancestor_id, 
  descendant_id, 
  depth
) VALUES (
  1,
  1,
  0
);

This is called the self-referencing row. You could do that with a trigger, or with an additional operation in your code, it's up to you.

Now we have a single "animal" node which we can treat as a root node. If we want to add a "feline" field, we'd do:

INSERT INTO categories (
  cat_name
) VALUES (
  'feline'
);

# We'll assume the new ID is 2

INSERT INTO category_relations (
  ancestor_id, 
  descendant_id, 
  depth
) VALUES (
  2,
  2,
  0
);

But how do we make "feline" a child of "animal"? We do it by inserting additional nodes into the relations table:

INSERT INTO category_relations (
  ancestor_id, 
  descendant_id, 
  depth
) VALUES (
  1,
  2,
  1
);

Now say we want to add a "house cat" and a "lion" child to "feline"

INSERT INTO categories (
  cat_name
) VALUES (
  'house cat'
);

# We'll assume the new ID is 3

INSERT INTO category_relations (
  ancestor_id, 
  descendant_id, 
  depth
) VALUES (
  3,
  3,
  0
);

INSERT INTO categories (
  cat_name
) VALUES (
  'lion'
);

# We'll assume the new ID is 4

INSERT INTO category_relations (
  ancestor_id, 
  descendant_id, 
  depth
) VALUES (
  4,
  4,
  0
);

Now we make those nodes children of the "feline" node.

INSERT INTO category_relations (
  ancestor_id, 
  descendant_id, 
  depth
) VALUES (
  2,
  3,
  1
);

INSERT INTO category_relations (
  ancestor_id, 
  descendant_id, 
  depth
) VALUES (
  2,
  4,
  1
);

Now the house cat and the lion are children of feline, but they also have to be ancestors of animal.

INSERT INTO category_relations (
  ancestor_id, 
  descendant_id, 
  depth
) VALUES (
  1,
  3,
  2
);

INSERT INTO category_relations (
  ancestor_id, 
  descendant_id, 
  depth
) VALUES (
  1,
  4,
  2
);

By this point, the table should look like:

 ancestor_id | descendant_id | depth
=============+===============+=======
           1 |             1 |     0
           2 |             2 |     0
           3 |             3 |     0
           4 |             4 |     0
           1 |             2 |     1
           2 |             3 |     1
           2 |             4 |     1
           1 |             3 |     2
           1 |             4 |     2

Now if you want to get the entire animal tree, you can so a simple query without iteration or recursion.

SELECT c.*
FROM category AS c
JOIN category_relations AS r ON c.cat_id = r.descendant_id
WHERE r.ancestor_id = 1;

This query will select the following rows in the relations table:

 ancestor_id | descendant_id | depth
=============+===============+=======
           1 |             1 |     0
           1 |             2 |     1
           1 |             3 |     2
           1 |             4 |     2

and will return the category rows:

 cat_id | cat_name
========+==========
      1 | animal
      2 | feline
      3 | house cat
      4 | lion

If we want the subtree that starts at Feline we'd just use the feline category ID as the ancestor.

SELECT c.*
FROM categories AS c
JOIN category_relations AS r ON c.cat_id = r.descendant_id
WHERE r.ancestor_id = 2;

 ancestor_id | descendant_id | depth
=============+===============+=======
           2 |             2 |     0
           2 |             3 |     1
           2 |             4 |     1

 cat_id | cat_name
========+==========
      2 | feline
      3 | house cat
      4 | lion

But what if you only want the children of a given node, and not an entire subtree? That's where the depth field comes in:

SELECT c.*
FROM categories AS c
JOIN category_relations AS r ON c.cat_id = r.descendant_id
WHERE r.ancestor_id = 1
AND r.depth = 1;

 ancestor_id | descendant_id | depth
=============+===============+=======
           1 |             2 |     1

 cat_id | cat_name
========+==========
      2 | feline

Hopefully, you're getting the idea by now. There are also more things you can do, like delete and move subtrees, and so on.

The examples here may make things seem like you have to do a lot of work to maintain the closure table, but it's possible to automate a lot of the stuff that I've done here manually with triggers, clever use of joined queries and so on, but I won't go into more detail, as both the SQL Antipatterns book covers that stuff far better than I can, and there are also plenty of closure table tutorials on the internet.

You may also think that there is going to be a lot of additional data stored versus the simple adjacency lists, and you'd be right, there is. However, storing a table of integers is usually pretty efficient in most databases and unless you're dealing with utterly huge trees you shouldn't run into any shortages caused by the closure table. The extra storage used is the price you pay for not having to go through the rigmarole of using iteration or recursion to traverse a tree.

I'd strongly recommend you look into closure tables if you're going to be dealing with an arbitrary tree structure stored in SQL. In very simple cases, the adjacency list model is fine so you shouldn't discount it entirely, but when things start getting more complex the closure table approach brings a lot of benefit.

like image 147
GordonM Avatar answered Jan 31 '26 10:01

GordonM



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!