Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursive group structure in MySQL

I am developing a system which needs to allow users to be placed into groups. These groups can be freely created, edited, and deleted by other privileged users in the system. That part is easy; just create a group_users table that links users into groups. (If you're a stickler for normalization then you may create a group table that just lists off groups and then have a group_users table that links them together -- that's fine too)

Here's where it gets tricky. The client wants groups to also contain groups, to arbitrary depth and with arbitrary overlap (groups can be in multiple groups, and groups can contain multiple groups). That's easy enough to store (with a group_groups table), but it's difficult to query against without some sort extension like Oracle's CONNECT BY.

This recursive hierarchy also needs to be retroactive -- meaning that if group A contains group B, and group B is modified, then group A will be modified too -- so I can't cheat and just flatten out the structure. If you don't believe me that it can't simply be flattened, consider this situation. You have a group called "cool people" which contains users 1 and 2. Someone creates a group called "REALLY cool people" which contains user 3 and contains the group "cool people". When I query against "REALLY cool people" I should conclude that users 1, 2, and 3 are in the group. Now say that someone decides that user 2 isn't a cool person anymore and removes user 2 from "cool people". After that point in time, "REALLY cool people" only contains users 1 and 3. If I had originally flattened out the structure, I wouldn't know to remove user 2 from "REALLY cool people" when I removed him from "cool people".

So a trivial flattening won't work in this scenario. Other options I've considered:

  • Doing the recursion in code.
    • Too slow for complex groups, and also requires you to then do related joins in memory rather than on the database
  • Flatten the structure out into group_users_flattened, but also maintain a group_groups table. Create a trigger for group_users_flattened on INSERT/UPDATE/DELETE that will go to the group_groups table, find all groups that contain this group, and dynamically make the appropriate changes to group_users_flattened.
    • I can imagine this working, but it seems convoluted and error-prone, and I have a feeling there's a gotcha that I'm not seeing.

Are there other ideas that I haven't considered?

like image 448
ean5533 Avatar asked Jul 29 '11 14:07

ean5533


People also ask

What is with recursive in MySQL?

A recursive CTE is a subquery which refer to itself using its own name. The recursive CTEs are defined using WITH RECURSIVE clause. There should be a terminating condition to recursive CTE. The recursive CTEs are used for series generation and traversal of hierarchical or tree-structured data.

Does MySQL support recursion?

SQL is generally poor at recursive structures, but it is now possible on MySQL to write recursive queries. Before MySQL 8.0, recursion was possible only by creating stored routines.

Does CTE work in MySQL?

Common Table Expression (CTE) is an important feature of MySQL that is used to generate a temporary result set. It can be used with any SQL statement like SELECT, INSERT, UPDATE, etc. The complicated queries can be simplified by using CTE.

What is recursive function in SQL?

A recursive query is one that is defined by a Union All with an initialization fullselect that seeds the recursion. The iterative fullselect contains a direct reference to itself in the FROM clause. There are additional restrictions as to what can be specified in the definition of a recursive query.


1 Answers

See my answer to What is the most efficient/elegant way to parse a flat table into a tree?. I describe a design I call Closure Table.

In your case, you'd have your tables Users and Groups and UserGroupMembers that is the intersection table (many-to-many) between Users and Groups.

Then you'd need another table to describe how groups are nested. Call it SubgroupPaths for instance. This records every path relating a given group to its subgroups.

CREATE TABLE SubgroupPaths (
  supergroup INT UNSIGNED NOT NULL,
  subgroup   INT UNSIGNED NOT NULL,
  pathlength SMALLINT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (supergroup, subgroup),
  FOREIGN KEY (supergroup) REFERENCES Groups(groupid),
  FOREIGN KEY (subgroup) REFERENCES Groups(groupid)
);

You may also need some permutations of compound indexes to support certain queries you'd run against this table.

This design allows you to have multiple hierarchies, so you could have group "cool people" be a descendant of each of its supergroups.

INSERT INTO Groups (groupid, groupname) VALUES
(1, 'REALLY cool people'),
(2, 'slightly cool people'),
(3, 'cool people');

INSERT INTO SubgroupPaths (supergroup, subgroup, pathlength) VALUES
(1,1,0), (2,2,0), (3,3,0), -- every group points to itself
(1,3,1), -- REALLY is parent of cool people
(2,3,1); -- slightly is also parent of cool people

Now you can get the list of all users who should be considered cool people, regardless of whether they are members of cool people, slightly cool people, or REALLY cool people. We can even throw in a DISTINCT in case some users are associated with more than one of these groups.

SELECT DISTINCT u.*
FROM SubgroupPaths AS cool
JOIN SubgroupPaths AS supercool ON cool.subgroup=supercool.subgroup
JOIN Groups AS g ON supercool.supergroup = g.groupid
JOIN UserGroupMembers AS m ON m.groupid = g.groupid
JOIN Users AS u ON u.userid = m.userid
WHERE cool.subgroup = 3;

I prefer Closure Table over the Nested Sets design suggested by other answers, because Closure Table supports referential integrity constraints, and there are some queries that are hard in Nested Sets but easier in Closure Table.

For more on Closure Table, check out my book SQL Antipatterns: Avoiding the Pitfalls of Database Programming.


Footnote to all this: be careful about violating the YAGNI principle.

I once implemented a database to store arbitrary-depth groups like this, and all the PHP code to display, report, and administer hierarchies. Also I had to clone hierarchical collections when they were used, because the hierarchy could be reorganized later, and we needed to keep historical data of how the elements in the hierarchy were used. It took weeks to code and test. And when it was all done, the user of the application never actually stored any hierarchy more one one level deep.

Some decision-makers will change their minds about the scope of requirements if you tell them how much work (i.e. budget) it will take to implement and test.

like image 176
Bill Karwin Avatar answered Oct 18 '22 23:10

Bill Karwin