We have a set of users
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(254) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
Each user can have one or many domains, such as
CREATE TABLE `domains` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(11) NOT NULL,
`domain` varchar(254) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `domain` (`domain`),
CONSTRAINT `domains_user_id_fk` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
And we have a table that has some sort of data, for this example it doesn't really matter what it contains
CREATE TABLE `some_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` TEXT NOT NULL,
PRIMARY KEY (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
We want certain elements of some_data
to be accessible to only certain users
or only certain domains
(whitelist case).
In other cases we want elements of some_data
to be accessible to everyone BUT certain users
or certain domains
(blacklist case).
Ideally we would like to retrieve the list of domains that the given element of some_data
is accessible to in a single query and ideally do the reverse (list all the data the given domain has access to)
Our approach so far is a single table
CREATE TABLE `access_rules` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rule_type` enum('blacklist','whitelist')
`some_data_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`domain_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
CONSTRAINT `access_rules_some_data_id_fk` FOREIGN KEY (`some_data_id`) REFERENCES `some_data` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
The problem however is the fact that we need to query the db twice (to figure out if the given data entry is operating a blacklist or a whitelist [whitelist has higher priority]). (EDIT: it can be done in a single query) Also since the domain_id is nullable (to allow blacklisting / whitelisting an entire user) joining is not easy
The API that will use this schema is currently hit 4-5k times per second so performance matters.
The users
table is relatively small (50k+ rows) and the domains
table is about 1.5 million entries. some_data
is also relatively small (sub 100k rows)
EDIT: the question is more around semantics and best practices. With the above structure I'm confident we can make it work, but the schema "feels wrong" and I'm wondering if there is better way
There are two issues to consider, normalization and management. To normalize traditionally you would need 4 tables.
Set up the 3 master tables USER, DOMAIN, OtherDATA.
Set up a child table with User_Id, Domain_Id, OtherDATA_Id, PermissionLevel
This provides the least amount of repeated data. It also makes the management possible at the user-domain level easier. You could also add a default whitelist/blacklist field at the user and domain tables. This way a script could auto populate the child table and then a manager could just go in and adjust the one value needed.
If you have a two different tables, one for white and one black list, you could get a user or domain on both lists by accident. Actually it would be 4 tables, 2 for users and 2 for domain. Management would be more complex.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With