Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Blacklist / Whitelist Table Design

Tags:

sql

mysql

We have a set of users

CREATE TABLE `users` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `email` varchar(254) NOT NULL,
   PRIMARY KEY (`id`),
   UNIQUE KEY `unique_email` (`email`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED

Each user can have one or many domains, such as

CREATE TABLE `domains` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `user_id` varchar(11) NOT NULL,
   `domain` varchar(254) NOT NULL,
   PRIMARY KEY (`id`),
   UNIQUE KEY `domain` (`domain`),
   CONSTRAINT `domains_user_id_fk` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED

And we have a table that has some sort of data, for this example it doesn't really matter what it contains

CREATE TABLE `some_data` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `content` TEXT NOT NULL,
   PRIMARY KEY (`id`),
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED

We want certain elements of some_data to be accessible to only certain users or only certain domains (whitelist case). In other cases we want elements of some_data to be accessible to everyone BUT certain users or certain domains (blacklist case). Ideally we would like to retrieve the list of domains that the given element of some_data is accessible to in a single query and ideally do the reverse (list all the data the given domain has access to)

Our approach so far is a single table

CREATE TABLE `access_rules` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `rule_type` enum('blacklist','whitelist')
   `some_data_id` int(11) NOT NULL,
   `user_id` int(11) NOT NULL,
   `domain_id` int(11) DEFAULT NULL,
   PRIMARY KEY (`id`),
   CONSTRAINT `access_rules_some_data_id_fk` FOREIGN KEY (`some_data_id`) REFERENCES `some_data` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED

The problem however is the fact that we need to query the db twice (to figure out if the given data entry is operating a blacklist or a whitelist [whitelist has higher priority]). (EDIT: it can be done in a single query) Also since the domain_id is nullable (to allow blacklisting / whitelisting an entire user) joining is not easy

The API that will use this schema is currently hit 4-5k times per second so performance matters. The users table is relatively small (50k+ rows) and the domains table is about 1.5 million entries. some_data is also relatively small (sub 100k rows)

EDIT: the question is more around semantics and best practices. With the above structure I'm confident we can make it work, but the schema "feels wrong" and I'm wondering if there is better way

like image 699
lePunk Avatar asked Nov 06 '22 15:11

lePunk


1 Answers

There are two issues to consider, normalization and management. To normalize traditionally you would need 4 tables.

Set up the 3 master tables USER, DOMAIN, OtherDATA.

Set up a child table with User_Id, Domain_Id, OtherDATA_Id, PermissionLevel

This provides the least amount of repeated data. It also makes the management possible at the user-domain level easier. You could also add a default whitelist/blacklist field at the user and domain tables. This way a script could auto populate the child table and then a manager could just go in and adjust the one value needed.

If you have a two different tables, one for white and one black list, you could get a user or domain on both lists by accident. Actually it would be 4 tables, 2 for users and 2 for domain. Management would be more complex.

like image 84
Duane Avatar answered Nov 15 '22 06:11

Duane