Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL - "polymorphic table" vs 3 tables

I am using PostgreSQL 9.5 (but upgrade is possible to say 9.6).

I have permissions table:

CREATE TABLE public.permissions
(
  id integer NOT NULL DEFAULT nextval('permissions_id_seq'::regclass),
  item_id integer NOT NULL,
  item_type character varying NOT NULL,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,
  CONSTRAINT permissions_pkey PRIMARY KEY (id)
)
-- skipping indices declaration, but they would be present
-- on item_id, item_type

And 3 tables for many-to-many associations

-companies_permissions (+indices declaration)

CREATE TABLE public.companies_permissions
(
  id integer NOT NULL DEFAULT nextval('companies_permissions_id_seq'::regclass),
  company_id integer,
  permission_id integer,
  CONSTRAINT companies_permissions_pkey PRIMARY KEY (id),
  CONSTRAINT fk_rails_462a923fa2 FOREIGN KEY (company_id)
      REFERENCES public.companies (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_rails_9dd0d015b9 FOREIGN KEY (permission_id)
      REFERENCES public.permissions (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

CREATE INDEX index_companies_permissions_on_company_id
  ON public.companies_permissions
  USING btree
  (company_id);

CREATE INDEX index_companies_permissions_on_permission_id
  ON public.companies_permissions
  USING btree
  (permission_id);

CREATE UNIQUE INDEX index_companies_permissions_on_permission_id_and_company_id
  ON public.companies_permissions
  USING btree
  (permission_id, company_id);

-permissions_user_groups (+indices declaration)

CREATE TABLE public.permissions_user_groups
(
  id integer NOT NULL DEFAULT nextval('permissions_user_groups_id_seq'::regclass),
  permission_id integer,
  user_group_id integer,
  CONSTRAINT permissions_user_groups_pkey PRIMARY KEY (id),
  CONSTRAINT fk_rails_c1743245ea FOREIGN KEY (permission_id)
      REFERENCES public.permissions (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_rails_e966751863 FOREIGN KEY (user_group_id)
      REFERENCES public.user_groups (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

CREATE UNIQUE INDEX index_permissions_user_groups_on_permission_and_user_group
  ON public.permissions_user_groups
  USING btree
  (permission_id, user_group_id);

CREATE INDEX index_permissions_user_groups_on_permission_id
  ON public.permissions_user_groups
  USING btree
  (permission_id);

CREATE INDEX index_permissions_user_groups_on_user_group_id
  ON public.permissions_user_groups
  USING btree
  (user_group_id);

-permissions_users (+indices declaration)

CREATE TABLE public.permissions_users
(
  id integer NOT NULL DEFAULT nextval('permissions_users_id_seq'::regclass),
  permission_id integer,
  user_id integer,
  CONSTRAINT permissions_users_pkey PRIMARY KEY (id),
  CONSTRAINT fk_rails_26289d56f4 FOREIGN KEY (user_id)
      REFERENCES public.users (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_rails_7ac7e9f5ad FOREIGN KEY (permission_id)
      REFERENCES public.permissions (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

CREATE INDEX index_permissions_users_on_permission_id
  ON public.permissions_users
  USING btree
  (permission_id);

CREATE UNIQUE INDEX index_permissions_users_on_permission_id_and_user_id
  ON public.permissions_users
  USING btree
  (permission_id, user_id);

CREATE INDEX index_permissions_users_on_user_id
  ON public.permissions_users
  USING btree
  (user_id);

I will have to run SQL query like this a lot times:

SELECT
"permissions".*,
"permissions_users".*,
"companies_permissions".*,
"permissions_user_groups".* 
FROM "permissions"
LEFT OUTER JOIN
  "permissions_users" ON "permissions_users"."permission_id" = "permissions"."id"
LEFT OUTER JOIN
  "companies_permissions" ON "companies_permissions"."permission_id" = "permissions"."id"
LEFT OUTER JOIN
  "permissions_user_groups" ON "permissions_user_groups"."permission_id" = "permissions"."id"
WHERE
  (companies_permissions.company_id = <company_id> OR
  permissions_users.user_id in (<user_ids> OR NULL) OR
  permissions_user_groups.user_group_id IN (<user_group_ids> OR NULL)) AND
permissions.item_type = 'Topic' 

Let's say we have about 10000+ permissions and similar amount of records inside other tables.

Do I need to worry about performance?

I mean... I have 4 LEFT OUTER JOINs and it should return results pretty fast (say <200ms).

I was thinking about declaring 1 "polymorphic" table, something like:

CREATE TABLE public.permissables
(
  id integer NOT NULL DEFAULT nextval('permissables_id_seq'::regclass),
  permission_id integer,
  resource_id integer NOT NULL,
  resource_type character varying NOT NULL,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,
  CONSTRAINT permissables_pkey PRIMARY KEY (id)
)
-- skipping indices declaration, but they would be present

Then I could run query like this:

SELECT
  permissions.*,
  permissables.*
FROM permissions
LEFT OUTER JOIN
  permissables ON permissables.permission_id = permissions.id
WHERE
  permissions.item_type = 'Topic' AND
  (permissables.owner_id IN (<user_ids>) AND permissables.owner_type = 'User') OR
  (permissables.owner_id = <company_id> AND permissables.owner_type = 'Company') OR
  (permissables.owner_id IN (<user_groups_ids>) AND permissables.owner_type = 'UserGroup')

QUESTIONS:

  1. Which options is better/faster? Maybe there is better way to do this?

a) 4 tables (permissions, companies_permissions, user_groups_permissions, users_permissions) b) 2 tables (permissions, permissables)

  1. Do I need to declare different indexes than btree on permissions.item_type ?

  2. Do I need to run a few times per day vacuum analyze for tables to make indices work (both options)?


EDIT1:

SQLFiddle examples:

  1. wildplasser suggestion (from comment), not working: http://sqlfiddle.com/#!15/9723f8/1
  2. Original query (4 tables): http://sqlfiddle.com/#!15/9723f8/2

{ I also removed backticks in wrong places thanks @wildplasser }

like image 323
nothing-special-here Avatar asked May 16 '17 21:05

nothing-special-here


2 Answers

Maybe it's an obvious answer, but I think the option with 3 tables should be just fine. SQL databases are good at doing join operations and you have 10,000 records - this is not a big amount of data at all, so I am not sure what makes you think there will be a performance problem.

With proper indexes (btree should be OK), it should work fast and actually you can go just a bit further and generate the sample data for you tables and see how your query actually works on real amount of data.

I also don't think you'll need to worry about something like running vacuum manually.

Regarding the option two, polymorphic table, it can be not very good as you now have single resource_id field which can point out to different tables which is a source of problems (for example, due to a bug you can have a record with resource_type=User and resource_id pointing to Company - table structure doesn't prevent it).

One more note: you do not tell anything about relations between User, UserGropup and Company - if they are all related too, it may be possible to fetch permissions just using user id(s), joining also gropus and companies to users.

And one more: you don't need ids in many-many tables, nothing bad happens if you have them, but it's enough to have permission_id and user_id and make them to be composite primary key.

like image 53
Boris Serebrov Avatar answered Sep 22 '22 16:09

Boris Serebrov


I'd recommend abstracting all access to your permissions system to a couple of model classes. Unfortunately, I've found that permission systems like this do sometimes end up being performance bottlenecks, and I've found that it is sometimes necessary to significantly refactor your data representation. So, my recommendation is that try to keep the permission-related queries isolated in a few classes and try to keep the interface to those classes independent of the rest of the system.

Examples of good approaches here are what you have above. You don't actually join against the topics table; you already have the topic IDs you care about when you're constructing the permissions.

Examples of bad interfaces would be class interfaces that make it easy to join the permissions tables into arbitrary other SQL.

I understand you asked the question in terms of SQL rather than a particular framework on top of SQL, but from the rails constraint names it looks like you are using such a framework, and I think taking advantage of it will be useful to your future code maintainability.

In the 10,000 rows cases, I think either approach will work fine. I'm not actually sure that the approaches will be all that different. If you think about the query plans generated, assuming you're getting a small number of rows from the table, the join might be handled with a loop against each table in exactly the same way that the or query might be handled assuming that the index is likely to return a small number of rows. I have not fed a plausible data set into Postgres to figure out whether that's what it actually does given a real data set. I have reasonably high confidence that Postgres is smart enough to do that if it makes sense to do so.

The polymorphic approach does give you a bit more control and if you run into performance problems you may want to check if moving to it will help. If you choose the polymorphic approach, I'd recommend writing code to go through and check to make sure that your data is consistent. That is, make sure that resource_type and resource_id corresponds to actual resources that exist in your system. I'd make that recommendation in any case where application concerns force you to denormalize your data such that database constraints are not sufficient to enforce consistency.

If you start running into performance problems, here are the sorts of things you may need to do in the future:

  • Create a cache in your application mapping objects (such as topics) to the set of permissions for those objects.

  • Create a cache in your application caching all the permissions a given user has (including the groups they are a member of) for the objects in your application.

  • Materializing the user group permissions. That is create a materialized view that combines the user_group permissions with the user permissions and the user group memberships.

In my experience the thing that really kills performance of permission systems is when you add something like permitting one group to be a member of another group. At that point you very quickly get to a point where you need caching or materialized views.

Unfortunately, it's really hard to give more specific advice without actually having your data and looking at real query plans and real performance. I think that if you prepare for future changes you'll be fine though.

like image 44
Sam Hartman Avatar answered Sep 25 '22 16:09

Sam Hartman