I'm looking to write the simplest, most efficient SQL query to retrieve all of the events
that are related to a given user
.
Here's a simplistic representation of what my schema looks like:
A couple things to note:
users
belong to teams
via memberships
.teams
can have many collections
, apps
, and webhooks
.collections
can also have many webhooks
.webhooks
can belong to either a team
or collection
, but only one.events
can belong to any object, but only one.This seems like a fairly basic setup that most SaaS-type companies would have (eg. Slack or Stripe). Everything is "owned" by the teams, but users belong to teams and interact with the interface.
Given that setup, I'd like to create a SQL query that solves...
Find all of the events that are related (directly or indirectly) to a given user by
id
.
I can easily write queries that find either directly, or indirectly via a specific means. For example...
Find all of the events that are directly related to a user by
id
.
SELECT *
FROM events
WHERE user_id = ${id}
Or...
Find all of the events that are indirectly related to a user via their teams.
SELECT events.*
FROM events
JOIN memberships ON memberships.team_id = events.team_id
WHERE memberships.user_id = ${id}
Or even...
Find all of the events that are indirectly related to a user via any collections of their teams.
SELECT events.*
FROM events
JOIN collections ON collections.id = events.collection_id
JOIN memberships ON memberships.team_id = collections.team_id
WHERE memberships.user_id = ${id}
Webhooks get a more complex, because they can be related in two different ways...
Find all of the events that are indirectly related to a user via any webhooks of their teams or collections.
SELECT *
FROM events
WHERE webhook_id IN (
SELECT webhooks.id
FROM webhooks
JOIN memberships ON memberships.team_id = webhooks.team_id
WHERE memberships.user_id = ${id}
)
OR webhook_id IN (
SELECT webhooks.id
FROM webhooks
JOIN collections ON collections.id = webhooks.collection_id
JOIN memberships ON memberships.team_id = collections.team_id
WHERE memberships.user_id = ${id}
)
But as you can see, there are a lot of different ways for a user to be related to an event that occured, via all those paths! So when I try to a query that successfully gets all of those related events, it ends up looking like...
SELECT *
FROM events
WHERE user_id = ${id}
OR app_id IN (
SELECT apps.id
FROM apps
JOIN memberships ON memberships.team_id = apps.team_id
WHERE memberships.user_id = ${id}
)
OR collection_id IN (
SELECT collections.id
FROM collections
JOIN memberships ON memberships.team_id = collections.team_id
WHERE memberships.user_id = ${id}
)
OR memberships_id IN (
SELECT id
FROM memberships
WHERE user_id = ${id}
)
OR team_id IN (
SELECT team_id
FROM memberships
WHERE user_id = ${id}
)
OR webhook_id IN (
SELECT webhooks.id
FROM webhooks
JOIN memberships ON memberships.team_id = webhooks.team_id
WHERE memberships.user_id = ${id}
)
OR webhook_id IN (
SELECT webhooks.id
FROM webhooks
JOIN collections ON collections.id = webhooks.collection_id
JOIN memberships ON memberships.team_id = collections.team_id
WHERE memberships.user_id = ${id}
)
As with any query, the most efficient method is "it depends". There's many variables in play - the number of rows in tables, the row lengths, whether indices exist, the RAM on the server, etc etc.
The best way I can think of handling this sort of problem (thinking maintainability and a braod approach to efficiency) is by using CTEs, which allows you to create a temporary result and reuse that result throughout your query. CTEs use the WITH keyword, and essentially alias a result as a table, so that you can JOIN against it multiple times:
WITH user_memberships AS (
SELECT *
FROM memberships
WHERE user_id = ${id}
), user_apps AS (
SELECT *
FROM apps
INNER JOIN user_memberships
ON user_memberships.team_id = apps.team_id
), user_collections AS (
SELECT *
FROM collections
INNER JOIN user_memberships
ON user_memberships.team_id = collections.team_id
), user_webhooks AS (
SELECT *
FROM webhooks
LEFT OUTER JOIN user_collections ON user_collections.id = webhooks.collection_id
INNER JOIN user_memberships
ON user_memberships.team_id = webhooks.team_id
OR user_memberships.team_id = user_collections.team_id
)
SELECT events.*
FROM events
WHERE app_id IN (SELECT id FROM user_apps)
OR collection_id IN (SELECT id FROM user_collections)
OR membership_id IN (SELECT id FROM user_memberships)
OR team_id IN (SELECT team_id FROM user_memberships)
OR user_id = ${id}
OR webhook_id IN (SELECT id FROM user_webhooks)
;
The benefits of doing it this way are:
The only thing I can think of that would make it faster is to use unions.
SELECT e.*
FROM events e
WHERE user_id = ${id}
UNION
select e.*
FROM apps a
join events e on a.apps_id = e.apps_id
JOIN memberships ON memberships.team_id = apps.team_id
WHERE memberships.user_id = ${id}
UNION
select e.*
from
FROM collections c
join events e on e.collections_id = c.collections_id
JOIN memberships ON memberships.team_id = collections.team_id
WHERE memberships.user_id = ${id}
UNION
select e.*
FROM memberships m
join events e on e.memberships_id = e.memberships_id
WHERE user_id = ${id}
UNION
...;
I don't know how much control you have over your schema. If the answer is "none" then read no further. I'm not going to put much detail down here in case it's not appropriate to your situation but it looks like an ownership model to me.
i.e.
BaseTable
Id
IdOwner (FK to Id on BaseTable - very important)
Type (User = 0, App = 1, Collection = 2 etc. or use an enumeration)
App
Id (FK to BaseTable)
Collection
Id (FK to BaseTable)
Membership
Id (FK to BaseTable)
Webhooks
Id (FK to BaseTable)
Team
Id (FK to BaseTable)
Events
Id (FK to BaseTable)
Memberships
Team_Id (FK to Basetable or Team)
User_Id (FK to Basetable or Users)
Users
Id (FK to BaseTable)
Then your query becomes a recursive CTE: "Find me all objects of type Event owned by - or eventually owned by user x"
That would give you a list of id's which you'd then have to join to your Events table and you have your objects.
This sort of model does get a bit hairy because to load anything you have to join it with the base table but for this sort of nested ownership it works quite nicely.
I'd like to post this as a comment but if I do the formatting will disappear so I've posted it as an answer. If it helps and you want a bit more detail feel free to get back to me.
If I've completely missed the point and this doesn't help please don't shout at me (had that before on SO) just say "thanks, Adam, but that doesn't help" and I'll delete it.
Kind regards,
Adam.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With