Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently set subtract a join table in PostgreSQL?

I have the following tables:

  • work_units - self explanatory
  • workers - self explanatory
  • skills - every work unit requires a number of skills if you want to work on it. Every worker is proficient in a number of skills.
  • work_units_skills - join table
  • workers_skills - join table

A worker can request the next appropriate free highest priority (whatever that means) unit of work to be assigned to her.


Currently I have:

SELECT work_units.*
FROM work_units
-- some joins
WHERE NOT EXISTS (
        SELECT skill_id
        FROM work_units_skills
        WHERE work_unit_id = work_units.id

        EXCEPT

        SELECT skill_id
        FROM workers_skills
        WHERE worker_id = 1 -- the worker id that made the request
      )
-- AND a bunch of other conditions
-- ORDER BY something complex
LIMIT 1
FOR UPDATE SKIP LOCKED;

This condition makes the query 8-10 times slower though.

Is there a better way to express that a work_units's skills should be a subset of the workers's skills or something to improve the current query?


Some more context:

  • The skills table is fairly small.
  • Both work_units and workers tend to have very few associated skills.
  • work_units_skills has index on work_unit_id.
  • I tried moving the query on workers_skills into a CTE. This gave a slight improvement (10-15%), but it's still too slow.
  • A work unit with no skill can be picked up by any user. Aka an empty set is a subset of every set.
like image 327
ndnenkov Avatar asked Nov 22 '17 17:11

ndnenkov


People also ask

Is it better to filter in join or WHERE clause?

In terms of readability though, especially in complex queries that have multiple joins, it is easier to spot join conditions when they are placed in the ON clause and filter conditions when they are placed in the WHERE clause.

Which is more efficient join or WHERE?

“Is there a performance difference between putting the JOIN conditions in the ON clause or the WHERE clause in MySQL?” No, there's no difference. The following queries are algebraically equivalent inside MySQL and will have the same execution plan.

How do I subtract two values from another table in SQL?

The Minus Operator in SQL is used with two SELECT statements. The MINUS operator is used to subtract the result set obtained by first SELECT query from the result set obtained by second SELECT query.


1 Answers

One simple speed-up would be to use EXCEPT ALL instead of EXCEPT. The latter removes duplicates, which is unnecessary here and can be slow.

An alternative that would probably be faster is to use a further NOT EXISTS instead of the EXCEPT:

...
WHERE NOT EXISTS (
        SELECT skill_id
        FROM work_units_skills wus
        WHERE work_unit_id = work_units.id
        AND NOT EXISTS (
            SELECT skill_id
            FROM workers_skills ws
            WHERE worker_id = 1 -- the worker id that made the request
              AND ws.skill_id = wus.skill_id
        )
      )

Demo

http://rextester.com/AGEIS52439 - with the the LIMIT removed for testing

like image 129
Steve Chambers Avatar answered Sep 21 '22 08:09

Steve Chambers