I have the responsibility of switching our code from sqlite to postgres. One of the queries I am having trouble with is copied below.
INSERT INTO group_phones(group_id, phone_name)
SELECT g.id, p.name
FROM phones AS p, groups as g
WHERE g.id IN ($add_groups) AND p.name IN ($phones);
The problem arises when there is a duplicate record. In this table the combination of both values must be unique. I have used a few plpgsql functions in other places to do update-or-insert operations, but in this case I can do several inserts at once. I am not sure how to write a stored routine for this. Thanks for all the help from all the sql gurus out there!
There are 3 challenges.
1. Your query has no JOIN condition between the tables phones and groups, making this effectively a limited CROSS JOIN, which you most probably do not intend. I.e., every phone that qualifies is combined with every group that qualifies. If you have 100 phones and 100 groups, that's already 10,000 combinations.
2. Insert distinct combinations of (group_id, phone_name).
3. Avoid inserting rows that are already there in table group_phones.
All things considered it could look like this:
INSERT INTO group_phones(group_id, phone_name)
SELECT i.id, i.name
FROM  (
    SELECT DISTINCT g.id, p.name -- get distinct combinations
    FROM   phones p
    JOIN   groups g ON ??how are p & g connected??
    WHERE  g.id IN ($add_groups)
    AND    p.name IN ($phones)
    ) i
LEFT   JOIN group_phones gp ON (gp.group_id, gp.phone_name) = (i.id, i.name)
WHERE  gp.group_id IS NULL; -- avoid duping existing rows
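To try the pattern without a PostgreSQL server, here is a minimal Python sketch that runs the same anti-join INSERT against an in-memory SQLite database. Several things in it are assumptions for illustration only: the sample rows, the literal ID and name lists standing in for $add_groups and $phones, the helper name insert_missing, and the CROSS JOIN that stands in for the unknown join condition. The row-value comparison is written as two AND conditions so it also runs on older SQLite versions.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; all sample data is made up.
# "groups" is double-quoted because GROUPS is a keyword in newer SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phones (name TEXT);
    CREATE TABLE "groups" (id INTEGER);
    CREATE TABLE group_phones (
        group_id   INTEGER,
        phone_name TEXT,
        UNIQUE (group_id, phone_name)
    );
    INSERT INTO phones VALUES ('alpha'), ('beta');
    INSERT INTO "groups" VALUES (1), (2);
    INSERT INTO group_phones VALUES (1, 'alpha');  -- row that already exists
""")

# Assumption: CROSS JOIN replaces the unknown join condition from the answer.
INSERT_MISSING = """
    INSERT INTO group_phones (group_id, phone_name)
    SELECT i.id, i.name
    FROM (
        SELECT DISTINCT g.id, p.name
        FROM phones p
        CROSS JOIN "groups" g
        WHERE g.id IN (1, 2)
          AND p.name IN ('alpha', 'beta')
    ) i
    LEFT JOIN group_phones gp
           ON gp.group_id = i.id AND gp.phone_name = i.name
    WHERE gp.group_id IS NULL
"""

def count_rows(connection):
    """Return the current number of rows in group_phones."""
    return connection.execute("SELECT count(*) FROM group_phones").fetchone()[0]

def insert_missing(connection):
    """Run the anti-join INSERT and return how many rows it added."""
    before = count_rows(connection)
    connection.execute(INSERT_MISSING)
    connection.commit()
    return count_rows(connection) - before

first_run = insert_missing(conn)    # adds the three combinations not yet present
second_run = insert_missing(conn)   # adds nothing: every combination now exists
total_rows = count_rows(conn)
print(first_run, second_run, total_rows)  # prints: 3 0 4
```

Running the statement twice never trips the unique constraint, because the second pass finds every combination already in group_phones and inserts nothing.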
This form minimizes the chance of a race condition with concurrent write operations. If your table has a heavy concurrent write load, you may want to lock the table exclusively or use serializable transaction isolation. This safeguards against the extremely unlikely case that a row is altered by a concurrent transaction in the tiny time slot between the constraint verification (row isn't there) and the write operation in the query.
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT ...
COMMIT;
Be prepared to repeat the transaction if it rolls back with a serialization error. For more on that topic, good starting points could be this blog post by @depesz or this related question on SO.
Normally, though, you needn't even bother with any of this.
LEFT JOIN tbl ON right_col = left_col WHERE right_col IS NULL
is generally the fastest method with distinct columns in the right table. If you have dupes in the column (especially if there are many),
WHERE NOT EXISTS (SELECT 1 FROM tbl WHERE right_col = left_col)
may be faster because it can stop scanning as soon as the first row is found.
You can also use IN, like @dezso demonstrates, but it is usually slower in PostgreSQL.
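The two anti-join forms can be compared side by side in a small Python sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the candidates table and its sample rows are hypothetical. It only shows that both forms return the same rows when the right table contains duplicates; it says nothing about their relative speed.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidates (left_col INTEGER);
    CREATE TABLE tbl (right_col INTEGER);  -- no unique constraint, so dupes are allowed
    INSERT INTO candidates VALUES (1), (2), (3);
    INSERT INTO tbl VALUES (1), (1), (1), (3);
""")

# Anti-join via LEFT JOIN ... IS NULL
LEFT_JOIN_FORM = """
    SELECT c.left_col
    FROM candidates c
    LEFT JOIN tbl t ON t.right_col = c.left_col
    WHERE t.right_col IS NULL
    ORDER BY c.left_col
"""

# Anti-join via NOT EXISTS
NOT_EXISTS_FORM = """
    SELECT c.left_col
    FROM candidates c
    WHERE NOT EXISTS (SELECT 1 FROM tbl t WHERE t.right_col = c.left_col)
    ORDER BY c.left_col
"""

missing_via_join = [row[0] for row in conn.execute(LEFT_JOIN_FORM)]
missing_via_exists = [row[0] for row in conn.execute(NOT_EXISTS_FORM)]
print(missing_via_join, missing_via_exists)  # prints: [2] [2]
```

Since the results match, the choice between the two forms comes down to performance, which is best checked with EXPLAIN ANALYZE on your own data.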