I have a PostgreSQL table where some fields are covered by a unique index to prevent duplicates. Inserts go through a PL/pgSQL function that inserts all the rows at once and catches the unique_violation exception; the problem is that the exception aborts the whole insert even if there is just one duplicate.
I can't issue separate INSERT statements due to performance issues (some of these imports run to hundreds of rows), yet the whole process stops if there is a single duplicate, like the first two rows in the following example.
CREATE OR REPLACE FUNCTION easy_import() RETURNS VOID AS
$$
BEGIN
    BEGIN
        INSERT INTO things ("title", "uniq1", "uniq2") VALUES
            ('title 1', 100, 102),
            ('title 2', 100, 102),
            ('title 3', 101, 102),
            ('title 4', 102, 102),
            ('title 5', 103, 102),
            ('title 6', 104, 102),
            ('title 7', 105, 102),
            ('title 8', 106, 102),
            ('title 9', 107, 102),
            ('title 10', 108, 102);
        RETURN;
    EXCEPTION WHEN unique_violation THEN
        -- do nothing
    END;
END;
$$
LANGUAGE plpgsql;
Is there a way to ignore the unique_violation just for one record and prevent it from stopping further INSERTs?
Thank you.
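One way to skip the violation per record rather than per statement is to trap the exception inside a per-row loop. Below is a minimal sketch of that pattern, assuming the same things table as in the question; the function name easy_import_by_row and the shortened VALUES list are made up for illustration. Note that each EXCEPTION block starts a subtransaction, so this is slower than a single multi-row INSERT:
CREATE OR REPLACE FUNCTION easy_import_by_row() RETURNS VOID AS
$$
DECLARE
    r RECORD;
BEGIN
    FOR r IN
        SELECT * FROM (VALUES
            ('title 1', 100, 102),
            ('title 2', 100, 102),   -- duplicate key: only this row is skipped
            ('title 3', 101, 102)
        ) AS v(title, uniq1, uniq2)
    LOOP
        BEGIN
            INSERT INTO things ("title", "uniq1", "uniq2")
            VALUES (r.title, r.uniq1, r.uniq2);
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- skip this row only; the loop continues with the next one
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;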
PL/pgSQL is faster here, as you don't have to fetch the data, process it, and then submit a new query: all the processing happens inside the server, and the function is also precompiled, which boosts performance further.
RETURN NEXT and RETURN QUERY do not actually return from the function — they simply append zero or more rows to the function's result set. Execution then continues with the next statement in the PL/pgSQL function. As successive RETURN NEXT or RETURN QUERY commands are executed, the result set is built up.
"lang_name" is simply the name of the procedural language. If the language is not specified, PostgreSQL will use the default procedural language, PL/pgSQL. $$ (dollar quoting) is a PostgreSQL substitute for single quotes that avoids quoting issues inside the function body.
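To make the two notes above concrete, here is a small illustrative set-returning function (the name list_titles is invented; it assumes the things table from the question). It appends rows with RETURN NEXT and RETURN QUERY, keeps executing after each one, and uses dollar quoting with an explicit LANGUAGE clause:
CREATE OR REPLACE FUNCTION list_titles() RETURNS SETOF text AS
$$
BEGIN
    RETURN NEXT 'one title';               -- appends a single row; execution continues
    RETURN QUERY SELECT title FROM things; -- appends every row the query yields
    RETURN;                                -- only a bare RETURN actually ends the function
END;
$$
LANGUAGE plpgsql;

SELECT * FROM list_titles();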
Assuming that the unique constraint is a composite constraint on uniq1 and uniq2, this will work:
INSERT INTO things
WITH new_rows (title, uniq1, uniq2) AS (VALUES
    ('title 1', 100, 102),
    ('title 2', 100, 102),
    ('title 3', 101, 102),
    ('title 4', 102, 102),
    ('title 5', 103, 102),
    ('title 6', 104, 102),
    ('title 7', 105, 102),
    ('title 8', 106, 102),
    ('title 9', 107, 102),
    ('title 10', 108, 102)
)
SELECT DISTINCT ON (n.uniq1, n.uniq2)
    n.title, n.uniq1, n.uniq2
FROM new_rows AS n
LEFT JOIN things AS t
    ON n.uniq1 = t.uniq1 AND n.uniq2 = t.uniq2
WHERE t.uniq1 IS NULL;
This may actually wind up being less performant than individual INSERT statements, but it's about the only other thing that will do the trick. Benchmark each approach and see which works best for you.
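For completeness, something neither approach above mentions: PostgreSQL 9.5 and later support INSERT ... ON CONFLICT DO NOTHING, which skips any conflicting row, including duplicates within the same VALUES list. A minimal sketch, assuming a unique constraint or index on (uniq1, uniq2):
-- Requires PostgreSQL 9.5+ and a unique constraint/index on (uniq1, uniq2).
INSERT INTO things (title, uniq1, uniq2) VALUES
    ('title 1', 100, 102),
    ('title 2', 100, 102),   -- conflicts with the row above; silently skipped
    ('title 3', 101, 102)
ON CONFLICT (uniq1, uniq2) DO NOTHING;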
Your table is something like this:
CREATE TABLE t
(
    title text,
    uniq1 int not null,
    uniq2 int not null,
    CONSTRAINT t_pk_u1_u2 PRIMARY KEY (uniq1, uniq2)
);
so let me add a rule to that:
CREATE OR REPLACE RULE ignore_duplicate_inserts_on_t AS ON INSERT TO t
WHERE (EXISTS ( SELECT 1 FROM t WHERE t.uniq1 = new.uniq1 and t.uniq2 = new.uniq2))
DO INSTEAD NOTHING;
and after that, you can run this query:
insert into t(title,uniq1,uniq2) values
('title 1', 100, 102),
('title 2', 100, 102),
...;
If your table is large, this approach is optimal. I ran a test (for this approach and the JOIN approach mentioned above by Mr. cdhowie) on a table with about 2 million rows; the results were:
Rule approach (this answer): 1400 rows per second
JOIN approach (the answer above): 650 rows per second
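One caveat, as far as I can tell from how rules are rewritten: the rule's EXISTS test only sees rows that were already in t when the statement started, so two rows with the same key inside one VALUES list can still collide with each other. A defensive variant (my own sketch, not from the original answer) deduplicates the batch first:
-- DISTINCT ON removes duplicates within the batch itself;
-- the rule then silently drops any row whose key already exists in t.
INSERT INTO t (title, uniq1, uniq2)
SELECT DISTINCT ON (uniq1, uniq2) title, uniq1, uniq2
FROM (VALUES
    ('title 1', 100, 102),
    ('title 2', 100, 102)   -- same key as the row above; removed by DISTINCT ON
) AS v(title, uniq1, uniq2);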