PostgreSQL - Continue on unique_violation (plpgsql)

I have a PostgreSQL table with a unique index on some fields to prevent duplicates. The inserts go through a PL/pgSQL function that inserts all the rows and catches the unique_violation exception, although it stops inserting records even if there is just one duplicate.

I can't issue separate INSERTs due to performance issues (some of these batches run to hundreds of rows). The problem is that a single duplicate aborts the whole batch, like the first two rows in the following example.

CREATE OR REPLACE FUNCTION easy_import() RETURNS VOID AS
  $$
  BEGIN
    BEGIN
      INSERT INTO things ("title", "uniq1", "uniq2") VALUES
      ('title 1', 100, 102),
      ('title 2', 100, 102),
      ('title 3', 101, 102),
      ('title 4', 102, 102),
      ('title 5', 103, 102),
      ('title 6', 104, 102),
      ('title 7', 105, 102),
      ('title 8', 106, 102),
      ('title 9', 107, 102),
      ('title 10', 108, 102);
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      -- do nothing
    END;
  END;
  $$
  LANGUAGE plpgsql;

Is there a way to ignore the unique_violation just for one record and prevent it from stopping further INSERTs?

Thank you.

Update

  • The unique index is on the "uniq1" and "uniq2" fields; sorry about the confusion.
  • While @cdhowie's solution seems to be the best, it somehow misses the fact that running the same query again still triggers an error. It's strange, because the JOIN is there precisely to filter those rows out. Still working on it.
asked Jul 25 '12 by metrobalderas

People also ask

Is PL/pgSQL fast?

PL/pgSQL is faster because you don't have to fetch the data, process it, and then submit a new query. All the processing is done inside the server, and the function is precompiled, which further boosts performance.
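
As a rough illustration, here is a hypothetical helper (not part of the question) where both the existence check and the insert run inside the server, so no intermediate result ever travels to the client:

CREATE OR REPLACE FUNCTION add_thing_if_missing(p_title text, p_uniq1 int, p_uniq2 int)
RETURNS boolean AS
$$
BEGIN
  -- the check runs server-side; nothing is fetched back to the client
  IF EXISTS (SELECT 1 FROM things WHERE uniq1 = p_uniq1 AND uniq2 = p_uniq2) THEN
    RETURN false;
  END IF;
  INSERT INTO things (title, uniq1, uniq2) VALUES (p_title, p_uniq1, p_uniq2);
  RETURN true;
END;
$$
LANGUAGE plpgsql;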

What does RETURN NEXT do in PL/pgSQL?

RETURN NEXT and RETURN QUERY do not actually return from the function — they simply append zero or more rows to the function's result set. Execution then continues with the next statement in the PL/pgSQL function. As successive RETURN NEXT or RETURN QUERY commands are executed, the result set is built up.
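
For illustration, a hypothetical set-returning function (not part of the question) that builds its result with both commands:

CREATE OR REPLACE FUNCTION list_uniq1(p_uniq2 int) RETURNS SETOF int AS
$$
DECLARE
  v int;
BEGIN
  FOR v IN SELECT uniq1 FROM things WHERE uniq2 = p_uniq2 LOOP
    RETURN NEXT v;            -- appends one value to the result set
  END LOOP;
  RETURN QUERY SELECT -1;     -- appends all rows of a query
  RETURN;                     -- only now does the function actually end
END;
$$
LANGUAGE plpgsql;

-- usage: SELECT * FROM list_uniq1(102);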

What is DO $$ in PostgreSQL?

“lang_name” is simply the name of the procedural language. If the language is not mentioned, PostgreSQL will use the default procedural language, PL/pgSQL. $$ (dollar quoting) is a PostgreSQL substitute for single quotes, used to avoid quoting issues inside the function body.
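
For example, in a DO block (an anonymous PL/pgSQL block, shown here only to illustrate the quoting), the body is delimited by $$ instead of single quotes, so single quotes inside it need no escaping:

DO $$
BEGIN
  -- the quotes around the notice text need no escaping,
  -- because the surrounding body is dollar-quoted
  RAISE NOTICE 'single quotes in here are no problem';
END;
$$ LANGUAGE plpgsql;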


2 Answers

Assuming that the unique constraint is composite around uniq1 and uniq2, this will work:

INSERT INTO things (title, uniq1, uniq2)
WITH new_rows (title, uniq1, uniq2) AS (VALUES
    ('title 1', 100, 102),
    ('title 2', 100, 102),
    ('title 3', 101, 102),
    ('title 4', 102, 102),
    ('title 5', 103, 102),
    ('title 6', 104, 102),
    ('title 7', 105, 102),
    ('title 8', 106, 102),
    ('title 9', 107, 102),
    ('title 10', 108, 102)
)
SELECT
    -- collapse duplicates within the new rows themselves
    DISTINCT ON (n.uniq1, n.uniq2)
    n.title, n.uniq1, n.uniq2
FROM new_rows AS n
-- anti-join: keep only rows that do not already exist in things
LEFT JOIN things AS t
    ON n.uniq1 = t.uniq1 AND n.uniq2 = t.uniq2
WHERE t.uniq1 IS NULL;

This may actually wind up being less performant than individual INSERT statements, but it's about the only other thing that will do the trick. Benchmark each approach and see which works best for you.
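
If a single callable function like the question's easy_import() is still wanted, the anti-join can simply replace the exception handler. A sketch based on this answer, assuming the unique constraint is on (uniq1, uniq2):

CREATE OR REPLACE FUNCTION easy_import() RETURNS void AS
$$
BEGIN
  INSERT INTO things (title, uniq1, uniq2)
  WITH new_rows (title, uniq1, uniq2) AS (VALUES
      ('title 1', 100, 102),
      ('title 2', 100, 102)
      -- ... remaining rows ...
  )
  SELECT DISTINCT ON (n.uniq1, n.uniq2)
         n.title, n.uniq1, n.uniq2
  FROM new_rows AS n
  LEFT JOIN things AS t
         ON n.uniq1 = t.uniq1 AND n.uniq2 = t.uniq2
  WHERE t.uniq1 IS NULL;  -- only rows not already present are inserted
END;
$$
LANGUAGE plpgsql;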

answered by cdhowie

Your table is something like this:

CREATE TABLE t
(
  title text,
  uniq1 int not null,
  uniq2 int not null,
  CONSTRAINT t_pk_u1_u2 PRIMARY KEY (uniq1, uniq2)
);

so let me add a rule to that:

CREATE OR REPLACE RULE ignore_duplicate_inserts_on_t AS ON INSERT TO t
   WHERE (EXISTS ( SELECT 1 FROM t WHERE t.uniq1 = new.uniq1 and t.uniq2 = new.uniq2))
   DO INSTEAD NOTHING;

and after that, you can run this query:

insert into t(title,uniq1,uniq2) values 
    ('title 1', 100, 102),
    ('title 2', 100, 102),
    ...;
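
Applied to the question's things table, the equivalent rule would look something like this (a sketch, assuming the unique constraint is on (uniq1, uniq2)):

CREATE OR REPLACE RULE ignore_duplicate_inserts_on_things AS ON INSERT TO things
   WHERE (EXISTS (SELECT 1 FROM things
                  WHERE things.uniq1 = new.uniq1 AND things.uniq2 = new.uniq2))
   DO INSTEAD NOTHING;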

If your table is large, this approach performs better. I tested both this approach and the join approach mentioned above by cdhowie on a table with about 2 million rows; the results were:

Rule approach (this answer): 1400 rows per second
Join approach (cdhowie's answer): 650 rows per second
answered by Khalil