Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why doesn't this rule prevent duplicate key violations?

(postgresql) I was trying to COPY csv data into a table but I was getting duplicate key violation errors, and there's no way to tell COPY to ignore those, so following internet wisdom I tried adding this rule:

CREATE OR REPLACE RULE ignore_duplicate_inserts AS
   ON INSERT TO mytable
   WHERE (EXISTS ( SELECT mytable.id
           FROM mytable
          WHERE mytable.id = new.id)) DO NOTHING;

to circumvent the problem, but I still get those errors - any ideas why ?

like image 820
Hendekagon Avatar asked Mar 11 '11 06:03

Hendekagon


People also ask

How do you solve duplicate key value violates unique constraint?

As a solution for your present problem, either delete the conflicting manually inserted rows or reset the sequence to a value higher than any existing id in the table.

What does duplicate key value violates unique constraint mean?

The “duplicate key violates unique constraint” error notifies the caller that a retry is needed. This seems like an intuitive approach, but relying on this optimistic insert can quickly have a negative performance impact on your database.

What is needed for an insert on conflict update to work?

You must have INSERT privilege on a table in order to insert into it. If ON CONFLICT DO UPDATE is present, UPDATE privilege on the table is also required. If a column list is specified, you only need INSERT privilege on the listed columns.


2 Answers

Rules by default add things to the current action:

Roughly speaking, a rule causes additional commands to be executed when a given command on a given table is executed.

But an INSTEAD rule allows you to replace the action:

Alternatively, an INSTEAD rule can replace a given command by another, or cause a command not to be executed at all.

So, I think you want to specify INSTEAD:

CREATE OR REPLACE RULE ignore_duplicate_inserts AS
   ON INSERT TO mytable
   WHERE (EXISTS ( SELECT mytable.id
           FROM mytable
          WHERE mytable.id = new.id)) DO INSTEAD NOTHING;

Without the INSTEAD, your rule is essentially saying "do the INSERT and then do nothing" when you want to say "instead of the INSERT, do nothing" and, AFAIK, the DO INSTEAD NOTHING will do that.

I'm not an expert on PostgreSQL rules but I think adding the "INSTEAD" should work.

UPDATE: Thanks to araqnid we know that:

COPY FROM will invoke any triggers and check constraints on the destination table. However, it will not invoke rules

So a rule isn't going to work in this situation. However, triggers are fired during COPY FROM so you could write a BEFORE INSERT trigger that would return NULL when it detected duplicate rows:

It can return NULL to skip the operation for the current row. This instructs the executor to not perform the row-level operation that invoked the trigger (the insertion or modification of a particular table row).

That said, I think you'd be better off with araqnid's "load it all into a temporary table, clean it up, and copy it to the final destination" would be a more sensible solution for a bulk loading operation like you have.

like image 126
mu is too short Avatar answered Sep 18 '22 20:09

mu is too short


COPY FROM will not invoke rules (http://www.postgresql.org/docs/9.0/interactive/sql-copy.html#AEN58860)

My approach would be to load the CSV data into a temp table, then use an INSERT...SELECT statement to copy the data into the target table where it doesn't already exist. (If there are duplicates in the CSV data itself, remove those from the temp table first). Something like:

BEGIN;
CREATE TEMP TABLE stage_data(key_column, data_columns...) ON COMMIT DROP;
\copy stage_data from data.csv with csv header
-- prevent any other updates while we are merging input (omit this if you don't need it)
LOCK target_data IN SHARE ROW EXCLUSIVE MODE;
-- insert into target table
INSERT INTO target_data(key_column, data_columns...)
   SELECT key_column, data_columns...
   FROM stage_data
   WHERE NOT EXISTS (SELECT 1 FROM target_data
                     WHERE target_data.key_column = stage_data.key_column)
END;
like image 38
araqnid Avatar answered Sep 17 '22 20:09

araqnid