How to ignore errors in a batch insert in PostgreSQL

I have a process that runs every 5 minutes and tries to insert a batch of articles into a table. The articles come from web-scraping, so there are cases in which I am trying to insert a batch that contains articles which have already been saved into the DB.

My primary key is a uuid: an MD5 hash of the article title.

Checking if an article exists in the db to filter the batch is kinda inefficient.

Is there a DB-level way in PostgreSQL to ignore attempts to insert a duplicate uuid without raising an error?

Asked by Avraam Mavridis, Nov 13 '16



Solution

You could insert using the WHERE NOT EXISTS clause.

For example, consider a test table with a numeric id as primary key and a textual name.


db=> CREATE TABLE test(id BIGSERIAL PRIMARY KEY, name TEXT);
CREATE TABLE

-- Insertion will work - empty table
db=> INSERT INTO test(id, name) 
     SELECT 1, 'Partner number 1' 
     WHERE NOT EXISTS (SELECT 1 FROM test WHERE id=1);
INSERT 0 1

-- Insertion will NOT work - duplicate id
db=> INSERT INTO test(id, name) 
     SELECT 1, 'Partner number 1' 
     WHERE NOT EXISTS (SELECT 1 FROM test WHERE id=1);
INSERT 0 0

-- After two insertions, the table contains only one row
db=> SELECT * FROM test;
 id |       name
----+------------------
  1 | Partner number 1
(1 row)
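Because the question is about batches, the same guard can be applied to many rows in a single statement by selecting from a VALUES list. This is a minimal sketch reusing the `test` table from the example above; the extra rows are made up for illustration. Note that the NOT EXISTS check only sees rows that existed before the statement started, so a key that appears twice inside the same VALUES list will still raise a unique-violation error.

```sql
-- Same table as in the example above (created only if missing)
CREATE TABLE IF NOT EXISTS test(id BIGSERIAL PRIMARY KEY, name TEXT);

-- Insert a whole batch at once; rows whose id is already
-- in the table are filtered out by the NOT EXISTS check
INSERT INTO test (id, name)
SELECT v.id, v.name
FROM (VALUES
        (1, 'Partner number 1'),
        (2, 'Partner number 2'),
        (3, 'Partner number 3')
     ) AS v(id, name)
WHERE NOT EXISTS (SELECT 1 FROM test t WHERE t.id = v.id);
```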

Difference from ON CONFLICT

Quoting the documentation:

ON CONFLICT can be used to specify an alternative action to raising a unique constraint or exclusion constraint violation error.

The action can be DO NOTHING or DO UPDATE. The latter is often called an upsert, a portmanteau of insert and update.
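A minimal sketch of the DO UPDATE (upsert) form. The `articles` table and its columns are hypothetical, modeled on the question's description, and deriving the key in SQL with `md5(title)::uuid` is an assumption (the hash could just as well be computed in application code). `EXCLUDED` refers to the row that was proposed for insertion.

```sql
-- Hypothetical table: primary key is the MD5 hash of the title, stored as uuid
CREATE TABLE IF NOT EXISTS articles (
    uuid  uuid PRIMARY KEY,
    title text NOT NULL,
    body  text
);

-- First run: both articles are new, so both are inserted
INSERT INTO articles (uuid, title, body)
VALUES
    (md5('First title')::uuid,  'First title',  'first body'),
    (md5('Second title')::uuid, 'Second title', 'second body')
ON CONFLICT (uuid) DO UPDATE
SET body = EXCLUDED.body;

-- Later run: 'First title' already exists, so its body is overwritten
INSERT INTO articles (uuid, title, body)
VALUES (md5('First title')::uuid, 'First title', 'edited body')
ON CONFLICT (uuid) DO UPDATE
SET body = EXCLUDED.body;
```

DO UPDATE refuses to touch the same row twice in one statement, so a batch containing the same key twice fails with an error and would need to be de-duplicated first.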

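In a single session, WHERE NOT EXISTS gives the same result as ON CONFLICT DO NOTHING, but the two are not strictly equivalent. With WHERE NOT EXISTS, two concurrent transactions can both see that a key is missing, and one of them then fails with a unique-violation error; a key duplicated inside the same batch also still raises an error. ON CONFLICT DO NOTHING (available since PostgreSQL 9.5) skips the conflicting rows in both cases. See the query plans for a deeper dive.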
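Applied to the question's scenario, a sketch of the whole batch as one statement with ON CONFLICT DO NOTHING. As before, the `articles` table, its columns and the `md5(title)::uuid` key derivation are illustrative assumptions, and the third row is a deliberate in-batch duplicate.

```sql
-- Hypothetical table: primary key is the MD5 hash of the title, stored as uuid
CREATE TABLE IF NOT EXISTS articles (
    uuid  uuid PRIMARY KEY,
    title text NOT NULL,
    body  text
);

-- One statement per scraped batch. Rows whose uuid already exists,
-- either in the table or earlier in this same batch, are skipped
-- without raising an error.
INSERT INTO articles (uuid, title, body)
VALUES
    (md5('First title')::uuid,  'First title',  'first body'),
    (md5('Second title')::uuid, 'Second title', 'second body'),
    (md5('First title')::uuid,  'First title',  'duplicate in batch')
ON CONFLICT (uuid) DO NOTHING
RETURNING uuid, title;  -- returns only the rows that were actually inserted
```

RETURNING is optional; it is handy if the scraper needs to know which articles were new in this run.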

Answered by Adam Matan, Oct 07 '22