I have a process that runs every 5 minutes and tries to insert a batch of articles into a table. The articles come from web-scraping, so there are cases in which I am trying to insert a batch that contains articles which have already been saved into the DB.
My primary key is uuid, an MD5 hash of the article title.
Checking whether each article already exists in the DB in order to filter the batch is rather inefficient.
Is there a DB-level way in PostgreSQL to ignore attempts to insert a duplicate uuid without returning an error?
When using Postgres, if you need writes exceeding tens of thousands of INSERTs per second, we turn to the Postgres COPY utility for bulk loading. COPY is capable of handling hundreds of thousands of writes per second. Even without sustained high write throughput, COPY is handy for quickly ingesting a very large data set.
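COPY itself has no ON CONFLICT option, so it cannot skip duplicate keys on its own. A common pattern for a scraper like the one in the question is to COPY each batch into a constraint-free staging table and then move only the new rows with INSERT ... SELECT ... ON CONFLICT DO NOTHING. The sketch below assumes hypothetical temporary tables named articles and article_batch and feeds the data inline with psql's COPY ... FROM STDIN; a real process would stream the rows through its client driver instead.

```sql
-- Hypothetical target table: uuid is the MD5 hash of the title, as in the question.
CREATE TEMP TABLE articles (
    uuid  uuid PRIMARY KEY,
    title text NOT NULL
);

-- One article already saved by an earlier run.
INSERT INTO articles (uuid, title) VALUES (md5('Old news')::uuid, 'Old news');

-- Hypothetical staging table with no constraints, so COPY never fails on duplicates.
CREATE TEMP TABLE article_batch (title text NOT NULL);

-- Bulk-load the scraped batch (psql syntax: the data lines end with \.).
COPY article_batch (title) FROM STDIN;
Old news
Fresh story
Fresh story
\.

-- Move only new articles: ON CONFLICT DO NOTHING skips titles already stored
-- in articles as well as repeats inside the batch itself.
INSERT INTO articles (uuid, title)
SELECT md5(title)::uuid, title
FROM article_batch
ON CONFLICT (uuid) DO NOTHING;
```

md5() returns 32 hex digits, which PostgreSQL accepts directly as uuid input, so the key can be computed in the database rather than by the scraper.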
You must have INSERT privilege on a table in order to insert into it. If ON CONFLICT DO UPDATE is present, UPDATE privilege on the table is also required. If a column list is specified, you only need INSERT privilege on the listed columns.
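On success, an INSERT returns a command tag of the form INSERT oid count, which is what the INSERT 0 1 and INSERT 0 0 lines in psql output show. PostgreSQL uses the OID internally as a primary key for its system tables; for INSERT the oid field is 0 in current PostgreSQL versions. The count is the number of rows that the INSERT statement inserted successfully.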
You could insert using the WHERE NOT EXISTS clause.
For example, consider a test table with a numeric id as primary key and a textual name.
db=> CREATE TABLE test(id BIGSERIAL PRIMARY KEY, name TEXT);
CREATE TABLE
-- Insertion will work - empty table
db=> INSERT INTO test(id, name)
SELECT 1, 'Partner number 1'
WHERE NOT EXISTS (SELECT 1 FROM test WHERE id=1);
INSERT 0 1
-- Nothing is inserted, and no error is raised - duplicate id
db=> INSERT INTO test(id, name)
SELECT 1, 'Partner number 1'
WHERE NOT EXISTS (SELECT 1 FROM test WHERE id=1);
INSERT 0 0
-- After two insertions, the table contains only one row
db=> SELECT * FROM test;
id | name
----+------------------
1 | Partner number 1
(1 row)
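The same idea extends to a whole batch by correlating the NOT EXISTS subquery with each candidate row. In this sketch a VALUES list stands in for the scraped batch and test is recreated as a temporary table so the example is self-contained. Two caveats: the subquery only sees rows that existed when the statement started, so a key repeated inside the VALUES list still raises a unique violation, and a concurrent transaction inserting the same id between the check and the insert can also cause one.

```sql
-- Same shape as the test table above, as a temporary table for this sketch.
CREATE TEMP TABLE test (id BIGSERIAL PRIMARY KEY, name TEXT);
INSERT INTO test (id, name) VALUES (1, 'Partner number 1');

-- Insert the whole batch, skipping ids that are already in test.
INSERT INTO test (id, name)
SELECT v.id, v.name
FROM (VALUES (1, 'Partner number 1'),
             (2, 'Partner number 2')) AS v(id, name)
WHERE NOT EXISTS (SELECT 1 FROM test t WHERE t.id = v.id);
```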
ON CONFLICT
Quoting the documentation:
ON CONFLICT can be used to specify an alternative action to raising a unique constraint or exclusion constraint violation error.
The action can be DO NOTHING or DO UPDATE. The second approach is often referred to as Upsert, a portmanteau of Insert and Update.
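Applied to the table from the question, both actions look roughly like this. The table name articles and the body column are assumptions for illustration; the key is computed with md5(title)::uuid to match the question's MD5-of-title primary key.

```sql
-- Hypothetical articles table: uuid is the MD5 hash of the title.
CREATE TEMP TABLE articles (
    uuid  uuid PRIMARY KEY,
    title text NOT NULL,
    body  text
);

-- The first run stores the article.
INSERT INTO articles (uuid, title, body)
VALUES (md5('Breaking story')::uuid, 'Breaking story', 'first version')
ON CONFLICT (uuid) DO NOTHING;

-- A later batch contains the same article: DO NOTHING skips it silently
-- (psql reports INSERT 0 0) instead of raising a unique violation.
INSERT INTO articles (uuid, title, body)
VALUES (md5('Breaking story')::uuid, 'Breaking story', 'second version')
ON CONFLICT (uuid) DO NOTHING;

-- Upsert variant: DO UPDATE overwrites the stored row; EXCLUDED refers to
-- the row that was proposed for insertion.
INSERT INTO articles (uuid, title, body)
VALUES (md5('Breaking story')::uuid, 'Breaking story', 'third version')
ON CONFLICT (uuid) DO UPDATE SET body = EXCLUDED.body;
```

For the scraper in the question, DO NOTHING is the closer fit, since already-saved articles should simply be ignored; DO UPDATE only makes sense if a re-scraped article should refresh the stored copy.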
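For a single session inserting one row at a time, WHERE NOT EXISTS gives the same result as ON CONFLICT DO NOTHING. They are not fully equivalent, though: ON CONFLICT DO NOTHING also skips keys inserted concurrently by another transaction and keys repeated within the same statement, while WHERE NOT EXISTS can still raise a unique violation in those cases. Comparing the query plans of the two statements gives a deeper view of how each works.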
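One way to compare them is to run EXPLAIN on both statements against the test table. EXPLAIN without ANALYZE only plans a statement, so nothing is inserted. The WHERE NOT EXISTS form plans a subquery on test, whereas the ON CONFLICT plan's Insert node reports the conflict resolution (NOTHING) and the arbiter index it checks. The temporary table is an assumption to keep the sketch self-contained.

```sql
-- Same shape as the test table above, as a temporary table for this sketch.
CREATE TEMP TABLE test (id BIGSERIAL PRIMARY KEY, name TEXT);

-- Plan for the WHERE NOT EXISTS form.
EXPLAIN
INSERT INTO test (id, name)
SELECT 1, 'Partner number 1'
WHERE NOT EXISTS (SELECT 1 FROM test WHERE id = 1);

-- Plan for the ON CONFLICT form.
EXPLAIN
INSERT INTO test (id, name)
VALUES (1, 'Partner number 1')
ON CONFLICT (id) DO NOTHING;
```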