What happens with duplicates when inserting multiple rows?

Tags: exception, sql, duplicates, postgresql, upsert

I am running a Python script that inserts a large amount of data into a Postgres database. I use a single query to perform multiple row inserts:

INSERT INTO table (col1,col2) VALUES ('v1','v2'),('v3','v4') ... etc

I was wondering what would happen if it hits a duplicate key for the insert. Will it stop the entire query and throw an exception? Or will it merely ignore the insert of that specific row and move on?

asked Jun 16 '15 18:06

Garrigan Stafford

2 Answers

The INSERT will just insert all rows and nothing special will happen, unless you have some kind of constraint disallowing duplicate / overlapping values (PRIMARY KEY, UNIQUE, CHECK or EXCLUDE constraint) - which you did not mention in your question. But that's what you are probably worried about.

Assuming a UNIQUE or PK constraint on (col1,col2), you are dealing with a textbook UPSERT situation. There are many related questions and answers on this site.

Generally, if any constraint is violated, an exception is raised which (unless it is trapped in a subtransaction, as is possible in a procedural server-side language like PL/pgSQL) will roll back not only the statement, but the whole transaction.
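
The all-or-nothing behaviour of a single statement can be seen in a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server (an assumption; the question is about Postgres), with a hypothetical table tbl that has a primary key on (col1, col2). SQLite, like Postgres, backs out every row of the failing statement. What the sketch does not show is the Postgres-specific part, where the surrounding transaction is aborted as well.

```python
import sqlite3

# Stand-in database: SQLite in memory, NOT Postgres (assumption for a runnable demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 TEXT, col2 TEXT, PRIMARY KEY (col1, col2))")
conn.execute("INSERT INTO tbl (col1, col2) VALUES ('v3', 'v4')")
conn.commit()

try:
    # ('v1', 'v2') is new, ('v3', 'v4') collides with the existing row.
    conn.execute(
        "INSERT INTO tbl (col1, col2) VALUES ('v1', 'v2'), ('v3', 'v4')"
    )
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True
    # In Postgres, an explicit transaction must be rolled back here
    # before it can be used again.
    conn.rollback()

# Only the row that existed before the failed statement is present.
rows_after = conn.execute("SELECT col1, col2 FROM tbl ORDER BY col1").fetchall()
print(insert_failed, rows_after)
```

The new row ('v1', 'v2') is not kept, even though it did not conflict with anything by itself.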

Without concurrent writes

I.e.: No other transactions will try to write to the same table at the same time.

Exclude rows that are already in the table with WHERE NOT EXISTS ... or any other applicable technique (see: Select rows which are not present in other table).
And don't forget to remove duplicates within the inserted set as well, which would not be excluded by the semi-anti-join WHERE NOT EXISTS ...

One technique to deal with both at once would be EXCEPT:

INSERT INTO tbl (col1, col2)
VALUES
  (text 'v1', text 'v2')  -- explicit type cast may be needed in 1st row
, ('v3', 'v4')
, ('v3', 'v4')  -- beware of dupes in source
EXCEPT SELECT col1, col2 FROM tbl;

EXCEPT without the key word ALL folds duplicate rows in the source. If you know there are no dupes, or you don't want to fold duplicates silently, use EXCEPT ALL (or one of the other techniques). See:

Using EXCEPT clause in PostgreSQL

Generally, if the target table is big, WHERE NOT EXISTS in combination with DISTINCT on the source will probably be faster:

INSERT INTO tbl (col1, col2)
SELECT *
FROM  (
   SELECT DISTINCT *
   FROM  (
       VALUES
         (text 'v1', text 'v2')
       , ('v3', 'v4')
       , ('v3', 'v4')  -- dupes in source
      ) t(c1, c2)
   ) t
WHERE NOT EXISTS (
   SELECT FROM tbl
   WHERE  col1 = t.c1 AND col2 = t.c2
   );

If there can be many dupes, it pays to fold them in the source first. Else use one subquery less.
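
Since the rows come from a Python script, the folding can also happen on the client before the statement is built. A minimal sketch, assuming each row is a hashable tuple; fold_duplicates is a hypothetical helper name, not part of any library:

```python
def fold_duplicates(rows):
    """Return rows with exact duplicates removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(rows))


source_rows = [("v1", "v2"), ("v3", "v4"), ("v3", "v4")]
unique_rows = fold_duplicates(source_rows)
print(unique_rows)
```

This only folds duplicates within the batch; rows that already exist in the table still have to be excluded by the statement itself. If the unique constraint covers only some of the columns, the folding would have to be done on those columns rather than on the whole tuple.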

Related: Select rows which are not present in other table

With concurrent writes

Use the Postgres UPSERT implementation INSERT ... ON CONFLICT ... in Postgres 9.5 or later:

INSERT INTO tbl (col1,col2)
SELECT DISTINCT *  -- still can't insert the same row more than once
FROM  (
   VALUES
     (text 'v1', text 'v2')
   , ('v3','v4')
   , ('v3','v4')  -- you still need to fold dupes in source!
  ) t(c1, c2)
ON CONFLICT DO NOTHING;  -- ignores rows with *any* conflict!
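
From a Python script, a statement like this would normally be sent with bound parameters rather than literals pasted into the SQL string. A rough sketch of building it, assuming a driver that uses %s placeholders (psycopg2 does); build_insert is a hypothetical helper, and the table and column names are interpolated as-is, so they must come from trusted code, never from user input:

```python
def build_insert(table, columns, rows):
    """Build one multi-row INSERT ... ON CONFLICT DO NOTHING with %s placeholders."""
    # Fold exact duplicates, as SELECT DISTINCT does in the SQL version.
    rows = list(dict.fromkeys(rows))
    if not rows:
        raise ValueError("no rows to insert")
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES {', '.join([one_row] * len(rows))} "
        "ON CONFLICT DO NOTHING"
    )
    # Flatten the values row by row so they line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_insert(
    "tbl", ["col1", "col2"], [("v1", "v2"), ("v3", "v4"), ("v3", "v4")]
)
print(sql)
print(params)
```

The pair would then be handed to the driver, for example cursor.execute(sql, params) on a psycopg2 cursor.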

Erwin Brandstetter

"Will it stop the entire query and throw an exception?" Yes.

To avoid that, see the related SO question, which describes how to keep Postgres from throwing an error on multi-row inserts when some of the inserted keys already exist in the DB.

You should basically do this:

INSERT INTO DBtable
        (id, field1)
    SELECT 1, 'value'
    WHERE
        NOT EXISTS (
            SELECT id FROM DBtable WHERE id = 1
        );

answered Sep 20 '22 02:09

Alexandros
