Why does an insert that groups by the primary key throw a primary key constraint violation error?

I have an insert statement that's throwing a primary key error but I don't see how I could possibly be inserting duplicate key values.

First I create a temp table with a primary key.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; -- Note: I've tried COMMITTED and UNCOMMITTED; neither materially affects the behavior. See the screenshots below for proof.

IF (OBJECT_ID('TEMPDB..#P')) IS NOT NULL DROP TABLE #P;

CREATE TABLE #P(idIsbn INT NOT NULL PRIMARY KEY, price SMALLMONEY, priceChangedDate DATETIME);

Then I pull prices from the Price table, grouping by idIsbn, which is the primary key of the temp table.

INSERT  INTO #P(idIsbn, price, priceChangedDate)
SELECT  idIsbn ,
        MIN(lowestPrice) ,
        MIN(priceChangedDate)
FROM Price p
WHERE p.idMarketplace = 3100
GROUP BY p.idIsbn

I understand that grouping by idIsbn by definition makes it unique. The idIsbn column in the Price table is declared [idIsbn] [int] NOT NULL.

But every once in a while when I run this query I get this error:

Violation of PRIMARY KEY constraint 'PK__#P________AED35F8119E85FC5'. Cannot insert duplicate key in object 'dbo.#P'. The duplicate key value is (1447858).

NOTE: I've gotten a lot of questions about timing. I'll select this statement, press F5, and no error will occur. Then I'll do it again and it will fail; then I'll run it again and again and it will succeed a couple of times before failing again. What I'm saying is that I can find no pattern for when it will succeed and when it won't.

How can I be inserting duplicate rows if (A) I just created the table brand new before inserting into it and (B) I'm grouping by the column designed to be the primary key?

For now, I'm working around the problem with IGNORE_DUP_KEY = ON, but I'd really like to know the root cause.
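
For reference, here is roughly what that workaround looks like (a sketch of the DDL, not my exact script): with IGNORE_DUP_KEY = ON, SQL Server discards a duplicate row with the warning "Duplicate key was ignored." instead of failing the whole statement.

-- Workaround sketch: duplicate keys are discarded with a warning
-- rather than aborting the INSERT.
CREATE TABLE #P
(
    idIsbn           INT NOT NULL PRIMARY KEY WITH (IGNORE_DUP_KEY = ON),
    price            SMALLMONEY,
    priceChangedDate DATETIME
);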

Here is what I'm actually seeing in my SSMS window. There is nothing more and nothing less:

[screenshot]

@@Version is:

Microsoft SQL Server 2008 (SP3) - 10.0.5538.0 (X64) 
    Apr  3 2015 14:50:02 
    Copyright (c) 1988-2008 Microsoft Corporation
    Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)

Execution plan: [screenshot]

Here is an example of what it looks like when it runs fine. Here I'm using READ COMMITTED, but it doesn't matter, because I get the error whether I read committed or uncommitted.

[screenshot]

Here is another example of it failing, this time with READ COMMITTED.

[screenshot]

Also:

  • I get the same error whether I'm populating a temp table or a persistent table.
  • When I add OPTION (MAXDOP 1) to the end of the insert (shown below), it seems to fail every time, though I can't be exhaustively sure of that because I can't run it forever. But it seems to be the case.
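
For clarity, the statement with the hint appended looks like this (a sketch; nothing else about the query changes):

INSERT INTO #P (idIsbn, price, priceChangedDate)
SELECT idIsbn,
       MIN(lowestPrice),
       MIN(priceChangedDate)
FROM   Price p
WHERE  p.idMarketplace = 3100
GROUP  BY p.idIsbn
OPTION (MAXDOP 1); -- forces a serial plan; the failure seems much more reproducible this way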

Here is the definition of the Price table. The table has 25M rows and has had 108,529 updates in the last hour.

CREATE TABLE [dbo].[Price](
    [idPrice] [int] IDENTITY(1,1) NOT NULL,
    [idIsbn] [int] NOT NULL,
    [idMarketplace] [int] NOT NULL,
    [lowestPrice] [smallmoney] NULL,
    [offers] [smallint] NULL,
    [priceDate] [smalldatetime] NOT NULL,
    [priceChangedDate] [smalldatetime] NULL,
 CONSTRAINT [pk_Price] PRIMARY KEY CLUSTERED 
(
    [idPrice] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
 CONSTRAINT [uc_idIsbn_idMarketplace] UNIQUE NONCLUSTERED 
(
    [idIsbn] ASC,
    [idMarketplace] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

And the two non-clustered indexes:

CREATE NONCLUSTERED INDEX [IX_Price_idMarketplace_INC_idIsbn_lowestPrice_priceDate] ON [dbo].[Price]
(
    [idMarketplace] ASC
)
INCLUDE (   [idIsbn],
    [lowestPrice],
    [priceDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_Price_idMarketplace_priceChangedDate_INC_idIsbn_lowestPrice] ON [dbo].[Price]
(
    [idMarketplace] ASC,
    [priceChangedDate] ASC
)
INCLUDE (   [idIsbn],
    [lowestPrice]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Asked Mar 30 '16 by Trevor



1 Answer

You hadn't supplied your table structure when this answer was first written.

Below is a repro, with some assumed details, that causes the problem at READ COMMITTED. (NB: now that you have supplied the definition, I can see that in your case updates to the priceChangedDate column will move rows around in the IX_Price_idMarketplace_priceChangedDate_INC_idIsbn_lowestPrice index, if that's the index being seeked.)

Connection 1 (Set up tables)

USE tempdb;

CREATE TABLE Price
  (
     SomeKey          INT PRIMARY KEY CLUSTERED,
     idIsbn           INT IDENTITY UNIQUE,
     idMarketplace    INT DEFAULT 3100,
     lowestPrice      SMALLMONEY DEFAULT $1.23,
     priceChangedDate DATETIME DEFAULT GETDATE()
  );

CREATE NONCLUSTERED INDEX ix
  ON Price(idMarketplace)
  INCLUDE (idIsbn, lowestPrice, priceChangedDate);

INSERT INTO Price
            (SomeKey)
SELECT number
FROM   master..spt_values
WHERE  number BETWEEN 1 AND 2000
       AND type = 'P'; 

Connection 2

Concurrent data modifications that repeatedly move a row from the beginning of the seeked range (3100, 1) to the end (3100, 2001) and back again.

USE tempdb;

WHILE 1 = 1
BEGIN
    -- Repeatedly move the row from one end of the seeked range to the other.
    UPDATE Price SET SomeKey = 2001 WHERE SomeKey = 1;
    UPDATE Price SET SomeKey = 1 WHERE SomeKey = 2001;
END

Connection 3 (Do the insert into a temp table with a unique constraint)

USE tempdb;

CREATE TABLE #P
  (
     idIsbn           INT NOT NULL PRIMARY KEY,
     price            SMALLMONEY,
     priceChangedDate DATETIME
  );

WHILE 1 = 1
  BEGIN
      TRUNCATE TABLE #P

      INSERT INTO #P
                  (idIsbn,
                   price,
                   priceChangedDate)
      SELECT idIsbn,
             MIN(lowestPrice),
             MIN(priceChangedDate)
      FROM   Price p
      WHERE  p.idMarketplace = 3100
      GROUP  BY p.idIsbn
  END 

[screenshot]

The plan has no aggregate, because there is a unique constraint on idIsbn (a unique constraint on idIsbn, idMarketplace would also work), so the GROUP BY can be optimised out: the optimizer knows there can be no duplicate values.
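
To see the constraint's role for yourself, here is a small side experiment (my sketch, not required for the repro): drop the UNIQUE constraint on idIsbn in the repro table and re-run the insert. The optimizer can no longer prove each group is a single row, so a real aggregate reappears in the plan, and MIN() would then collapse a doubly-read row instead of producing a duplicate key.

-- Side experiment (sketch): the unique constraint on idIsbn was auto-named,
-- so look its name up first.
SELECT name
FROM   sys.key_constraints
WHERE  parent_object_id = OBJECT_ID('Price')
       AND type = 'UQ';

-- Substitute the real name; 'UQ__Price__xxxx' is a placeholder.
-- ALTER TABLE Price DROP CONSTRAINT [UQ__Price__xxxx];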

But at READ COMMITTED isolation level, shared row locks are released as soon as the row has been read, so it is possible for a row to change position and be read a second time by the same seek or scan.

The index ix doesn't explicitly include SomeKey as a secondary key column, but because the index is not declared unique, SQL Server silently appends the clustering key to it behind the scenes; hence updating that column's value can move rows around within the index.
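
As for addressing the root cause (my suggestion, beyond the original repro): since the double read only happens because locks are released mid-statement, reading under SNAPSHOT isolation gives the statement a consistent, versioned view, so a moving row cannot be seen twice. A sketch, where YourDb is a placeholder for the real database name:

-- One-time setup; assumes row versioning is acceptable for this database.
ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In the loading session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;

INSERT INTO #P (idIsbn, price, priceChangedDate)
SELECT idIsbn,
       MIN(lowestPrice),
       MIN(priceChangedDate)
FROM   Price p
WHERE  p.idMarketplace = 3100
GROUP  BY p.idIsbn;

COMMIT;

-- Alternatively, SERIALIZABLE (or a TABLOCK hint on Price) blocks the
-- concurrent updates for the duration of the read instead of versioning them.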

Answered Sep 17 '22 by Martin Smith