SQL Server DELETE and MERGE performance

I have a table that contains some buy/sell data, with around 8M records in it:

CREATE TABLE [dbo].[Transactions](
[id] [int] IDENTITY(1,1) NOT NULL,
[itemId] [bigint] NOT NULL,
[dt] [datetime] NOT NULL,
[count] [int] NOT NULL,
[price] [float] NOT NULL,
[platform] [char](1) NOT NULL
) ON [PRIMARY]

Every X minutes my program gets new transactions for each itemId and I need to update the table. My first solution is a two-step DELETE+INSERT:

delete from Transactions where platform=@platform and itemid=@itemid
insert into Transactions (platform,itemid,dt,count,price) values (@platform,@itemid,@dt,@count,@price)
[...]
insert into Transactions (platform,itemid,dt,count,price) values (@platform,@itemid,@dt,@count,@price)

The problem is that this DELETE statement takes 5 seconds on average. That's much too long.
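
One way to see where the time goes is to look at the I/O the DELETE generates; a high logical-read count against Transactions means the predicate is not being answered by an index seek. A quick diagnostic sketch (the literal values are just examples):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @platform char(1) = 'A';  -- example value
DECLARE @itemid bigint = 42;      -- example value

-- A large number of logical reads here means the DELETE is scanning
-- rather than seeking on (platform, itemId).
DELETE FROM Transactions
WHERE platform = @platform
  AND itemid = @itemid;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;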

The second solution I found is to use MERGE. I created the following stored procedure, which takes a table-valued parameter:

CREATE PROCEDURE [dbo].[sp_updateTransactions]
@Table dbo.tp_Transactions readonly,
@itemId bigint,
@platform char(1)
AS
BEGIN
MERGE Transactions AS TARGET
USING @Table AS SOURCE  
ON (    
TARGET.[itemId] = SOURCE.[itemId] AND
TARGET.[platform] = SOURCE.[platform] AND 
TARGET.[dt] = SOURCE.[dt] AND 
TARGET.[count] = SOURCE.[count] AND
TARGET.[price] = SOURCE.[price] ) 


WHEN NOT MATCHED BY TARGET THEN 
INSERT VALUES (SOURCE.[itemId], 
                SOURCE.[dt],
                SOURCE.[count],
                SOURCE.[price],
                SOURCE.[platform])

WHEN NOT MATCHED BY SOURCE AND TARGET.[itemId] = @itemId AND TARGET.[platform] = @platform THEN 
DELETE;

END
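
For reference, the dbo.tp_Transactions table type is not shown in the question; a plausible definition inferred from the Transactions columns, together with a sample call, might look like this (the values are made up):

CREATE TYPE dbo.tp_Transactions AS TABLE (
    [itemId]   bigint   NOT NULL,
    [dt]       datetime NOT NULL,
    [count]    int      NOT NULL,
    [price]    float    NOT NULL,
    [platform] char(1)  NOT NULL
);
GO

DECLARE @t dbo.tp_Transactions;
INSERT INTO @t ([itemId], [dt], [count], [price], [platform])
VALUES (42, '2023-02-03T13:00:00', 10, 1.25, 'A');

EXEC dbo.sp_updateTransactions @Table = @t, @itemId = 42, @platform = 'A';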

This procedure takes around 7 seconds on a table with 70k records, so with 8M records it would probably take a few minutes. The bottleneck is the WHEN NOT MATCHED BY SOURCE clause: when I commented it out, the procedure ran in 0.01 seconds on average.

So the question is: how can I improve the performance of the delete step?

The DELETE is needed to make sure the table doesn't contain transactions that have been removed in the application. In practice this happens very rarely: fewer than 1 in 10,000 transaction updates actually needs to delete records.

My theoretical workaround is to add a column like transactionDeleted bit, use UPDATE instead of DELETE, and then have a batch job clean the table up every X minutes or hours by executing

delete from transactions where transactionDeleted=1

It should be faster, but I would need to update all SELECT statements in other parts of the application to use only transactionDeleted=0 records, so it may also affect application performance.
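
A rough sketch of what that workaround could look like (the index name is illustrative, not from the question):

-- One-time schema change:
ALTER TABLE dbo.Transactions
    ADD transactionDeleted bit NOT NULL DEFAULT 0;

-- The hot path marks rows instead of deleting them
-- (@platform and @itemid as in the parameterized statements above):
UPDATE dbo.Transactions
SET    transactionDeleted = 1
WHERE  platform = @platform
  AND  itemId = @itemid;

-- Periodic cleanup job, deleting in batches to keep transactions short:
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM dbo.Transactions
    WHERE transactionDeleted = 1;
    IF @@ROWCOUNT = 0 BREAK;
END

-- Optional (SQL Server 2008+): a filtered index so SELECTs that add
-- "AND transactionDeleted = 0" stay cheap:
CREATE NONCLUSTERED INDEX IX_Transactions_Active
    ON dbo.Transactions (platform, itemId)
    WHERE transactionDeleted = 0;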

Do you know of any better solution?

UPDATE: Current indexes:

CREATE NONCLUSTERED INDEX [IX1] ON [dbo].[Transactions] 
(
[platform] ASC,
[ItemId] ASC
) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF,   IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 50) ON [PRIMARY]


ALTER TABLE [dbo].[Transactions] ADD CONSTRAINT [IX2] UNIQUE NONCLUSTERED
(
[ItemId] DESC,
[count] ASC,
[dt] DESC,
[platform] ASC,
[price] ASC
) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
asked by adek


2 Answers

OK, here is another approach. For a similar problem (a large scan caused by WHEN NOT MATCHED BY SOURCE ... THEN DELETE) I reduced the MERGE execution time from 806 ms to 6 ms!

One issue with the approach above is that the WHEN NOT MATCHED BY SOURCE clause scans the whole TARGET table.

It is not that obvious, but Microsoft allows the TARGET table to be filtered (using a CTE) BEFORE doing the merge. In my case this reduced the TARGET rows from 250K to fewer than 10. BIG difference.

Assuming the problem above works with the TARGET filtered by @itemId and @platform, the MERGE code would look like this. The index changes suggested in the other answer would help this logic too.

WITH Transactions_CTE (itemId
                        ,dt
                        ,count
                        ,price
                        ,platform
                        )
AS
-- Define the CTE query that will reduce the size of the TARGET table.  
(  
    SELECT itemId
        ,dt
        ,count
        ,price
        ,platform
    FROM Transactions  
    WHERE itemId = @itemId
      AND platform = @platform  
)  
MERGE Transactions_CTE AS TARGET
USING @Table AS SOURCE
    ON (
        TARGET.[itemId] = SOURCE.[itemId]
        AND TARGET.[platform] = SOURCE.[platform]
        AND TARGET.[dt] = SOURCE.[dt]
        AND TARGET.[count] = SOURCE.[count]
        AND TARGET.[price] = SOURCE.[price]
        )
WHEN NOT MATCHED BY TARGET  THEN
        INSERT
        VALUES (
            SOURCE.[itemId]
            ,SOURCE.[dt]
            ,SOURCE.[count]
            ,SOURCE.[price]
            ,SOURCE.[platform]
            )
WHEN NOT MATCHED BY SOURCE THEN
        DELETE;
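
Note that because the CTE already applies the @itemId and @platform filter, the WHEN NOT MATCHED BY SOURCE clause no longer needs those extra predicates, and this body can replace the MERGE statement inside sp_updateTransactions unchanged.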
answered by David Coster


Using a BIT field for IsDeleted (or IsActive, as many people do) is valid, but it requires modifying all code plus creating a separate SQL Agent job to periodically remove the "deleted" records. This might be the way to go, but there is something less intrusive to try first.

I noticed that neither of your two indexes is CLUSTERED. Can I assume that the IDENTITY field is? You might consider making the [IX2] UNIQUE index the CLUSTERED one and changing the PK (again, I assume the IDENTITY field is a CLUSTERED PK) to be NONCLUSTERED. I would also reorder the IX2 fields to put [Platform] and [ItemID] first. Since your main operation looks up [Platform] and [ItemID] as a set, physically ordering the rows this way should help. And since this index is unique, it is a good candidate for being CLUSTERED. It is certainly worth testing, as this will impact all queries against the table.

Also, if changing the indexes as suggested helps, it might still be worth trying both ideas, adding the IsDeleted field as well, to see if performance improves even more.

EDIT: I forgot to mention: by making the IX2 index CLUSTERED and moving the [Platform] field toward the top, you can get rid of the IX1 index.

EDIT2:

Just to be very clear, I am suggesting something like:

CREATE UNIQUE CLUSTERED INDEX [IX2] ON [dbo].[Transactions]
(
[ItemId] DESC,
[platform] ASC,
[count] ASC,
[dt] DESC,
[price] ASC
)

And to be fair, changing which index is CLUSTERED could also negatively impact queries that JOIN on the [id] field, which is why you need to test thoroughly. In the end you need to tune the system for your most frequent and/or expensive queries, and you might have to accept that some queries will be slower as a result; that might be worth it if this operation becomes much faster.
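
Put together, the change would be something along these lines (a sketch; it assumes the IDENTITY column [id] currently carries a clustered primary key named PK_Transactions, which the question does not show):

-- Drop the existing unique constraint and the (assumed) clustered PK:
ALTER TABLE dbo.Transactions DROP CONSTRAINT [IX2];
ALTER TABLE dbo.Transactions DROP CONSTRAINT [PK_Transactions]; -- table becomes a heap

-- Recreate IX2 as the clustered index with [ItemId] and [platform] leading:
CREATE UNIQUE CLUSTERED INDEX [IX2] ON dbo.Transactions
(
[ItemId] DESC,
[platform] ASC,
[count] ASC,
[dt] DESC,
[price] ASC
);

-- Re-add the PK as NONCLUSTERED, and drop the now-redundant IX1:
ALTER TABLE dbo.Transactions
    ADD CONSTRAINT [PK_Transactions] PRIMARY KEY NONCLUSTERED ([id]);
DROP INDEX [IX1] ON dbo.Transactions;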

answered by Solomon Rutzky