Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rare and elusive deadlocks (select for update; then update) in case of multiple concurrent transactions

Database: MSSQL server 2012;

Isolation level: READ_COMMITTED_SNAPSHOT

Now I have a table "COV_HOLES_PERIODDATE". It has a composite primary key which is also a clustered index. No other indexes on this table.

There are many threads(via Java) working concurrently. Each thread will first do a "select for update" on a DIFFERENT primary key via lock hint (updlock, rowlock), then do some work, then update table for this row. It is guaranteed from Java side that each thread is operating on a different row in this table.

However there is a rare and elusive deadlock that I could not reproduce consistently. This deadlock happens once in a great while. I would imagine there should never be lock contention since each transaction should only lock different row. The deadlock graph shows "collision" on "key lock". "Hobt id" in the deadlock graph, as I queried, is the "index" in this table.

What did I miss? Thanks!

Code listing for each tran: note each tran operate on a different primary key.

begin tran;

select DIVISION_ID 
from COV_HOLES_PERIODDATE with (updlock, rowlock) 
where composite_primary_key = xxx;

//some work on other tables, not related to this table.

update COV_HOLES_PERIODDATE 
set non_primaryKey_field= xxx
where composite_primary_key = xxx;

commit;

EDIT 3: from the execution plan captured by SQL server profiler while my Java load testing program runs, if I read it correctly, due to implicit conversion, "skill=" was changed to "skill between a range" in "Seek Predicates" as this image shows. But I do not know how to interpret that it is still an "Clustered Index Seek", although "Seek Predicates" in this image shows a range for "Skill" column?

http://i.stack.imgur.com/9vThc.png

Code listing for the intended sql (but execution plan shows implicit conversion in "skill" column changes "skill" part) select DIVISION_ID from cpq_jfu_v4_speedTest.cpqjfuv4speedTest.COV_HOLES_PERIODDATE with (updlock, rowlock) where DIVISION_ID = 1 and UNIT_ID = 2 and SKILL = 'M'
and PERIOD_START_DATE = '2014-02-09' ;

EDIT 2: first link is the picture for the deadlock. 2nd link is the table and its index http://i.stack.imgur.com/8j6V9.png http://i.stack.imgur.com/JaFkl.png

EDIT 1: listing for deadlock graph

  <deadlock-list>
<deadlock victim="process2fb02d498">
        <process-list>
            <process id="process2fb02d498" taskpriority="0" logused="0" waitresource="KEY: 6:72057594048413696 (425c06b927a8)" waittime="4321" ownerId="181788" transactionname="implicit_transaction" lasttranstarted="2014-02-18T14:29:08.930" XDES="0x2f0d6a3a8" lockMode="U" schedulerid="4" kpid="8704" status="suspended" spid="60" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2014-02-18T14:29:09.010" lastbatchcompleted="2014-02-18T14:29:09.010" lastattention="1900-01-01T00:00:00.010" clientapp="Microsoft JDBC Driver for SQL Server" hostname="ACNU34794GD" hostpid="0" loginname="cpq_jfu_v380_speedTest_WEBAPP" isolationlevel="read committed (2)" xactid="181788" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
                <executionStack>
                    <frame procname="adhoc" line="1" stmtstart="148" sqlhandle="0x02000000f9bd1c3532128f75b67ae477b2447ba2a64528db0000000000000000000000000000000000000000">
update cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE set LAST_UPDATED_DTZ=@P0 where DIVISION_ID=@P1 and UNIT_ID=@P2 and SKILL=@P3 and PERIOD_START_DATE=@P4     </frame>
                    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown     </frame>
                </executionStack>
                <inputbuf>
(@P0 nvarchar(4000),@P1 smallint,@P2 smallint,@P3 nvarchar(4000),@P4 date)update cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE set LAST_UPDATED_DTZ=@P0 where DIVISION_ID=@P1 and UNIT_ID=@P2 and SKILL=@P3 and PERIOD_START_DATE=@P4                                            </inputbuf>
            </process>
            <process id="process2f0110cf8" taskpriority="0" logused="0" waitresource="KEY: 6:72057594048413696 (3d51ccbaf870)" waittime="4303" ownerId="181789" transactionname="implicit_transaction" lasttranstarted="2014-02-18T14:29:08.933" XDES="0x2f651e6c8" lockMode="U" schedulerid="3" kpid="904" status="suspended" spid="56" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2014-02-18T14:29:09.030" lastbatchcompleted="2014-02-18T14:29:09.030" lastattention="1900-01-01T00:00:00.030" clientapp="Microsoft JDBC Driver for SQL Server" hostname="ACNU34794GD" hostpid="0" loginname="cpq_jfu_v380_speedTest_WEBAPP" isolationlevel="read committed (2)" xactid="181789" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
                <executionStack>
                    <frame procname="adhoc" line="1" stmtstart="148" sqlhandle="0x02000000f9bd1c3532128f75b67ae477b2447ba2a64528db0000000000000000000000000000000000000000">
update cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE set LAST_UPDATED_DTZ=@P0 where DIVISION_ID=@P1 and UNIT_ID=@P2 and SKILL=@P3 and PERIOD_START_DATE=@P4     </frame>
                    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown     </frame>
                </executionStack>
                <inputbuf>
(@P0 nvarchar(4000),@P1 smallint,@P2 smallint,@P3 nvarchar(4000),@P4 date)update cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE set LAST_UPDATED_DTZ=@P0 where DIVISION_ID=@P1 and UNIT_ID=@P2 and SKILL=@P3 and PERIOD_START_DATE=@P4                                            </inputbuf>
            </process>
            <process id="process2f05d0188" taskpriority="0" logused="0" waitresource="KEY: 6:72057594048413696 (f40d3be1c4a7)" waittime="4381" ownerId="181790" transactionname="implicit_transaction" lasttranstarted="2014-02-18T14:29:08.937" XDES="0x2f0290d08" lockMode="U" schedulerid="4" kpid="5248" status="suspended" spid="59" sbid="0" ecid="0" priority="0" trancount="1" lastbatchstarted="2014-02-18T14:29:08.950" lastbatchcompleted="2014-02-18T14:29:08.950" lastattention="1900-01-01T00:00:00.950" clientapp="Microsoft JDBC Driver for SQL Server" hostname="ACNU34794GD" hostpid="0" loginname="cpq_jfu_v380_speedTest_WEBAPP" isolationlevel="read committed (2)" xactid="181790" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
                <executionStack>
                    <frame procname="adhoc" line="1" stmtstart="110" sqlhandle="0x02000000e59aa71c628d78d0574a5f60f697c8ffd8228e4b0000000000000000000000000000000000000000">
select DIVISION_ID from cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE with (updlock, rowlock) where DIVISION_ID =@P0 and UNIT_ID =@P1 and SKILL =@P2 and PERIOD_START_DATE =@P3     </frame>
                    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown     </frame>
                </executionStack>
                <inputbuf>
(@P0 smallint,@P1 smallint,@P2 nvarchar(4000),@P3 date)select DIVISION_ID from cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE with (updlock, rowlock) where DIVISION_ID =@P0 and UNIT_ID =@P1 and SKILL =@P2 and PERIOD_START_DATE =@P3                                    </inputbuf>
            </process>
            <process id="process2f2a9f868" taskpriority="0" logused="0" waitresource="KEY: 6:72057594048413696 (8b00f1e21b7f)" waittime="4380" ownerId="181792" transactionname="implicit_transaction" lasttranstarted="2014-02-18T14:29:08.950" XDES="0x2f651f1d8" lockMode="U" schedulerid="2" kpid="7652" status="suspended" spid="63" sbid="0" ecid="0" priority="0" trancount="1" lastbatchstarted="2014-02-18T14:29:08.950" lastbatchcompleted="2014-02-18T14:29:08.950" lastattention="1900-01-01T00:00:00.950" clientapp="Microsoft JDBC Driver for SQL Server" hostname="ACNU34794GD" hostpid="0" loginname="cpq_jfu_v380_speedTest_WEBAPP" isolationlevel="read committed (2)" xactid="181792" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
                <executionStack>
                    <frame procname="adhoc" line="1" stmtstart="110" sqlhandle="0x02000000e59aa71c628d78d0574a5f60f697c8ffd8228e4b0000000000000000000000000000000000000000">
select DIVISION_ID from cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE with (updlock, rowlock) where DIVISION_ID =@P0 and UNIT_ID =@P1 and SKILL =@P2 and PERIOD_START_DATE =@P3     </frame>
                    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown     </frame>
                </executionStack>
                <inputbuf>
(@P0 smallint,@P1 smallint,@P2 nvarchar(4000),@P3 date)select DIVISION_ID from cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE with (updlock, rowlock) where DIVISION_ID =@P0 and UNIT_ID =@P1 and SKILL =@P2 and PERIOD_START_DATE =@P3                                    </inputbuf>
            </process>
        </process-list>
        <resource-list>
            <keylock hobtid="72057594048413696" dbid="6" objectname="cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE" indexname="1" id="lock2c70ba180" mode="U" associatedObjectId="72057594048413696">
                <owner-list>
                    <owner id="process2f2a9f868" mode="U"/>
                </owner-list>
                <waiter-list>
                    <waiter id="process2fb02d498" mode="U" requestType="wait"/>
                </waiter-list>
            </keylock>
            <keylock hobtid="72057594048413696" dbid="6" objectname="cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE" indexname="1" id="lock2cc3d9980" mode="U" associatedObjectId="72057594048413696">
                <owner-list>
                    <owner id="process2f05d0188" mode="U"/>
                </owner-list>
                <waiter-list>
                    <waiter id="process2f0110cf8" mode="U" requestType="wait"/>
                </waiter-list>
            </keylock>
            <keylock hobtid="72057594048413696" dbid="6" objectname="cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE" indexname="1" id="lock2cf4fa480" mode="U" associatedObjectId="72057594048413696">
                <owner-list>
                    <owner id="process2f0110cf8" mode="U"/>
                </owner-list>
                <waiter-list>
                    <waiter id="process2f05d0188" mode="U" requestType="wait"/>
                </waiter-list>
            </keylock>
            <keylock hobtid="72057594048413696" dbid="6" objectname="cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE" indexname="1" id="lock2cf4f9e80" mode="U" associatedObjectId="72057594048413696">
                <owner-list>
                    <owner id="process2fb02d498" mode="U"/>
                </owner-list>
                <waiter-list>
                    <waiter id="process2f2a9f868" mode="U" requestType="wait"/>
                </waiter-list>
            </keylock>
        </resource-list>
    </deadlock>
    <deadlock victim="process2f0110cf8">
        <process-list>
            <process id="process2fb02d498" taskpriority="0" logused="0" waitresource="KEY: 6:72057594048413696 (425c06b927a8)" waittime="4321" ownerId="181788" transactionname="implicit_transaction" lasttranstarted="2014-02-18T14:29:08.930" XDES="0x2f0d6a3a8" lockMode="U" schedulerid="4" kpid="8704" status="suspended" spid="60" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2014-02-18T14:29:09.010" lastbatchcompleted="2014-02-18T14:29:09.010" lastattention="1900-01-01T00:00:00.010" clientapp="Microsoft JDBC Driver for SQL Server" hostname="ACNU34794GD" hostpid="0" loginname="cpq_jfu_v380_speedTest_WEBAPP" isolationlevel="read committed (2)" xactid="181788" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
                <executionStack>
                    <frame procname="adhoc" line="1" stmtstart="148" sqlhandle="0x02000000f9bd1c3532128f75b67ae477b2447ba2a64528db0000000000000000000000000000000000000000">
update cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE set LAST_UPDATED_DTZ=@P0 where DIVISION_ID=@P1 and UNIT_ID=@P2 and SKILL=@P3 and PERIOD_START_DATE=@P4     </frame>
                    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown     </frame>
                </executionStack>
                <inputbuf>
(@P0 nvarchar(4000),@P1 smallint,@P2 smallint,@P3 nvarchar(4000),@P4 date)update cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE set LAST_UPDATED_DTZ=@P0 where DIVISION_ID=@P1 and UNIT_ID=@P2 and SKILL=@P3 and PERIOD_START_DATE=@P4                                            </inputbuf>
            </process>
            <process id="process2f0110cf8" taskpriority="0" logused="0" waitresource="KEY: 6:72057594048413696 (3d51ccbaf870)" waittime="4303" ownerId="181789" transactionname="implicit_transaction" lasttranstarted="2014-02-18T14:29:08.933" XDES="0x2f651e6c8" lockMode="U" schedulerid="3" kpid="904" status="suspended" spid="56" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2014-02-18T14:29:09.030" lastbatchcompleted="2014-02-18T14:29:09.030" lastattention="1900-01-01T00:00:00.030" clientapp="Microsoft JDBC Driver for SQL Server" hostname="ACNU34794GD" hostpid="0" loginname="cpq_jfu_v380_speedTest_WEBAPP" isolationlevel="read committed (2)" xactid="181789" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
                <executionStack>
                    <frame procname="adhoc" line="1" stmtstart="148" sqlhandle="0x02000000f9bd1c3532128f75b67ae477b2447ba2a64528db0000000000000000000000000000000000000000">
update cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE set LAST_UPDATED_DTZ=@P0 where DIVISION_ID=@P1 and UNIT_ID=@P2 and SKILL=@P3 and PERIOD_START_DATE=@P4     </frame>
                    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown     </frame>
                </executionStack>
                <inputbuf>
(@P0 nvarchar(4000),@P1 smallint,@P2 smallint,@P3 nvarchar(4000),@P4 date)update cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE set LAST_UPDATED_DTZ=@P0 where DIVISION_ID=@P1 and UNIT_ID=@P2 and SKILL=@P3 and PERIOD_START_DATE=@P4                                            </inputbuf>
            </process>
            <process id="process2f05d0188" taskpriority="0" logused="0" waitresource="KEY: 6:72057594048413696 (f40d3be1c4a7)" waittime="4381" ownerId="181790" transactionname="implicit_transaction" lasttranstarted="2014-02-18T14:29:08.937" XDES="0x2f0290d08" lockMode="U" schedulerid="4" kpid="5248" status="suspended" spid="59" sbid="0" ecid="0" priority="0" trancount="1" lastbatchstarted="2014-02-18T14:29:08.950" lastbatchcompleted="2014-02-18T14:29:08.950" lastattention="1900-01-01T00:00:00.950" clientapp="Microsoft JDBC Driver for SQL Server" hostname="ACNU34794GD" hostpid="0" loginname="cpq_jfu_v380_speedTest_WEBAPP" isolationlevel="read committed (2)" xactid="181790" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
                <executionStack>
                    <frame procname="adhoc" line="1" stmtstart="110" sqlhandle="0x02000000e59aa71c628d78d0574a5f60f697c8ffd8228e4b0000000000000000000000000000000000000000">
select DIVISION_ID from cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE with (updlock, rowlock) where DIVISION_ID =@P0 and UNIT_ID =@P1 and SKILL =@P2 and PERIOD_START_DATE =@P3     </frame>
                    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown     </frame>
                </executionStack>
                <inputbuf>
(@P0 smallint,@P1 smallint,@P2 nvarchar(4000),@P3 date)select DIVISION_ID from cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE with (updlock, rowlock) where DIVISION_ID =@P0 and UNIT_ID =@P1 and SKILL =@P2 and PERIOD_START_DATE =@P3                                    </inputbuf>
            </process>
            <process id="process2f2a9f868" taskpriority="0" logused="0" waitresource="KEY: 6:72057594048413696 (8b00f1e21b7f)" waittime="4381" ownerId="181792" transactionname="implicit_transaction" lasttranstarted="2014-02-18T14:29:08.950" XDES="0x2f651f1d8" lockMode="U" schedulerid="2" kpid="7652" status="suspended" spid="63" sbid="0" ecid="0" priority="0" trancount="1" lastbatchstarted="2014-02-18T14:29:08.950" lastbatchcompleted="2014-02-18T14:29:08.950" lastattention="1900-01-01T00:00:00.950" clientapp="Microsoft JDBC Driver for SQL Server" hostname="ACNU34794GD" hostpid="0" loginname="cpq_jfu_v380_speedTest_WEBAPP" isolationlevel="read committed (2)" xactid="181792" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
                <executionStack>
                    <frame procname="adhoc" line="1" stmtstart="110" sqlhandle="0x02000000e59aa71c628d78d0574a5f60f697c8ffd8228e4b0000000000000000000000000000000000000000">
select DIVISION_ID from cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE with (updlock, rowlock) where DIVISION_ID =@P0 and UNIT_ID =@P1 and SKILL =@P2 and PERIOD_START_DATE =@P3     </frame>
                    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown     </frame>
                </executionStack>
                <inputbuf>
(@P0 smallint,@P1 smallint,@P2 nvarchar(4000),@P3 date)select DIVISION_ID from cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE with (updlock, rowlock) where DIVISION_ID =@P0 and UNIT_ID =@P1 and SKILL =@P2 and PERIOD_START_DATE =@P3                                    </inputbuf>
            </process>
        </process-list>
        <resource-list>
            <keylock hobtid="72057594048413696" dbid="6" objectname="cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE" indexname="1" id="lock2c70ba180" mode="U" associatedObjectId="72057594048413696">
                <owner-list>
                    <owner id="process2f2a9f868" mode="U"/>
                </owner-list>
                <waiter-list>
                    <waiter id="process2fb02d498" mode="U" requestType="wait"/>
                </waiter-list>
            </keylock>
            <keylock hobtid="72057594048413696" dbid="6" objectname="cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE" indexname="1" id="lock2cc3d9980" mode="U" associatedObjectId="72057594048413696">
                <owner-list>
                    <owner id="process2f05d0188" mode="U"/>
                </owner-list>
                <waiter-list>
                    <waiter id="process2f0110cf8" mode="U" requestType="wait"/>
                </waiter-list>
            </keylock>
            <keylock hobtid="72057594048413696" dbid="6" objectname="cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE" indexname="1" id="lock2cf4fa480" mode="U" associatedObjectId="72057594048413696">
                <owner-list>
                    <owner id="process2f0110cf8" mode="U"/>
                </owner-list>
                <waiter-list>
                    <waiter id="process2f05d0188" mode="U" requestType="wait"/>
                </waiter-list>
            </keylock>
            <keylock hobtid="72057594048413696" dbid="6" objectname="cpq_jfu_v380_speedTest.cpqjfuv380speedTest.COV_HOLES_PERIODDATE" indexname="1" id="lock2cf4f9e80" mode="U" associatedObjectId="72057594048413696">
                <owner-list>
                    <owner id="process2fb02d498" mode="U"/>
                </owner-list>
                <waiter-list>
                    <waiter id="process2f2a9f868" mode="U" requestType="wait"/>
                </waiter-list>
            </keylock>
        </resource-list>
    </deadlock>
</deadlock-list>
like image 860
riceball Avatar asked Mar 18 '14 19:03

riceball


1 Answers

The investigation documented in the comments lead to the following answer:

You're passing in parameters which do not match the corresponding columns in type. This leads to implicit conversions and a range seek on the clustered index. "Seek" mean that a subset of the index will be scanned. It does not necessarily imply a singleton seek.

The range seek on the 3rd column of the index means that the query completely disregards the filter on the 4th column from a locking perspective. This can lead to multiple rows being locked. Not necessarily range locks because these are not taken under READ COMMITTED. Had you also set the locking hint HOLDLOCK (or SERIALIZABLE) range locks would have been taken.


Other things that were tried:

Is this the whole transaction? Triggers? Indexed views (shouldn't matter with the given deadlock graph but who knows)? Can you post the graph as XML somewhere? And there are no NC indexes, right? Can you post the query plan for both statements?

.

Why do some of the transactions have trancount="2". Maybe you misnested transactions so that they are extended? That would explain why multiple locks can be taken per tran. IOW an app bug. Add this IF @@TRANCOUNT <> 1 ROLLBACK to assert this at runtime. Rerun your load test with this assert.

.

Are there implicit conversions? I know there are because you're passing in a nvarchar(4000) but PK columns cannot have such a big column. This can lead to range locks being taken. Change the update from @P2 to CONVERT(CORRECT_COLUMN_TYPE_HERE, @P2) to make sure. Examine the execution plan to make sure that the CI seek is using only equality predicates. Not ranges.

I'm adding this to the answer so that future visitors of this question can learn how to investigate an issue like this one.

like image 183
usr Avatar answered Nov 15 '22 08:11

usr