SQL Server, pyodbc and deadlock errors

I have some code that writes Scrapy-scraped data to a SQL Server db. The data items consist of some basic hotel data (name, address, rating, ...) and a list of rooms with associated data (price, occupancy, etc.). There can be multiple celery threads and multiple servers running this code, simultaneously writing different items to the db. I am encountering deadlock errors like:

[Failure instance: Traceback: <class 'pyodbc.ProgrammingError'>: 
('42000', '[42000] [FreeTDS][SQL Server]Transaction (Process ID 62)
 was deadlocked on lock resources with another process and has been 
chosen as the deadlock victim. Rerun the transaction. (1205) (SQLParamData)')
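
The error message itself says to rerun the transaction, so as a stopgap each write can be wrapped in a retry loop; a minimal sketch (write_item is a hypothetical callable standing in for my actual upsert code):

import time
import pyodbc

def execute_with_retry(conn, write_item, item, retries=3):
    for attempt in range(retries):
        try:
            write_item(conn, item)   # hypothetical: performs the upserts for one item
            conn.commit()
            return
        except pyodbc.Error as exc:
            conn.rollback()
            # SQL Server reports deadlock victims with native error 1205
            if '1205' not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(0.1 * (attempt + 1))  # brief backoff before rerunning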

The code that actually does the insert/update schematically looks like this:

1) Check if the hotel exists in the hotels table; if it does, update it, else insert it
   as a new row. Get the hotel id either way. This is done by `curs.execute(...)`.

2) Loop in Python over the scraped hotel rooms. For each room, check if the room exists
   in the rooms table (which is foreign-keyed to the hotels table).
   If not, insert it, using the hotel id to reference the hotels table row.
   Else update it. These upserts are also done using `curs.execute(...)`.

It is a bit more complicated than this in practice, but this illustrates that the Python code issues multiple `curs.execute(...)` calls before and during the loop.
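
Concretely, the per-item flow is something like this (table and column names are hypothetical, simplified from the real code):

def upsert_item(conn, hotel, rooms):
    curs = conn.cursor()
    # 1) upsert the hotel and get its id either way
    curs.execute("SELECT id FROM hotels WHERE name = ?", hotel['name'])
    row = curs.fetchone()
    if row:
        hotel_id = row[0]
        curs.execute("UPDATE hotels SET address = ?, rating = ? WHERE id = ?",
                     hotel['address'], hotel['rating'], hotel_id)
    else:
        curs.execute("INSERT INTO hotels (name, address, rating) VALUES (?, ?, ?)",
                     hotel['name'], hotel['address'], hotel['rating'])
        hotel_id = curs.execute("SELECT @@IDENTITY").fetchone()[0]
    # 2) upsert each room against the hotel id
    for room in rooms:
        curs.execute("SELECT id FROM rooms WHERE hotel_id = ? AND room_type = ?",
                     hotel_id, room['room_type'])
        if curs.fetchone():
            curs.execute("UPDATE rooms SET price = ?, occupancy = ? "
                         "WHERE hotel_id = ? AND room_type = ?",
                         room['price'], room['occupancy'], hotel_id, room['room_type'])
        else:
            curs.execute("INSERT INTO rooms (hotel_id, room_type, price, occupancy) "
                         "VALUES (?, ?, ?, ?)",
                         hotel_id, room['room_type'], room['price'], room['occupancy'])
    conn.commit()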

If, instead of upserting the data in the above manner, I generate one big SQL command that does the same thing (checks for the hotel, upserts it, records the id in a temporary variable, then for each room checks if it exists and upserts it against the hotel id variable, etc.) and do only a single curs.execute(...) in the Python code, then I no longer see deadlock errors.
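
For reference, the single-batch version is built up as one string and executed once; schematically it looks like this (hypothetical table and column names again, with the per-room blocks elided):

sql = """
DECLARE @hotel_id INT;
SELECT @hotel_id = id FROM hotels WHERE name = ?;
IF @hotel_id IS NULL
BEGIN
    INSERT INTO hotels (name, address, rating) VALUES (?, ?, ?);
    SET @hotel_id = SCOPE_IDENTITY();
END
ELSE
    UPDATE hotels SET address = ?, rating = ? WHERE id = @hotel_id;
-- ...one IF EXISTS / UPDATE / ELSE INSERT block per room goes here...
"""
params = [hotel['name'],                                     # SELECT
          hotel['name'], hotel['address'], hotel['rating'],  # INSERT
          hotel['address'], hotel['rating']]                 # UPDATE
curs.execute(sql, params)  # a single execute instead of many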

However, I don't really understand why this makes a difference, and I'm also not entirely sure it is safe to run big SQL blocks with multiple SELECTs, INSERTs and UPDATEs in a single pyodbc curs.execute. As I understand it, pyodbc is supposed to handle only single statements; however, it does seem to work, and I see my tables populated with no deadlock errors.

Nevertheless, it seems impossible to get any output back when I run a big command like this. I tried declaring a variable @output_string and recording various things to it (e.g. whether we had to insert or update the hotel) before finally doing SELECT @output_string AS outputstring, but a fetch after the execute in pyodbc always fails with

<class 'pyodbc.ProgrammingError'>: No results.  Previous SQL was not a query.

Experiments within the shell suggest that, by default, pyodbc only returns results from the first statement:

In [11]: curs.execute("SELECT 'HELLO'; SELECT 'BYE';")
Out[11]: <pyodbc.Cursor at 0x7fc52c044a50>

In [12]: curs.fetchall()
Out[12]: [('HELLO', )]

So, if the first statement is not a query, you get that error:

In [13]: curs.execute("PRINT 'HELLO'; SELECT 'BYE';")
Out[13]: <pyodbc.Cursor at 0x7fc52c044a50>

In [14]: curs.fetchall()
---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
<ipython-input-14-ad813e4432e9> in <module>()
----> 1 curs.fetchall()

ProgrammingError: No results.  Previous SQL was not a query.

Nevertheless, except for the inability to fetch my @output_string, my real "big query", consisting of multiple SELECTs, UPDATEs and INSERTs, actually works and populates multiple tables in the db.

Similarly, if I try something like

curs.execute('INSERT INTO testX (entid, thecol) VALUES (4, 5); INSERT INTO testX (entid, thecol) VALUES (5, 6); SELECT * FROM testX;')

I see that both rows were inserted into the table testX, even though a subsequent curs.fetchall() fails with the "No results. Previous SQL was not a query." error, so it seems that pyodbc's execute does run everything, not just the first statement.

If I can trust this, then my main problem is how to get some output for logging.
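
If pyodbc really does run the whole batch, then presumably one of these would let me fetch that final SELECT (untested sketches): SET NOCOUNT ON suppresses the row-count results that the INSERTs/UPDATEs produce, so the SELECT becomes the first result set, while cursor.nextset() advances past earlier results:

# Option 1: suppress row-count results so the final SELECT is the first result set
curs.execute("SET NOCOUNT ON; "
             "DECLARE @output_string VARCHAR(MAX); "
             "SET @output_string = 'inserted hotel'; "
             "SELECT @output_string AS outputstring;")
print(curs.fetchall())    # [('inserted hotel', )]

# Option 2: step over the INSERT's row-count result with nextset()
curs.execute("INSERT INTO testX (entid, thecol) VALUES (6, 7); SELECT * FROM testX;")
curs.nextset()            # advance to the SELECT's result set
print(curs.fetchall())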

EDIT: Setting autocommit=True in the dbargs seems to prevent the deadlock errors, even with the multiple curs.execute calls. But why does this fix it?
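
For concreteness, that just means connecting like this (placeholder connection string; my real dbargs are elided):

import pyodbc

conn = pyodbc.connect('DSN=scraperdb;UID=user;PWD=secret', autocommit=True)
curs = conn.cursor()  # each execute() now commits as soon as it finishes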

asked Jan 27 '17 by fpghost



1 Answer

Setting autocommit=True in the dbargs seems to prevent the deadlock errors, even with the multiple curs.execute calls. But why does this fix it?

When establishing a connection, pyodbc defaults to autocommit=False in accordance with the Python DB-API spec. Therefore, when the first SQL statement is executed, ODBC begins a database transaction that remains in effect until the Python code does a .commit() or a .rollback() on the connection.
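
A sketch of what the default behavior means in practice (names and the statement are illustrative, not from the question):

import pyodbc

conn_str = 'DSN=scraperdb;UID=user;PWD=secret'  # placeholder connection string
conn = pyodbc.connect(conn_str)                 # autocommit=False by default
curs = conn.cursor()
curs.execute("UPDATE hotels SET rating = 4.5 WHERE id = 42")
# A transaction is now open; the updated row stays locked through any further
# execute() calls on this connection, until...
conn.commit()                                   # ...the transaction ends and locks are released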

The default transaction isolation level in SQL Server is "Read Committed". Unless the database is configured to support SNAPSHOT isolation by default, a write operation within a transaction under Read Committed isolation will place transaction-scoped locks on the rows that were updated. Under conditions of high concurrency, deadlocks can occur if multiple processes generate conflicting locks. If those processes use long-lived transactions that generate a large number of such locks, then the chances of a deadlock are greater.

Setting autocommit=True will avoid the deadlocks because each individual SQL statement will be automatically committed, thus ending the transaction (which was automatically started when that statement began executing) and releasing any locks on the updated rows.

So, to help avoid deadlocks you can consider a few different strategies:

  • continue to use autocommit=True, or
  • have your Python code explicitly .commit() more often, or
  • use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED to "loosen up" the transaction isolation level, so that reads neither take shared locks nor block on the persistent locks created by write operations, or
  • configure the database to use SNAPSHOT isolation, which avoids lock contention but makes SQL Server work harder.
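
For example, the second and third options might look like this (a sketch only; upsert_hotel_and_rooms is a hypothetical stand-in for your write code):

# Commit after each scraped item so write locks are held only briefly
def write_item(conn, hotel, rooms):
    upsert_hotel_and_rooms(conn, hotel, rooms)  # hypothetical helper
    conn.commit()                               # release this item's locks immediately

# Loosen the isolation level for the current session
curs = conn.cursor()
curs.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")

# Or, once per database, have Read Committed use row versioning instead of locks
# (requires appropriate permissions):
#     ALTER DATABASE yourdb SET READ_COMMITTED_SNAPSHOT ON;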

You will need to do some homework to determine the best strategy for your particular use case.

answered Sep 22 '22 by Gord Thompson