SQL Server, pyodbc and deadlock errors

I have some code that writes Scrapy-scraped data to a SQL Server db. The data items consist of some basic hotel data (name, address, rating, ...) and a list of rooms with associated data (price, occupancy, etc.). There can be multiple celery threads and multiple servers running this code, simultaneously writing different items to the db. I am encountering deadlock errors like:

[Failure instance: Traceback: <class 'pyodbc.ProgrammingError'>: 
('42000', '[42000] [FreeTDS][SQL Server]Transaction (Process ID 62)
 was deadlocked on lock resources with another process and has been 
chosen as the deadlock victim. Rerun the transaction. (1205) (SQLParamData)')
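
The error message itself says to rerun the transaction, so as a stopgap each write can be wrapped in a retry loop; a minimal sketch (write_item is a hypothetical callable standing in for my actual upsert code):

import time
import pyodbc

def execute_with_retry(conn, write_item, item, retries=3):
    for attempt in range(retries):
        try:
            write_item(conn, item)   # hypothetical: performs the upserts for one item
            conn.commit()
            return
        except pyodbc.Error as exc:
            conn.rollback()
            # SQL Server reports deadlock victims with native error 1205
            if '1205' not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(0.1 * (attempt + 1))  # brief backoff before rerunning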

The code that actually does the insert/update schematically looks like this:

1) Check if the hotel exists in the hotels table; if it does, update it, else insert it
   as a new row. Get the hotel id either way. This is done by `curs.execute(...)`.

2) Loop in Python over the scraped hotel rooms. For each room, check if the room exists
   in the rooms table (which is foreign-keyed to the hotels table).
   If not, insert it, using the hotel id to reference the hotels table row.
   Else update it. These upserts are also done using `curs.execute(...)`.

It is a bit more complicated than this in practice, but this illustrates that the Python code issues multiple `curs.execute(...)` calls before and during the loop.
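
Concretely, the per-item flow is something like this (table and column names are hypothetical, simplified from the real code):

def upsert_item(conn, hotel, rooms):
    curs = conn.cursor()
    # 1) upsert the hotel and get its id either way
    curs.execute("SELECT id FROM hotels WHERE name = ?", hotel['name'])
    row = curs.fetchone()
    if row:
        hotel_id = row[0]
        curs.execute("UPDATE hotels SET address = ?, rating = ? WHERE id = ?",
                     hotel['address'], hotel['rating'], hotel_id)
    else:
        curs.execute("INSERT INTO hotels (name, address, rating) VALUES (?, ?, ?)",
                     hotel['name'], hotel['address'], hotel['rating'])
        hotel_id = curs.execute("SELECT @@IDENTITY").fetchone()[0]
    # 2) upsert each room against the hotel id
    for room in rooms:
        curs.execute("SELECT id FROM rooms WHERE hotel_id = ? AND room_type = ?",
                     hotel_id, room['room_type'])
        if curs.fetchone():
            curs.execute("UPDATE rooms SET price = ?, occupancy = ? "
                         "WHERE hotel_id = ? AND room_type = ?",
                         room['price'], room['occupancy'], hotel_id, room['room_type'])
        else:
            curs.execute("INSERT INTO rooms (hotel_id, room_type, price, occupancy) "
                         "VALUES (?, ?, ?, ?)",
                         hotel_id, room['room_type'], room['price'], room['occupancy'])
    conn.commit()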

If, instead of upserting the data in the above manner, I generate one big SQL command that does the same thing (checks for the hotel, upserts it, records the id in a temporary variable, then for each room checks if it exists and upserts it against the hotel id variable, etc.) and do only a single curs.execute(...) in the Python code, then I no longer see deadlock errors.
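
For reference, the single-batch version is built up as one string and executed once; schematically it looks like this (hypothetical table and column names again, with the per-room blocks elided):

sql = """
DECLARE @hotel_id INT;
SELECT @hotel_id = id FROM hotels WHERE name = ?;
IF @hotel_id IS NULL
BEGIN
    INSERT INTO hotels (name, address, rating) VALUES (?, ?, ?);
    SET @hotel_id = SCOPE_IDENTITY();
END
ELSE
    UPDATE hotels SET address = ?, rating = ? WHERE id = @hotel_id;
-- ...one IF EXISTS / UPDATE / ELSE INSERT block per room goes here...
"""
params = [hotel['name'],                                     # SELECT
          hotel['name'], hotel['address'], hotel['rating'],  # INSERT
          hotel['address'], hotel['rating']]                 # UPDATE
curs.execute(sql, params)  # a single execute instead of many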

However, I don't really understand why this makes a difference, and I'm also not entirely sure it is safe to run big SQL blocks with multiple SELECTs, INSERTs and UPDATEs in a single pyodbc curs.execute. As I understand it, pyodbc is supposed to handle only single statements; however, it does seem to work, and I see my tables populated with no deadlock errors.

Nevertheless, it seems impossible to get any output back when I run a big command like this. I tried declaring a variable @output_string and recording various things to it (e.g. whether we had to insert or update the hotel) before finally doing SELECT @output_string AS outputstring, but a fetch after the execute in pyodbc always fails with

<class 'pyodbc.ProgrammingError'>: No results.  Previous SQL was not a query.

Experiments within the shell suggest that, by default, pyodbc only returns results from the first statement:

In [11]: curs.execute("SELECT 'HELLO'; SELECT 'BYE';")
Out[11]: <pyodbc.Cursor at 0x7fc52c044a50>

In [12]: curs.fetchall()
Out[12]: [('HELLO', )]

So, if the first statement is not a query, you get that error:

In [13]: curs.execute("PRINT 'HELLO'; SELECT 'BYE';")
Out[13]: <pyodbc.Cursor at 0x7fc52c044a50>

In [14]: curs.fetchall()
---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
<ipython-input-14-ad813e4432e9> in <module>()
----> 1 curs.fetchall()

ProgrammingError: No results.  Previous SQL was not a query.

Nevertheless, except for the inability to fetch my @output_string, my real "big query", consisting of multiple SELECTs, UPDATEs and INSERTs, actually works and populates multiple tables in the db.

Similarly, if I try something like

curs.execute('INSERT INTO testX (entid, thecol) VALUES (4, 5); INSERT INTO testX (entid, thecol) VALUES (5, 6); SELECT * FROM testX;')

I see that both rows were inserted into the table testX, even though a subsequent curs.fetchall() fails with the "No results. Previous SQL was not a query." error, so it seems that pyodbc's execute does run everything, not just the first statement.

If I can trust this, then my main problem is how to get some output for logging.
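
If pyodbc really does run the whole batch, then presumably one of these would let me fetch that final SELECT (untested sketches): SET NOCOUNT ON suppresses the row-count results that the INSERTs/UPDATEs produce, so the SELECT becomes the first result set, while cursor.nextset() advances past earlier results:

# Option 1: suppress row-count results so the final SELECT is the first result set
curs.execute("SET NOCOUNT ON; "
             "DECLARE @output_string VARCHAR(MAX); "
             "SET @output_string = 'inserted hotel'; "
             "SELECT @output_string AS outputstring;")
print(curs.fetchall())    # [('inserted hotel', )]

# Option 2: step over the INSERT's row-count result with nextset()
curs.execute("INSERT INTO testX (entid, thecol) VALUES (6, 7); SELECT * FROM testX;")
curs.nextset()            # advance to the SELECT's result set
print(curs.fetchall())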

EDIT: Setting autocommit=True in the dbargs seems to prevent the deadlock errors, even with the multiple curs.execute calls. But why does this fix it?
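
For concreteness, that just means connecting like this (placeholder connection string; my real dbargs are elided):

import pyodbc

conn = pyodbc.connect('DSN=scraperdb;UID=user;PWD=secret', autocommit=True)
curs = conn.cursor()  # each execute() now commits as soon as it finishes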

asked Jan 27 '17 by fpghost



1 Answer

Setting autocommit=True in the dbargs seems to prevent the deadlock errors, even with the multiple curs.execute calls. But why does this fix it?

When establishing a connection, pyodbc defaults to autocommit=False in accordance with the Python DB-API spec. Therefore, when the first SQL statement is executed, ODBC begins a database transaction that remains in effect until the Python code does a .commit() or a .rollback() on the connection.
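
A sketch of what the default behavior means in practice (names and the statement are illustrative, not from the question):

import pyodbc

conn_str = 'DSN=scraperdb;UID=user;PWD=secret'  # placeholder connection string
conn = pyodbc.connect(conn_str)                 # autocommit=False by default
curs = conn.cursor()
curs.execute("UPDATE hotels SET rating = 4.5 WHERE id = 42")
# A transaction is now open; the updated row stays locked through any further
# execute() calls on this connection, until...
conn.commit()                                   # ...the transaction ends and locks are released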

The default transaction isolation level in SQL Server is "Read Committed". Unless the database is configured to support SNAPSHOT isolation by default, a write operation within a transaction under Read Committed isolation will place transaction-scoped locks on the rows that were updated. Under conditions of high concurrency, deadlocks can occur if multiple processes generate conflicting locks. If those processes use long-lived transactions that generate a large number of such locks, then the chances of a deadlock are greater.

Setting autocommit=True will avoid the deadlocks because each individual SQL statement will be automatically committed, thus ending the transaction (which was automatically started when that statement began executing) and releasing any locks on the updated rows.

So, to help avoid deadlocks you can consider a few different strategies:

  • continue to use autocommit=True, or
  • have your Python code explicitly .commit() more often, or
  • use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED to "loosen up" the transaction isolation level, so that reads neither take shared locks nor block on the persistent locks created by write operations, or
  • configure the database to use SNAPSHOT isolation, which avoids lock contention but makes SQL Server work harder.
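
For example, the second and third options might look like this (a sketch only; upsert_hotel_and_rooms is a hypothetical stand-in for your write code):

# Commit after each scraped item so write locks are held only briefly
def write_item(conn, hotel, rooms):
    upsert_hotel_and_rooms(conn, hotel, rooms)  # hypothetical helper
    conn.commit()                               # release this item's locks immediately

# Loosen the isolation level for the current session
curs = conn.cursor()
curs.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")

# Or, once per database, have Read Committed use row versioning instead of locks
# (requires appropriate permissions):
#     ALTER DATABASE yourdb SET READ_COMMITTED_SNAPSHOT ON;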

You will need to do some homework to determine the best strategy for your particular use case.

answered Sep 22 '22 by Gord Thompson