Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Azure - One session locking entire DB for Update and Insert

SQL Azure issue.

I've got an issue that manifests as the following exception on our (asp.net) site:

Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. The statement has been terminated.

It also results in update and insert statements never completing in SMSS. There aren't any X or IX locks present when querying: sys.dm_tran_locks and there are no transactions when querying sys.dm_tran_active_transactions or sys.dm_tran_database_transactions.

The problem is present for every table in the database but other databases on the same instance don't cause the problem. The duration of the issue can be anywhere from 2 minutes to 2 hours and doesn't happen at any specific times of day.

The database is not full.

At one point this issue didn't resolve itself but I was able to resolve the issue by querying sys.dm_exec_connections finding the longest running session, and then killing it. The odd thing is, that the connection was 15 minutes old, but the lock issue had been present for over 3 hours.

Is there anything else I can check?

EDIT

As per Paul's answer below. I'd actually tracked down the problem before he answered. I will post the steps I used to figure this out below, in case they help anyone else.

The following queries were run when a "timeout period" was present.

select *  from sys.dm_exec_requests

Request Stats

As we can see, all the WAIT requests are waiting on session 1021 which is the replication request! The TM Request indicates a DTC transaction and we don't use distributed transactions. You can also see the wait_type of SE_REPL_COMMIT_ACK which again implicates replication.

select * from  sys.dm_tran_locks

enter image description here

Again waiting on session 1021

SELECT * FROM sys.dm_db_wait_stats ORDER BY wait_time_ms desc

enter image description here

And yes, SE_REPL_CATCHUP_THROTTLE has a total wait time of 8094034 ms, that is 134.9minutes!!!

Also see the following forum for details on this issue. http://social.technet.microsoft.com/Forums/en-US/ssdsgetstarted/thread/c3003a28-8beb-4860-85b2-03cf6d0312a8

I've been given the following answer in my communication with Microsoft (we've seen this issue with 4 of our 15 databases in the EU data center):

Question: Have there been changes to these soft throttling limits in the last three weeks ie since my problems started?

Answer: No, there has not.

Question: Are there ways we can prevent or be warned we are approaching a limit?

Answer: No. The issue may not be caused by your application but can be caused by other tenants relying on the same physical hardware. In other words, your application can have very little load and still run into the problem. In other words, your own traffic may be a cause of this problem, but it can just as well be caused by other tenants relying on the same physical hardware. There's no way to know beforehand that the issue will soon occur - it can occur at any time without warning. The SQL Azure operations team does not monitor this type of error, so they won't automatically try to solve the problem for you. So if you run into it you have two opitions:

  1. Create a copy of your db and use that and hope the db is placed on another server with less load.

  2. Contact Windows Azure Support and inform the about the problem and let them do Option 1 for you

like image 749
Sam Shiles Avatar asked Apr 03 '13 13:04

Sam Shiles


People also ask

Does SQL Server lock on insert?

When performing an insert operation, the database does not lock the entire table, but locks the row. The following is the test I did, you can see that only the row is locked(X lock) when inserting data, and there is only an Intent lock(IX lock) on the table: use test.

What is the limitation of Azure SQL Database currently?

32,767 files per database, unless the instance storage size limit has been reached. Maximum size of each data file is 8 TB.


1 Answers

You might be running into the SE_REPL* issues that are currently plaguing a lot of folks using Sql Azure (my company included).

When you experience the timeouts, try checking your wait requests for wait types of:

  • SE_REPL_SLOW_SECONDARY_THROTTLE
  • SE_REPL_COMMIT_ACK

Run the following to check your wait types on current connections:

SELECT TOP 10 r.session_id, r.plan_handle,
r.sql_handle, r.request_id,
r.start_time, r.status,
r.command, r.database_id,
r.user_id, r.wait_type,
r.wait_time, r.last_wait_type,
r.wait_resource, r.total_elapsed_time,
r.cpu_time, r.transaction_isolation_level,
r.row_count
FROM sys.dm_exec_requests r

You can also check a history of sorts for this by running:

SELECT * FROM sys.dm_db_wait_stats
ORDER BY wait_time_ms desc

If you're seeing a lot of SE_REPL* wait types and these are staying set on your connections for any length of time, then basically you're screwed. Microsoft are aware of the problem, but I've had a support ticket open for a week with them now and they're still working on it apparently.

The SE_REPL* waits happen when the Sql Azure replication slaves fall behind. Basically the whole db suspends queries while replication catches up :/

So essentially the aspect that makes Sql Azure highly available is causing databases to become randomly unavailable. I'd laugh at the irony if it wasn't killing us.

Have a look at this thread for details: http://social.technet.microsoft.com/Forums/en-US/ssdsgetstarted/thread/c3003a28-8beb-4860-85b2-03cf6d0312a8

like image 76
Paul DB Avatar answered Nov 15 '22 21:11

Paul DB