SQL Azure - One session locking entire DB for Update and Insert

Tags:

sql

sql-server

azure-sql-database

SQL Azure issue.

I've got an issue that manifests as the following exception on our ASP.NET site:

Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. The statement has been terminated.

It also results in UPDATE and INSERT statements never completing in SSMS. There are no X or IX locks in sys.dm_tran_locks, and no transactions show up in sys.dm_tran_active_transactions or sys.dm_tran_database_transactions.

The problem affects every table in the database, but other databases on the same instance are not affected. It lasts anywhere from 2 minutes to 2 hours and doesn't occur at any particular time of day.

The database is not full.

At one point the issue didn't resolve itself, but I was able to resolve it by querying sys.dm_exec_connections, finding the longest-running session, and killing it. The odd thing is that the connection was only 15 minutes old, while the lock issue had been present for over 3 hours.
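
For anyone wanting to repeat that step, here is a rough sketch of how the lookup might be written. It assumes "longest running" means the oldest connect_time, and the join to sys.dm_exec_sessions is only there to help identify who owns the session. The session id in the KILL line is a placeholder, not a value from this post.

```sql
-- List connections oldest-first, with enough session detail to see
-- who owns each one. This part is read-only.
SELECT c.session_id,
       c.connect_time,
       s.login_name,
       s.host_name,
       s.program_name,
       s.status
FROM sys.dm_exec_connections AS c
JOIN sys.dm_exec_sessions AS s
    ON s.session_id = c.session_id
ORDER BY c.connect_time ASC;

-- Only after confirming the session is safe to terminate:
-- KILL 123;  -- 123 is a placeholder session_id
```

The KILL is left commented out on purpose, since terminating the wrong session rolls back whatever it was doing.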

Is there anything else I can check?

EDIT

This turned out to be what Paul describes in his answer below, although I'd tracked down the problem before he answered. I'm posting the steps I used to figure it out in case they help anyone else.

The following queries were run while a timeout period was in progress.

SELECT * FROM sys.dm_exec_requests

[Image: Request Stats]

As we can see, all the waiting requests are waiting on session 1021, which is the replication request! The TM Request indicates a DTC transaction, and we don't use distributed transactions. You can also see the wait_type of SE_REPL_COMMIT_ACK, which again implicates replication.
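
If the raw output is too wide to scan, a narrower query along these lines may make the pattern easier to spot. It is only a sketch: it groups the current requests by blocking_session_id so that a single session holding up everything else stands out. The column aliases are my own.

```sql
-- Count how many requests each session is currently blocking.
-- blocking_session_id is 0 or NULL for requests that are not blocked,
-- and the <> 0 filter drops both.
SELECT r.blocking_session_id,
       COUNT(*)         AS blocked_requests,
       MAX(r.wait_time) AS longest_wait_ms   -- wait_time is in milliseconds
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0
GROUP BY r.blocking_session_id
ORDER BY blocked_requests DESC;
```

In a situation like the one above, you would expect one row (session 1021 in my case) to account for nearly all blocked requests.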

SELECT * FROM sys.dm_tran_locks

Again, requests are waiting on session 1021.
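
A filtered version of the lock query can cut down the noise. The sketch below keeps only lock requests that have not been granted yet. Note that sys.dm_tran_locks shows who is waiting, not who is blocking, so it is best read alongside sys.dm_exec_requests.

```sql
-- Show only lock requests that are still waiting to be granted.
SELECT l.request_session_id,
       l.resource_type,
       l.resource_associated_entity_id,
       l.request_mode,
       l.request_status
FROM sys.dm_tran_locks AS l
WHERE l.request_status = 'WAIT'
ORDER BY l.request_session_id;
```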

SELECT * FROM sys.dm_db_wait_stats ORDER BY wait_time_ms desc

And yes, SE_REPL_CATCHUP_THROTTLE has a total wait time of 8094034 ms, which is 134.9 minutes!

Also see the following forum thread for details on this issue: http://social.technet.microsoft.com/Forums/en-US/ssdsgetstarted/thread/c3003a28-8beb-4860-85b2-03cf6d0312a8
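
The conversion is just wait_time_ms divided by 60000 (8094034 / 60000 ≈ 134.9). A sketch that does the arithmetic in the query and keeps only the replication-related wait types might look like this. The wait_time_minutes alias is my own, and the [_] in the pattern is there because a bare underscore is a single-character wildcard in LIKE.

```sql
-- Cumulative replication waits for this database, with the total
-- converted from milliseconds to minutes.
SELECT ws.wait_type,
       ws.waiting_tasks_count,
       ws.wait_time_ms,
       CAST(ws.wait_time_ms / 60000.0 AS decimal(18, 1)) AS wait_time_minutes
FROM sys.dm_db_wait_stats AS ws
WHERE ws.wait_type LIKE 'SE[_]REPL%'
ORDER BY ws.wait_time_ms DESC;
```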

I've been given the following answer in my communication with Microsoft (we've seen this issue with 4 of our 15 databases in the EU data center):

Question: Have there been changes to these soft throttling limits in the last three weeks, i.e. since my problems started?

Answer: No, there have not.

Question: Are there ways we can prevent or be warned we are approaching a limit?

Answer: No. The issue may not be caused by your application but can be caused by other tenants relying on the same physical hardware. In other words, your application can have very little load and still run into the problem. In other words, your own traffic may be a cause of this problem, but it can just as well be caused by other tenants relying on the same physical hardware. There's no way to know beforehand that the issue will soon occur - it can occur at any time without warning. The SQL Azure operations team does not monitor this type of error, so they won't automatically try to solve the problem for you. So if you run into it you have two options:

1. Create a copy of your db and use that and hope the db is placed on another server with less load.

2. Contact Windows Azure Support and inform them about the problem and let them do Option 1 for you.
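
For reference, Option 1 can be done from T-SQL with the database copy statement. This is a minimal sketch rather than the exact procedure Microsoft support follows. MyDb and MyDb_Copy are placeholder names, and the statements are meant to be run while connected to the master database of the destination server.

```sql
-- Start the copy. In SQL Azure, CREATE DATABASE must be the only
-- statement in its batch, hence the GO separator (SSMS / sqlcmd).
CREATE DATABASE MyDb_Copy AS COPY OF MyDb;
GO

-- The copy runs asynchronously. Poll until state_desc reports ONLINE
-- before pointing the application at the new database.
SELECT name, state_desc
FROM sys.databases
WHERE name = 'MyDb_Copy';
```

Whether the copy actually lands on less loaded hardware is not something you can control from here, which is why the advice above says "hope".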

asked Apr 03 '13 13:04

Sam Shiles

1 Answer

You might be running into the SE_REPL* issues that are currently plaguing a lot of folks using SQL Azure (my company included).

When you experience the timeouts, try checking your wait requests for wait types of:

SE_REPL_SLOW_SECONDARY_THROTTLE
SE_REPL_COMMIT_ACK

Run the following to check your wait types on current connections:

SELECT TOP 10 r.session_id, r.plan_handle,
r.sql_handle, r.request_id,
r.start_time, r.status,
r.command, r.database_id,
r.user_id, r.wait_type,
r.wait_time, r.last_wait_type,
r.wait_resource, r.total_elapsed_time,
r.cpu_time, r.transaction_isolation_level,
r.row_count
FROM sys.dm_exec_requests r
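
If you only care about the replication waits, a filtered variant might save some scrolling. This is a sketch under the assumption that every relevant wait type starts with SE_REPL. The [_] in the pattern escapes the underscore, which LIKE would otherwise treat as a wildcard.

```sql
-- Current requests whose present or most recent wait is a
-- replication (SE_REPL*) wait, longest waits first.
SELECT r.session_id,
       r.blocking_session_id,
       r.command,
       r.wait_type,
       r.wait_time,
       r.last_wait_type,
       r.total_elapsed_time
FROM sys.dm_exec_requests AS r
WHERE r.wait_type LIKE 'SE[_]REPL%'
   OR r.last_wait_type LIKE 'SE[_]REPL%'
ORDER BY r.wait_time DESC;
```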

You can also check a history of sorts for this by running:

SELECT * FROM sys.dm_db_wait_stats
ORDER BY wait_time_ms desc

If you're seeing a lot of SE_REPL* wait types and these are staying set on your connections for any length of time, then basically you're screwed. Microsoft are aware of the problem, but I've had a support ticket open for a week with them now and they're still working on it apparently.

The SE_REPL* waits happen when the SQL Azure replication slaves fall behind. Basically the whole db suspends queries while replication catches up :/

So essentially the aspect that makes SQL Azure highly available is causing databases to become randomly unavailable. I'd laugh at the irony if it wasn't killing us.

Have a look at this thread for details: http://social.technet.microsoft.com/Forums/en-US/ssdsgetstarted/thread/c3003a28-8beb-4860-85b2-03cf6d0312a8

answered Nov 15 '22 21:11

Paul DB
