We're running a web app (2 instances) on Azure, backed by a SQL Azure database. At any given time there are 50-150 users using the website. The database runs at S2 performance level. The DTU is around 20% on average.
However, a few times every day I suddenly get hundreds of errors in my logs with timeouts, like this:
An error occurred while executing the command definition. See the inner exception for details.
The wait operation timed out.
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. This failure occurred while attempting to connect to the routing destination. The duration spent while attempting to connect to the original server was - [Pre-Login] initialization=1; handshake=21; [Login] initialization=0; authentication=0; [Post-Login] complete=1;
We're using EF6 for queries with the default command timeout. I've configured this execution strategy:
SetExecutionStrategy("System.Data.SqlClient",
() => new SqlAzureExecutionStrategy(10, TimeSpan.FromSeconds(15)));
The database (about 15GB total) is heavily indexed. These errors occur all over the place, usually dozens to hundreds within 1-2 minutes.
What steps can I take to prevent this from happening?
If you encounter a connection-timeout error, follow the steps: Increase the connection-timeout parameter. If you use an application to connect to SQL Server, increase the relevant connection-timeout parameter values and check whether the connection eventually succeeds.
You need to configure timeout parameter in SQL server settings as shown on the screenshot. 'Remote query timeout' parameter must be set in 0 instead of 600.
Using SQL Server Management Studio Connect to MS SQL server via SQL Management Studio. In Object Explorer, right-click on the server name and then select Properties. In the new tab, click on Connections node. In Remote Query Timeout change it to your desired value or specify 0 to set no limit.
The fact that it happens in 1-2 minutes might mean a burst in activity or some process that might be locking up tables.
If your DTU during those times is at 20% is not a CPU issue, but you can always find which are the bottlenecks by running this query on the DB:
SELECT TOP 10
total_worker_time/execution_count AS Avg_CPU_Time
,execution_count
,total_elapsed_time/execution_count as AVG_Run_Time
,(SELECT
SUBSTRING(text,statement_start_offset/2,(CASE
WHEN statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), text)) * 2
ELSE statement_end_offset
END -statement_start_offset)/2
) FROM sys.dm_exec_sql_text(sql_handle)
) AS query_text
FROM sys.dm_exec_query_stats
ORDER BY Avg_CPU_Time DESC
Even if the DB is heavily indexed, indexes get fragmented, I'd advice running this to check the current fragmentation:
select a.*,b.AverageFragmentation from
( SELECT tbl.name AS [Table_Name], tbl.object_id, i.name AS [Name], i.index_id, CAST(CASE i.index_id WHEN 1 THEN 1 ELSE 0 END AS bit) AS [IsClustered],
CAST(case when i.type=3 then 1 else 0 end AS bit) AS [IsXmlIndex], CAST(case when i.type=4 then 1 else 0 end AS bit) AS [IsSpatialIndex]
FROM
sys.tables AS tbl
INNER JOIN sys.indexes AS i ON (i.index_id > 0 and i.is_hypothetical = 0) AND (i.object_id=tbl.object_id))a
inner join
( SELECT tbl.object_id, i.index_id, fi.avg_fragmentation_in_percent AS [AverageFragmentation]
FROM
sys.tables AS tbl
INNER JOIN sys.indexes AS i ON (i.index_id > 0 and i.is_hypothetical = 0) AND (i.object_id=tbl.object_id)
INNER JOIN sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS fi ON fi.object_id=CAST(i.object_id AS int) AND fi.index_id=CAST(i.index_id AS int)
)b
on a.object_id=b.object_id and a.index_id=b.index_id
order by AverageFragmentation desc
You can also use Azure Automation to schedule an automatic rebuilding of fragmented indexes, see answer at: Why my Azure SQL Database indexes are still fragmented?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With