We are running a .NET Core 3.1 application in a Kubernetes cluster. The application connects to an Azure SQL Database using EF Core 3.1.7, with Microsoft.Data.SqlClient 1.1.3.
At seemingly random times, we would receive the following error.
---> System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
---> System.ComponentModel.Win32Exception (258): Unknown error 258
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParserStateObject.ThrowExceptionAndWarning(Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
at System.Data.SqlClient.SqlCommand.ExecuteScalar()
Even though it seems random, it definitely happens more often under heavier loads. From my research, it appears as if this specific timeout is related to the connection timeout rather than the command timeout. I.e. the client is not able to establish a connection at all. This is not a query that is timing out.
Potential root causes we've eliminated:
TrusServerCertificate
set to true. Here is the current connection string: Server=tcp:***.database.windows.net,1433;Initial Catalog=***;Persist Security Info=False;User ID=***;Password=***;MultipleActiveResultSets=False;Encrypt=True;Connection Timeout=60;TrustServerCertificate=True;
Update 1: As requested, an example of two timeouts that just occurred. It is a Sunday, so traffic is extremely low. Database utilization (CPU, Mem, IO) is sitting between 2-6%.
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
---> System.ComponentModel.Win32Exception (258): Unknown error 258
at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at Microsoft.Data.SqlClient.TdsParserStateObject.ThrowExceptionAndWarning(Boolean callerHasConnectionLock, Boolean asyncClose)
at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
at Microsoft.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at Microsoft.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at Microsoft.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
at Microsoft.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect)
at Microsoft.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName)
at Microsoft.Data.SqlClient.SqlConnection.BeginDbTransaction(IsolationLevel isolationLevel)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransaction(IsolationLevel isolationLevel)
at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)
We are also receiving errors on our database health checker when using this command: Microsoft.EntityFrameworkCore.Infrastructure.DatabaseFacade.CanConnect()
The above stack trace is the issue we are trying to solve versus this stack trace below of the SQL query timing out.
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
---> System.ComponentModel.Win32Exception (258): Unknown error 258
at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
at Microsoft.Data.SqlClient.SqlDataReader.get_MetaData()
at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)
at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
Applications that connect to your database should be built to expect these transient errors. To handle them, implement retry logic in their code instead of surfacing them to users as application errors. If your client program uses ADO.NET, your program is told about the transient error by the throw of SqlException.
If the application persistently fails to connect to Azure SQL Database, it usually indicates an issue with one of the following: Firewall configuration. The Azure SQL database or client-side firewall is blocking connections to Azure SQL Database.
If you are unable to connect to SQL Managed Instance from an Azure virtual machine within the same virtual network but a different subnet, check if you have a Network Security Group set on VM subnet that might be blocking access.
Database transaction units (DTUs) A database transaction unit (DTU) represents a blended measure of CPU, memory, reads, and writes. Service tiers in the DTU-based purchasing model are differentiated by a range of compute sizes with a fixed amount of included storage, fixed retention period for backups, and fixed price.
Error 40615: Cannot connect to < servername > To resolve this issue, configure firewall settings on SQL Database through the Azure portal. Error 5: Cannot connect to < servername > To resolve this issue, make sure that port 1433 is open for outbound connections on all firewalls between the client and the internet.
Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - The wait operation timed out.) (Microsoft SQL Server, Error: 258) Configured server machine to have static IPv4/DNS addresses.
Azure Kubernetes Service (AKS) reduces the complexity and operational overhead of managing a Kubernetes cluster by offloading much of that responsibility to the Azure. With Azure DevOps and Azure Container Services (AKS), we can build DevOps for dockerized applications by leveraging docker capabilities enabled on Azure DevOps Hosted Agents ...
The connection string name is used later in the tutorial to connect the Azure app to an Azure SQL Database. In the Solution Explorer, right-click your DotNetAppSqlDb project and select Publish. Select Azure as your target and click Next. Make sure that Azure App Service (Windows) is selected and click Next.
The problem was an infrastructure issue at Azure.
There is a known issue within Azure Network where the dhcp lease is lost whenever a disk attach/detach happens on some VM fleets. There is a fix rolling out at the moment to regions. I'll check to see when Azure Status update will be published for this.
The problem disappeared, so it appears as if the fix has been rolled out globally.
For anyone else running into this issue in the future, you can identify it by establishing an SSH connection into the node (not the pod). Do an ls -al /var/log/
and identify all the syslog
files and run the following grep on each file.
cat /var/log/syslog | grep 'carrier'
If you have any Lost carrier
and Gained carrier
messages in the log, there is a some sort of a network issue. In our case it was the DHCP lease.
from my research, it appears as if this specific timeout is related to the connection timeout rather than the command timeout
I don't think so. The callstack goes through System.Data.SqlClient.SqlCommand.ExecuteScalar()
so it's running a query, after a successful connection.
This is a CommandTimeout, caused by the client abandoning a long-running command. The default CommandTimeout is 30sec.
To troubleshoot why the command is taking a long time, start with the Query Store and the related Query Performance Insight.
There's some noise about this error on GitHub, but I don't see any evidence that there's anything other than ordinary Command Timeouts going on. Eg if you run
using (var con = new SqlConnection(constr))
{
con.Open();
var sql = @"waitfor delay '01:00:00'";
var cmd = con.CreateCommand();
//cmd.CommandTimeout = 0;
cmd.CommandText = sql;
try
{
Console.WriteLine(DateTime.Now);
cmd.ExecuteNonQuery();
}
catch (Exception ex)
{
Console.WriteLine(DateTime.Now);
Console.WriteLine(ex);
}
}
You'll get (With Microsoft.Data.SqlClient):
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
---> System.ComponentModel.Win32Exception (258): The wait operation timed out.
at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at Microsoft.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds(String methodName, Boolean isAsync, Int32 timeout, Boolean asyncWrite)
at Microsoft.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String methodName)
at Microsoft.Data.SqlClient.SqlCommand.ExecuteNonQuery()
at SqlClientTest.Program.Main(String[] args) in C:\Users\david\source\repos\SqlClientTest\SqlClientTest\Program.cs:line 34
Or slightly different for System.Data.SqlClient (which you appear to be using):
System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
---> System.ComponentModel.Win32Exception (258): The wait operation timed out.
--- End of inner exception stack trace ---
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds(String methodName, Boolean async, Int32 timeout, Boolean asyncWrite)
at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite, String methodName)
at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
The difference between
System.ComponentModel.Win32Exception (258): The wait operation timed out.
and
System.ComponentModel.Win32Exception (258): Unknown error 258
is probably just the availability of the Win32Exception descriptions on Windows vs Linux.
In my case I was using an Azure SQL Database without enough DTUs for the workload. The solution was to upgrade the database to a higher tier with more DTUs allocated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With