Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server DbCommand Timeout with .Net Core container under load

I'm running a .Net Core container on Open Shift Enterprise V3 pointing to a SQL Server database.

I have a .Net Core REST API with a put method which adds or updates a record in the database.

The table I am adding/updating had 3000 records and has indexes.

This works fine locally using the same database and with the container. However when I start to put load through the the container (approximately 50 concurrent http connections with JMeter) I get random timeouts with the error message below. I don't get the problem running locally.

The local machine is a lot more powerful than the container and I have increased the CPU power on the container but this doesn't appear to have made any difference.

Any suggestions on things to try would be appreciated.

[10:45:36 ERR] Failed executing DbCommand (35,001ms) [Parameters=[@__get_Item_0='?' (Size = 255) (DbType = AnsiString)], CommandType='Text', CommandTimeout='30']
SELECT [e].[host_name], [e].[data_centre], [e].[Environment], [e].[is_physical_machine], [e].[mac_address], [e].[number_of_cores], [e].[number_of_sockets], [e].[number_of_v_cores], [e].[number_ofcpus], [e].[operating_system], [e].[operating_system_version], [e].[processor], [e].[uuid]
FROM [EntitlementServer].[host] AS [e]
WHERE [e].[host_name] = @__get_Item_0
System.Data.SqlClient.SqlException (0x80131904): Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
   at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
   at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
   at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at System.Data.SqlClient.SqlDataReader.get_MetaData()
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
   at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
   at System.Data.Common.DbCommand.ExecuteReader()
   at Microsoft.EntityFrameworkCore.Storage.Internal.RelationalCommand.Execute(IRelationalConnection connection, DbCommandMethod executeMethod, IReadOnlyDictionary`2 parameterValues)
ClientConnectionId:6ca037fc-9671-4d43-bebe-35879203c682
Error Number:-2,State:0,Class:11

Update 1

Whilst I found that scaling up the CPU and memory on the container did not help with performance. When I increased the number of running containers, that did increase throughput and the timeout issue went away. I am not sure why this would be the case.

like image 406
mattbloke Avatar asked Jul 30 '19 11:07

mattbloke


1 Answers

The problem actually turned out to be a threading issue. The db timeout was a symptom. This is the reason why increasing the number of containers stopped the problem because the number of http threads also increased.

By making code asynchronous so that threads are released I have been able to fix the problem without having to increase the number of containers.

like image 152
mattbloke Avatar answered Oct 15 '22 05:10

mattbloke