Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you deal with transport-level errors in SqlConnection?

Every now and then in a high volume .NET application, you might see this exception when you try to execute a query:

System.Data.SqlClient.SqlException: A transport-level error has occurred when sending the request to the server.

According to my research, this is something that "just happens" and not much can be done to prevent it. It does not happen as a result of a bad query, and generally cannot be duplicated. It just crops up maybe once every few days in a busy OLTP system when the TCP connection to the database goes bad for some reason.

I am forced to detect this error by parsing the exception message, and then retrying the entire operation from scratch, to include using a new connection. None of that is pretty.

Anybody have any alternate solutions?

like image 700
Eric Z Beard Avatar asked Aug 19 '08 17:08

Eric Z Beard


People also ask

What is a transport level error?

A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) Also, find attached screenshot. Thank you in advance.

How should transient network connectivity issues in Azure SQL be handled by a client application?

Applications that connect to your database should be built to expect these transient errors. To handle them, implement retry logic in their code instead of surfacing them to users as application errors. If your client program uses ADO.NET, your program is told about the transient error by the throw of SqlException.


2 Answers

I posted an answer on another question on another topic that might have some use here. That answer involved SMB connections, not SQL. However it was identical in that it involved a low-level transport error.

What we found was that in a heavy load situation, it was fairly easy for the remote server to time out connections at the TCP layer simply because the server was busy. Part of the reason was the defaults for how many times TCP will retransmit data on Windows weren't appropriate for our situation.

Take a look at the registry settings for tuning TCP/IP on Windows. In particular you want to look at TcpMaxDataRetransmissions and maybe TcpMaxConnectRetransmissions. These default to 5 and 2 respectively, try upping them a little bit on the client system and duplicate the load situation.

Don't go crazy! TCP doubles the timeout with each successive retransmission, so the timeout behavior for bad connections can go exponential on you if you increase these too much. As I recall upping TcpMaxDataRetransmissions to 6 or 7 solved our problem in the vast majority of cases.

like image 154
Tim Farley Avatar answered Sep 23 '22 22:09

Tim Farley


This blog post by Michael Aspengren explains the error message "A transport-level error has occurred when sending the request to the server."

like image 39
Magnus Lindhe Avatar answered Sep 19 '22 22:09

Magnus Lindhe