WCF Client ignores timeout values when service down

I have a VB.NET application that uses WCF. I've set all of the client timeouts in code:

    Dim oMastSrv As MastSvc.IclsIOXferClient = Nothing

    Dim binding As New ServiceModel.NetTcpBinding("NetTcpBinding_IclsIOXfer")
    Dim intTimeout As Integer = 2500
    binding.SendTimeout = New TimeSpan(0, 0, 0, 0, intTimeout)
    binding.ReceiveTimeout = New TimeSpan(0, 0, 0, 0, intTimeout)
    binding.OpenTimeout = New TimeSpan(0, 0, 0, 0, intTimeout)
    binding.CloseTimeout = New TimeSpan(0, 0, 0, 0, intTimeout)
    Dim address As New ServiceModel.EndpointAddress("net.tcp://" & GetSrvIP(intSrvID) & ":30000/MyMastSvc")

    oMastSrv = New MastSvc.IclsIOXferClient(binding, address)
    Try
        oMastSrv.ServiceConnect( ... )
        oMastSrv.InnerChannel.OperationTimeout = New TimeSpan(0, 0, 0, 0, intTimeout)
    Catch ex As Exception
        ...
    End Try

When the service I'm connected to crashes, though, the EndpointNotFoundException takes over twenty seconds to be thrown, not the 2.5 seconds I specified. This is really hurting my load balancing: I need to know that the service is gone within 2.5 seconds. Is there any way to get this exception thrown within the desired time span?

BTW, the exception reads something like:

Could not connect to net.tcp://192.168.227.130:30000/MXIOXfer. The connection attempt lasted for a time span of 00:00:02.4209684. TCP error code 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.227.130:30000.

But it really does take over twenty seconds. With WCF tracing turned on, I can see the "TCP operation failed" warning just before the exception, and it reports the real elapsed time:

Could not connect to net.tcp://192.168.227.130:30000/MXIOXfer. The connection attempt lasted for a time span of 00:00:21.0314092. TCP error code 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.227.130:30000.

If it makes any difference, all the comms to the service are done on separate threads.

EDIT:

This thread seems to indicate that the socket timeouts are set by the operating system. Is there a registry setting for such things?

MarkFisher asked Aug 10 '12 at 13:08



1 Answer

Combining the details from the Stack Overflow and MSDN Social threads that eol and I referenced led me to these registry settings:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx}\TcpInitialRtt

Value Type: REG_DWORD—number

Valid Range: 0–0xFFFF

Default: 3 seconds

Description: This parameter controls the initial time-out used for a TCP connection request and initial data retransmission on a per-interface basis. Use caution when tuning with this parameter because exponential backoff is used. Setting this value to larger than 3 results in much longer time-outs to nonexistent addresses.


HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx}\TcpMaxConnectRetransmissions

Value Type: REG_DWORD—number

Valid Range: 0–255 (decimal)

Default: 2

Description: This parameter determines the number of times that TCP retransmits a connect request (SYN) before aborting the attempt. The retransmission time-out is doubled with each successive retransmission in a given connect attempt. The initial time-out is controlled by the TcpInitialRtt registry value.


HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx}\TcpMaxDataRetransmissions

Value Type: REG_DWORD—number

Valid Range: 0–0xFFFFFFFF

Default: 5

Description: This parameter controls the number of times that TCP retransmits an individual data segment (not connection request segments) before aborting the connection. The retransmission time-out is doubled with each successive retransmission on a connection. It is reset when responses resume. The Retransmission Timeout (RTO) value is dynamically adjusted, using the historical measured round-trip time (Smoothed Round Trip Time, or SRTT) on each connection. The starting RTO on a new connection is controlled by the TcpInitialRtt registry value.

Since the timeout on a failed connect is doubled for each retry, the default values make the first attempt time out after 3 seconds, the second after 6, and the third and final attempt after 12, for 21 seconds in total. That matches the 21 seconds reported in the WCF trace. BTW, the TcpMaxDataRetransmissions value has nothing to do with this; I include it for completeness and for those who come later.
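
To make the arithmetic concrete, here is a minimal sketch of the doubling described above. The function name `TotalConnectTimeout` is made up for illustration, and the code models only the documented behavior (an initial time-out that doubles with each SYN retransmission), not the actual Windows TCP stack.

```vbnet
Imports System

Module ConnectTimeoutSketch
    ' Total seconds before a connect attempt is abandoned: the initial
    ' time-out plus one doubled time-out per SYN retransmission.
    Function TotalConnectTimeout(initialRttSeconds As Integer, maxConnectRetransmissions As Integer) As Integer
        Dim total As Integer = 0
        Dim wait As Integer = initialRttSeconds
        ' Attempt 0 is the original SYN; attempts 1..N are the retransmissions.
        For attempt As Integer = 0 To maxConnectRetransmissions
            total += wait
            wait *= 2
        Next
        Return total
    End Function

    Sub Main()
        Console.WriteLine(TotalConnectTimeout(3, 2)) ' defaults: 3 + 6 + 12 = 21
        Console.WriteLine(TotalConnectTimeout(3, 0)) ' no retransmissions: 3
    End Sub
End Module
```

With the default values this gives 21, in line with the traced connection time. With TcpMaxConnectRetransmissions set to 0 it gives 3.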

None of these values is present by default; you have to add them in order to change them. Figuring out which interface(s) to change is easy, because each interface's key contains its current IP address. (There's even one for localhost.) In my own case, just setting TcpMaxConnectRetransmissions to zero (0) on the VM interfaces brings the socket connect timeout on them down to 3 seconds, which is close enough to 2.5 to work. My load balancing now works when a WCF service crashes.
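
For anyone who would rather script the change than edit the registry by hand, the following is a rough sketch, not a tested tool. The helper names `FindInterfaceKeys` and `DisableConnectRetransmissions` are hypothetical. It assumes the interface keys store their address in the `IPAddress` (static) or `DhcpIPAddress` (DHCP) value, that the process runs with administrator rights, and that the installed Windows version honors `TcpMaxConnectRetransmissions` as described above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.Win32

Module TcpConnectRetrySketch
    Const InterfacesPath As String = "SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"

    ' Hypothetical helper: returns the names (GUIDs) of the interface keys
    ' whose static or DHCP-assigned address equals ipAddress.
    Function FindInterfaceKeys(ipAddress As String) As List(Of String)
        Dim matches As New List(Of String)
        Using interfaces As RegistryKey = Registry.LocalMachine.OpenSubKey(InterfacesPath)
            If interfaces Is Nothing Then Return matches
            For Each name As String In interfaces.GetSubKeyNames()
                Using iface As RegistryKey = interfaces.OpenSubKey(name)
                    If iface Is Nothing Then Continue For
                    ' Assumption: IPAddress is REG_MULTI_SZ, DhcpIPAddress is REG_SZ.
                    Dim staticIps As String() = TryCast(iface.GetValue("IPAddress"), String())
                    Dim dhcpIp As String = TryCast(iface.GetValue("DhcpIPAddress"), String)
                    Dim isStatic As Boolean = staticIps IsNot Nothing AndAlso Array.IndexOf(staticIps, ipAddress) >= 0
                    If isStatic OrElse dhcpIp = ipAddress Then matches.Add(name)
                End Using
            Next
        End Using
        Return matches
    End Function

    ' Hypothetical helper: adds TcpMaxConnectRetransmissions = 0 (REG_DWORD)
    ' to one interface key. Writing under HKLM needs administrator rights.
    Sub DisableConnectRetransmissions(interfaceKeyName As String)
        Dim path As String = InterfacesPath & "\" & interfaceKeyName
        Using iface As RegistryKey = Registry.LocalMachine.OpenSubKey(path, True)
            If iface Is Nothing Then Return
            iface.SetValue("TcpMaxConnectRetransmissions", 0, RegistryValueKind.DWord)
        End Using
    End Sub

    Sub Main(args As String())
        If args.Length = 0 Then
            Console.WriteLine("Pass the local interface's IP address as the first argument.")
            Return
        End If
        ' args(0) is the address of the local interface to change.
        For Each keyName As String In FindInterfaceKeys(args(0))
            Console.WriteLine("Updating interface " & keyName)
            DisableConnectRetransmissions(keyName)
        Next
    End Sub
End Module
```

Note that the address passed in is that of the local network interface, not the address of the remote service.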

MarkFisher answered Oct 06 '22 at 10:10
