When doing a TCP analysis of the traffic between my web servers and database servers I see the network buffers (TCP Window) filling up frequently. The web servers then send TCP messages to the database server telling it that its buffers are full an not to send more data until given an update.
For example, this is the size of the network buffer in bytes for one of the more long lived connections to the database server over time:
The web servers are running a .NET 4.0 application running in IIS integrated mode on Windows 2008 R2 web servers. The SQL server is a 2008 R2 server.
My interpretation of this is that the SQL server is returning data to the web servers faster then the application on the web server can collect the data from the buffers. I have tried tuning just about everything I can in the network drivers to work around this issue. In particular increasing the RSS queues, disabling interrupt moderation, and setting Windows 2008 R2 server to increase the buffer size more aggressively.
So if my interpretation is correct that leaves me wondering about two possibilities:
Edit:
Requested DMV Query cutting off at ASYNC_NETWORK_IO:
SELECT * FROM sys.dm_os_wait_stats ORDER BY waiting_tasks_count desc;
wait_type waiting_tasks_count wait_time_ms max_wait_time_ms signal_wait_time_ms CXPACKET 1436226309 2772827343 39259 354295135 SLEEP_TASK 231661274 337253925 10808 71665032 LATCH_EX 214958564 894509148 11855 84816450 SOS_SCHEDULER_YIELD 176997645 227440530 2997 227332659 ASYNC_NETWORK_IO 112914243 84132232 16707 16250951
1) What makes you think that this is TCP flow control, as opposed to SQL Server not producing data in the intervals where there is no traffic? Check if sys.dm_exec_requests
look at wait_type. The wait types are described in Waits and Queues. If is indeed the client applying TCP flow control, then you'll see the wait type ASYNC_NETWORK_IO
.
2) If the issue is indeed network wait type, then the solution is not to increase the bandwidth, but obviously to reduce the traffic. The client has no business requesting so much data from the server as to cause TCP flow control. This would be caused by doing horribly wrong things in the client, like counting rows or client-side paging. Move the processing on the server, and just get small results sets with the data you need.
Edit
Consuming the DB call result set ultimately boils down to one form or another of this:
FetchNextRow
while (not EnfOfResults)
{
ProcessRow;
FetchNextRow;
}
What this could mean, in real terms, it could be foreach row in IQueryable
or SqlDataReader.Read()
. But the basic idea is the same, that the client fetches rows from the result, process them, then gets some more rows. If the client code does anything in that ProcessRow
that blocks, then the client code will not reach the point where it fetches the next row again, and thus will eventually trigger TCP flow control which in turn will cause SQL Server to suspend the query (since it has no place to write the results into). There is nothing you can do in terms of TCP to make this better. Increasing the window size can actually make maters worse, as now all those results that were previously suppressed at source (DB) are going to be created and have to be stored somewhere, which will ultimately mean live memory allocated to storage and may make things far worse than they are now.
If I would be in your shoes right now I'd focus on identifying where does that ProcessRow
blocking occur. An hypothesis I put forward was that that processing would be an MVC View writing into the response buffer and being blocked in turn by TCP flow control resulting from the user agent not consuming the HTTP response (eg. Ajax call completed but the browser is not running the completion code to consume the response because the main thread is looping busy on something else). As always, the best approach is to methodically measure. Some possible tools:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With