Long delays in sending UDP packets

Question

I have an application that receives, processes, and transmits UDP packets.

Everything works fine if the port numbers for reception and transmission are different.

If the port numbers are the same and the IP addresses are different, it usually works fine, EXCEPT when the transmit IP address is on the same subnet as the machine running the application. In that case the send_to function takes several seconds to complete, instead of the usual few milliseconds.

Rx Port  Tx IP             Tx Port  Result
5001     Same subnet       5002     OK     Delay ~ 0.001 secs
5001     Different subnet  5001     OK     Delay ~ 0.001 secs
5001     Same subnet       5001     Fails  Delay > 2 secs

Here is a short program that demonstrates the problem.

#include <ctime>
#include <iostream>
#include <string>
#include <boost/array.hpp>
#include <boost/asio.hpp>
#include <windows.h>   // QueryPerformanceCounter, QueryPerformanceFrequency, LARGE_INTEGER

using boost::asio::ip::udp;
using std::cout;
using std::endl;

int test( const std::string& output_IP)
{
    try
    {
        unsigned short prev_seq_no;

        boost::asio::io_service io_service;

        // build the input socket

        /* This is connected to a UDP client that is running continuously
        sending messages that include an incrementing sequence number
        */

        const int input_port = 5001;
        udp::socket input_socket(io_service, udp::endpoint(udp::v4(), input_port ));

        // build the output socket

        const std::string output_Port = "5001";
        udp::resolver resolver(io_service);
        udp::resolver::query query(udp::v4(), output_IP, output_Port );
        udp::endpoint output_endpoint = *resolver.resolve(query);
        udp::socket output_socket( io_service );
        output_socket.open(udp::v4());

        // double output buffer size
        boost::asio::socket_base::send_buffer_size option( 8192 * 2 );
        output_socket.set_option(option);

        cout  << "TX to " << output_endpoint.address() << ":"  << output_endpoint.port() << endl;



        int count = 0;
        for (;;)
        {
            // receive packet
            unsigned short recv_buf[ 20000 ];
            udp::endpoint remote_endpoint;
            boost::system::error_code error;
            int bytes_received = input_socket.receive_from(boost::asio::buffer(recv_buf,20000),
                                 remote_endpoint, 0, error);

            if (error && error != boost::asio::error::message_size)
                throw boost::system::system_error(error);

            // start timer
            __int64 TimeStart;
            QueryPerformanceCounter( (LARGE_INTEGER *)&TimeStart );

            // send onwards
            boost::system::error_code ignored_error;
            output_socket.send_to(
                boost::asio::buffer(recv_buf,bytes_received),
                output_endpoint, 0, ignored_error);

            // stop time and display tx time
            __int64 TimeEnd;
            QueryPerformanceCounter( (LARGE_INTEGER *)&TimeEnd );
            __int64 f;
            QueryPerformanceFrequency( (LARGE_INTEGER *)&f );
            cout << "Send time secs " << (double) ( TimeEnd - TimeStart ) / (double) f << endl;

            // stop after loops
            if( count++ > 10 )
                break;
        }
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
    }

    // test() is declared to return int, so it must return a value
    return 0;
}

int main(  )
{

    test( "193.168.1.200" );

    test( "192.168.1.200" );

    return 0;
}

The output from this program, when run on a machine with address 192.168.1.101:

TX to 193.168.1.200:5001
Send time secs 0.0232749
Send time secs 0.00541566
Send time secs 0.00924535
Send time secs 0.00449014
Send time secs 0.00616714
Send time secs 0.0199299
Send time secs 0.00746081
Send time secs 0.000157972
Send time secs 0.000246775
Send time secs 0.00775578
Send time secs 0.00477618
Send time secs 0.0187321
TX to 192.168.1.200:5001
Send time secs 1.39485
Send time secs 3.00026
Send time secs 3.00104
Send time secs 0.00025927
Send time secs 3.00163
Send time secs 2.99895
Send time secs 6.64908e-005
Send time secs 2.99864
Send time secs 2.98798
Send time secs 3.00001
Send time secs 3.00124
Send time secs 9.86207e-005

Why is this happening? Is there any way I can reduce the delay?

Notes:

- Built using Code::Blocks, running under various flavours of Windows
- Packets are 10000 bytes long
- The problem goes away if I connect the computer running the application to a second network, for example a WWLAN (cellular network "rocket stick")

As far as I can tell, this is the situation we have:

This works (different ports, same LAN):

[network diagram]

This also works (same ports, different LANs):

[network diagram]

This does NOT work (same ports, same LAN):

[network diagram]

This seems to work (same ports, same LAN, dual-homed Module2 host):

[network diagram]

Tanner Sansbury · Accepted Answer

Given this is being observed on Windows for large datagrams with a destination address of a non-existent peer within the same subnet as the sender, the problem is likely the result of send() blocking while it waits for an Address Resolution Protocol (ARP) response so that the layer2 ethernet frame can be populated:

When sending data, the layer2 ethernet frame will be populated with the media access control (MAC) Address of the next hop in the route. If the sender does not know the MAC Address for the next hop, it will broadcast an ARP request and cache responses. Using the sender's subnet mask and the destination address, the sender can determine if the next hop is on the same subnet as the sender or if the data must route through the default gateway. Based on the results in the question, when sending large datagrams:
- datagrams destined to a different subnet have no delay because the default gateway's MAC Address is within the sender's ARP cache
- datagrams destined to a non-existent peer on the sender's subnet incur a delay waiting for ARP resolution
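
The next-hop decision described above can be sketched in a few lines. This is only an illustration: the helper names `make_ipv4` and `on_same_subnet` are hypothetical, and the 255.255.255.0 mask is an assumption, since the question does not state the sender's subnet mask.

```cpp
#include <cassert>
#include <cstdint>

// Pack four dotted-quad octets into one host-order 32-bit value.
inline std::uint32_t make_ipv4(std::uint8_t a, std::uint8_t b,
                               std::uint8_t c, std::uint8_t d)
{
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8)  |  std::uint32_t(d);
}

// True when both addresses share the same network prefix under `mask`.
// In that case the sender must resolve the destination's own MAC address
// via ARP instead of handing the datagram to the default gateway.
inline bool on_same_subnet(std::uint32_t sender, std::uint32_t destination,
                           std::uint32_t mask)
{
    return (sender & mask) == (destination & mask);
}

// Sanity check with the addresses from the question.
// ASSUMPTION: a /24 mask (255.255.255.0); the question does not give one.
inline void check_question_addresses()
{
    const std::uint32_t mask   = make_ipv4(255, 255, 255, 0);
    const std::uint32_t sender = make_ipv4(192, 168, 1, 101);
    assert(on_same_subnet(sender, make_ipv4(192, 168, 1, 200), mask));   // ARP for the peer
    assert(!on_same_subnet(sender, make_ipv4(193, 168, 1, 200), mask));  // goes via the gateway
}
```

Under that assumed mask, 192.168.1.200 is treated as a local peer while 193.168.1.200 is routed through the gateway, which matches the two cases timed in the question.
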
The socket's send buffer size (SO_SNDBUF) is being set to 16384 bytes, but the datagrams being sent are 10000 bytes each. The behavior of send() when the buffer is saturated is unspecified, but some systems will observe send() blocking. In this case, saturation would occur fairly quickly if any datagrams incur a delay, such as by waiting for an ARP response.
```
// Datagrams being sent are 10000 bytes, but the socket buffer is 16384.
boost::asio::socket_base::send_buffer_size option(8192 * 2);
output_socket.set_option(option);
```
Consider letting the kernel manage the socket buffer size or increasing it based on your expected throughput.
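
To put rough numbers on the saturation argument, here is a small sketch. Both helpers are hypothetical, the sizing rule is only a heuristic of my own rather than anything Windows or Boost.Asio prescribes, and real socket buffers also carry per-datagram overhead that is ignored here.

```cpp
#include <cassert>
#include <cstddef>

// How many whole datagrams of `datagram_bytes` fit in a send buffer of
// `buffer_bytes`. Bookkeeping overhead is ignored, so this is an upper bound.
inline std::size_t datagrams_that_fit(std::size_t buffer_bytes,
                                      std::size_t datagram_bytes)
{
    assert(datagram_bytes > 0);
    return buffer_bytes / datagram_bytes;
}

// HYPOTHETICAL heuristic: reserve enough space to queue everything the
// application produces while the stack is stalled (e.g. waiting on ARP).
inline std::size_t suggested_send_buffer(std::size_t datagram_bytes,
                                         std::size_t datagrams_per_second,
                                         std::size_t stall_seconds)
{
    return datagram_bytes * datagrams_per_second * stall_seconds;
}
```

With the values from the question, `datagrams_that_fit(16384, 10000)` is 1, so a second delayed datagram already exceeds the buffer. As an illustration only, an assumed rate of 100 datagrams per second and a 3 second stall would give `suggested_send_buffer(10000, 100, 3)`, i.e. 3000000 bytes; the operating system may cap or adjust whatever value is requested.
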
When sending a datagram with a size that exceeds the Windows registry FastSendDatagramThreshold parameter, the send() call can block until the datagram has been sent. For more details, see the Microsoft TCP/IP Implementation Details:

Datagrams smaller than the value of this parameter go through the fast I/O path or are buffered on send. Larger ones are held until the datagram is actually sent. The default value was found by testing to be the best overall value for performance. Fast I/O means copying data and bypassing the I/O subsystem, instead of mapping memory and going through the I/O subsystem. This is advantageous for small amounts of data. Changing this value is not generally recommended.
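
One possible workaround, sketched here under stated assumptions, is to split each payload into pieces smaller than the threshold so that every send is eligible for the buffered path. The 1024-byte figure is the documented default as far as I recall, so treat it as an assumption and check the registry on the target machine. `split_payload` is a hypothetical helper, and the receiver would need its own scheme to reassemble and order the pieces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: default FastSendDatagramThreshold; the actual value is read
// from the registry and may differ on a given machine.
const std::size_t kAssumedFastSendThreshold = 1024;

// Split `payload` into consecutive chunks of at most `max_chunk` bytes.
// The last chunk holds whatever remains and may be shorter.
inline std::vector<std::vector<std::uint8_t> >
split_payload(const std::vector<std::uint8_t>& payload, std::size_t max_chunk)
{
    assert(max_chunk > 0);
    std::vector<std::vector<std::uint8_t> > chunks;
    for (std::size_t offset = 0; offset < payload.size(); offset += max_chunk)
    {
        const std::size_t length = std::min(max_chunk, payload.size() - offset);
        const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(offset);
        const std::ptrdiff_t last  = static_cast<std::ptrdiff_t>(offset + length);
        chunks.push_back(std::vector<std::uint8_t>(payload.begin() + first,
                                                   payload.begin() + last));
    }
    return chunks;
}
```

Because only datagrams smaller than the threshold qualify, a caller would pass `kAssumedFastSendThreshold - 1` as `max_chunk`. A 10000-byte packet then becomes ten chunks, each of which would be handed to `send_to` separately. Buffered sends are not guaranteed to reach the wire, so this trades blocking for possible loss while the destination's MAC address is still unresolved.
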

If one is observing delays for each send() to an existing peer on the sender's subnet, then profile and analyze the network:

- Use iperf to measure the network's potential throughput.
- Use wireshark to get a deeper view into what is occurring on a given node. Look for ARP requests and responses.
- From the sender's machine, ping the peer and then check the ARP cache. Verify that there is a cache entry for the peer and that it is correct.
- Try a different port and/or TCP. This can help identify whether a network's policies are throttling or shaping traffic for a particular port or protocol.

Also note that sending datagrams below the FastSendDatagramThreshold value in quick succession while waiting for ARP to resolve may cause datagrams to be discarded:

ARP queues only one outbound IP datagram for a specified destination address while that IP address is being resolved to a media access control address. If a User Datagram Protocol (UDP)-based application sends multiple IP datagrams to a single destination address without any pauses between them, some of the datagrams may be dropped if there is no ARP cache entry already present. An application can compensate for this by calling the iphlpapi.dll routine SendArp() to establish an ARP cache entry, before sending the stream of packets.
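
A minimal sketch of the SendARP() approach mentioned in that quote follows. `warm_arp_cache` is a hypothetical wrapper name. The call is Windows-only, so the non-Windows branch is just a stub that lets the snippet compile elsewhere. I believe SendARP() is synchronous and can itself take a few seconds when the peer does not answer, so it is probably best called before the time-critical loop rather than inside it.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#ifndef _WINSOCK_DEPRECATED_NO_WARNINGS
#define _WINSOCK_DEPRECATED_NO_WARNINGS   // allow inet_addr on newer SDKs
#endif
#include <winsock2.h>
#include <iphlpapi.h>
// Link with iphlpapi.lib and ws2_32.lib (MinGW: -liphlpapi -lws2_32).
#endif

// Ask the OS to resolve `ipv4_dotted` to a MAC address, which also creates
// the ARP cache entry. Returns true only when a MAC address was obtained.
inline bool warm_arp_cache(const std::string& ipv4_dotted)
{
#ifdef _WIN32
    const IPAddr destination = inet_addr(ipv4_dotted.c_str());
    if (destination == INADDR_NONE)
        return false;                     // not a valid dotted-quad address

    ULONG mac[2] = { 0, 0 };              // 8 bytes, room for a 6-byte MAC
    ULONG mac_length = 6;
    // A source address of 0 lets the system choose the outgoing interface.
    if (SendARP(destination, 0, mac, &mac_length) != NO_ERROR)
        return false;                     // no reply: peer absent or unreachable

    assert(mac_length <= sizeof(mac));
    return true;
#else
    (void)ipv4_dotted;
    return false;                         // stub: SendARP exists only on Windows
#endif
}
```

In the program from the question, one might call `warm_arp_cache("192.168.1.200")` once before entering the receive loop and skip or defer forwarding when it returns false, so that the loop does not stall on a peer that never answers.
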

Tags: c++, windows, sockets, boost-asio, udp

ravenspoint
