I am working on a LAN based solution with a "server" that has to control a number of "players" My protocol of choice is UDP because its easy, I do not need connections, my traffic consists only of short commands from time to time and I want to use a mix of broadcast messages for syncing and single target messages for player individual commands.
Multicast TCP would be an alternative, but its more complicated, not exactly suited for the task and often not well supported by hardware.
Unfortunately I am running into a strange problem:
The first datagram which is sent to a specific ip using "sendto" is lost. Any datagram sent short time afterwards to the same ip is received. But if i wait some time (a few minutes) the first "sendto" is lost again.
Broadcast datagrams always work. Local sends (to the same computer) always work.
I presume the operating system or the router/switch has some translation table from IP to MAC addresses which gets forgotten when not being used for some minutes and that unfortunately causes datagrams to be lost. I could observe that behaviour with different router/switch hardware, so my suspect is the windows networking layer.
I know that UDP is by definition "unreliable" but I cannot believe that this goes so far that even if the physical connection is working and everything is well defined packets can get lost. Then it would be literally worthless.
Technically I am opening an UDP Socket, bind it to a port and INADRR_ANY. Then I am using "sendto" and "recvfrom". I never do a connect - I dont want to because I have several players. As far as I know UDP should work without connect.
My current workaround is that I regularly send dummy datagrams to all specific player ips - that solves the problem but its somehow "unsatisfying"
Question: Does anybody know that problem? Where does it come from? How can I solve it?
Edit:
I boiled it down to the following test program:
int _tmain(int argc, _TCHAR* argv[])
{
WSADATA wsaData;
WSAStartup(MAKEWORD(2, 2), &wsaData);
SOCKET Sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
SOCKADDR_IN Local = {0};
Local.sin_family = AF_INET;
Local.sin_addr.S_un.S_addr = htonl(INADDR_ANY);
Local.sin_port = htons(1234);
bind(Sock, (SOCKADDR*)&Local, sizeof(Local));
printf("Press any key to send...\n");
int Ret, i = 0;
char Buf[4096];
SOCKADDR_IN Remote = {0};
Remote.sin_family = AF_INET;
Remote.sin_addr.S_un.S_addr = inet_addr("192.168.1.12"); // Replace this with a valid LAN IP which is not the hosts one
Remote.sin_port = htons(1235);
while(true) {
_getch();
sprintf(Buf, "ping %d", ++i);
printf("Multiple sending \"%s\"\n", Buf);
// Ret = connect(Sock, (SOCKADDR*)&Remote, sizeof(Remote));
// if (Ret == SOCKET_ERROR) printf("Connect Error!\n", Buf);
Ret = sendto(Sock, Buf, strlen(Buf), 0, (SOCKADDR*)&Remote, sizeof(Remote));
if (Ret != strlen(Buf)) printf("Send Error!\n", Buf);
Ret = sendto(Sock, Buf, strlen(Buf), 0, (SOCKADDR*)&Remote, sizeof(Remote));
if (Ret != strlen(Buf)) printf("Send Error!\n", Buf);
Ret = sendto(Sock, Buf, strlen(Buf), 0, (SOCKADDR*)&Remote, sizeof(Remote));
if (Ret != strlen(Buf)) printf("Send Error!\n", Buf);
}
return 0;
The Program opens an UDP Socket, and sends 3 datagrams in a row on every keystroke to a specific IP. Run that whith wireshark observing your UDP traffic, press a key, wait a while and press a key again. You do not need a receiver on the remote IP, makes no difference, except you wont get the black marked "not reachable" packets. This is what you get:
As you can see the first sending initiated a ARP search for the IP. While that search was pending the first 2 of the 3 successive sends were lost. The second keystroke (after the IP search was complete) properly sent 3 messages. You may now repeat sending messages and it will work until you wait (about a minute until the adress translation gets lost again) then you will see dropouts again.
That means: There is no send buffer when sending UDP messages and there are ARP requests pending! All messages get lost except the last one. Also "sendto" does not block until it successfully delivered, and there is no error return!
Well, that surprises me and makes me a little bit sad, because it means that I have to live with my current workaround or implement an ACK system that only sends one message at a time and then waits for reply - which would not be easy any more and imply many difficulties.
The UDP packet loss is especially affected by TCP traffic and its flow control mechanism. This is because TCP flow control continues to increase its window size until packet loss occurs if the advertised window size is large enough.
However, the transmission of information using UDP can't provide flow control through a communication process. Hence, using UDP, we can't detect lost packets. Therefore, retransmission of the lost packets is not possible. Real-time applications mainly use UDP.
In certain variants of TCP, if a transmitted packet is lost, it will be re-sent along with every packet that had already been sent after it. Protocols such as User Datagram Protocol (UDP) provide no recovery for lost packets.
A short datagram will fit in a single IP packet. A maximally sized datagram may take about 40. If you have a 1% packet loss rate, then the short datagrams get lost 1% of the time, but the huge ones get lost 33% of the time ( 0.99^40 ). With a 10% packet loss you get almost 99% loss of maximally sized UDP datagrams.
Sorry, something went wrong. As a test, try putting a 1 sec delay between UDP sends. It's possible that the library buffers (or drops) the packet until the ARP request/response completes in order to find the receivers MAC address. Note that by definition, UDP is unreliable and packet loss should be expected.
So in conclusion, we will always have a first UDP packet loss for a client that is transmitting many UDP packets > ARP_QUEUE_LEN before the IP addr are resolved.
The Program opens an UDP Socket, and sends 3 datagrams in a row on every keystroke to a specific IP. Run that whith wireshark observing your UDP traffic, press a key, wait a while and press a key again. You do not need a receiver on the remote IP, makes no difference, except you wont get the black marked "not reachable" packets.
UDP packets are supposed to be buffered on receipt, but a UDP packet (or the ethernet frame holding it) can be dropped at several points on a given machine: receive buffer of the listening application socket is full.
I'm posting this long after it's been answered by others, but it's directly related.
Winsock drops UDP packets if there's no ARP entry for the destination address (or the gateway for the destination).
Thus it's quite likely some of the first UDP packet gets dropped as at that time there's no ARP entry - and unlike most other operating systems, winsock only queues 1 packet while the the ARP request completes.
This is documented here:
ARP queues only one outbound IP datagram for a specified destination address while that IP address is being resolved to a MAC address. If a UDP-based application sends multiple IP datagrams to a single destination address without any pauses between them, some of the datagrams may be dropped if there is no ARP cache entry already present. An application can compensate for this by calling the Iphlpapi.dll routine SendArp() to establish an ARP cache entry, before sending the stream of packets.
The same behavior can be observed on Mac OS X and FreeBSD:
When an interface requests a mapping for an address not in the cache, ARP queues the message which requires the mapping and broadcasts a message on the associated associated network requesting the address mapping. If a response is provided, the new mapping is cached and any pending message is transmitted. ARP will queue at most one packet while waiting for a response to a mapping request; only the most recently ``transmitted'' packet is kept.
UDP packets are supposed to be buffered on receipt, but a UDP packet (or the ethernet frame holding it) can be dropped at several points on a given machine:
First two points are about too much traffic, which is not likely the case here. Then I trust that point 4. is not applicable and your software is waiting for the data. Point 5. is about your application not processing network data fast enough - also does not seem like the case.
Translation between MAC and IP addresses is done via Address Resolution Protocol. This does not cause packet drop if your network is properly configured.
I would disable Windows firewall and any anti-virus/deep packet inspection software and check what's on the wire with wireshark. This will most likely point you into right direction - if you can sniff those first packets on the "sent-to" machines then check local configuration (firewall, etc.); if you don't, then check your network - something in the path is interfering with your traffic.
Hope this helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With