3 different messages are being sent to the same port at different rates:
Message size (bytes) Sent everytransmit speed
High 232 10 ms 100Hz
Medium 148 20ms 50Hz
Low 20 60 ms 16.6Hz
I can only process one message every ~ 6 ms.
Single threaded. Blocking read.
A strange situation is occurring, and I don't have an explanation for it.
When I set my receive buffer to 4,799
bytes, all of my low speed messages get dropped.
I see maybe one or two get processed, and then nothing.
When I set my receive buffer to 4,800
(or higher!), it appears as though all of the low speed messages start getting processed. I see about 16/17 a second.
This has been observed consistently. The application sending the packets is always started before the receiving application. The receiving application always has a long delay after the sockets are created, and before it begins processing. So the buffer is always full when the processing starts, and it is not the same starting buffer each time a test occurs. This is because the socket is created after the sender is already sending out messages, so the receiver might start listening in the middle of a send cycle.
Why does increasing the received buffer size a single byte, cause a huge change in low speed message processing?
I built a table to better visualize the expected processing:
As some of these messages get processed, more messages presumably get put on the queue instead of being dropped.
Nonetheless, I would expect a 4,799
byte buffer to behave the same way as 4,800
bytes.
However that is not what I have observed.
I think the issue is related to the fact that low speed messages are sent at the same time as the other two messages. It is always received after the high/medium speed messages. (This has been confirmed over wireshark).
For example, assuming the buffer was empty to begin with, it is clear that the low speed message would need queued longer than the other messages.
*1 message every 6ms is about 5 messages every 30ms.
This still doesn't explain the buffer size.
We are running VxWorks, and using their sockLib, which is an implementation of Berkeley sockets. Here is a snippet of what our socket creation looks like:
SOCKET_BUFFER_SIZE is what I'm changing.
struct sockaddr_in tSocketAddress; // Socket address
int nSocketAddressSize = sizeof(struct sockaddr_in); // Size of socket address structure
int nSocketOption = 0;
// Already created
if (*ptParameters->m_pnIDReference != 0)
return FALSE;
// Create UDP socket
if ((*ptParameters->m_pnIDReference = socket(AF_INET, SOCK_DGRAM, 0)) == ERROR)
{
// Error
CreateSocketMessage(ptParameters, "CreateSocket: Socket create failed with error.");
// Not successful
return FALSE;
}
// Valid local address
if (ptParameters->m_szLocalIPAddress != SOCKET_ADDRESS_NONE_STRING && ptParameters->m_usLocalPort != 0)
{
// Set up the local parameters/port
bzero((char*)&tSocketAddress, nSocketAddressSize);
tSocketAddress.sin_len = (u_char)nSocketAddressSize;
tSocketAddress.sin_family = AF_INET;
tSocketAddress.sin_port = htons(ptParameters->m_usLocalPort);
// Check for any address
if (strcmp(ptParameters->m_szLocalIPAddress, SOCKET_ADDRESS_ANY_STRING) == 0)
tSocketAddress.sin_addr.s_addr = htonl(INADDR_ANY);
else
{
// Convert IP address for binding
if ((tSocketAddress.sin_addr.s_addr = inet_addr(ptParameters->m_szLocalIPAddress)) == ERROR)
{
// Error
CreateSocketMessage(ptParameters, "Unknown IP address.");
// Cleanup socket
close(*ptParameters->m_pnIDReference);
*ptParameters->m_pnIDReference = ERROR;
// Not successful
return FALSE;
}
}
// Bind the socket to the local address
if (bind(*ptParameters->m_pnIDReference, (struct sockaddr *)&tSocketAddress, nSocketAddressSize) == ERROR)
{
// Error
CreateSocketMessage(ptParameters, "Socket bind failed.");
// Cleanup socket
close(*ptParameters->m_pnIDReference);
*ptParameters->m_pnIDReference = ERROR;
// Not successful
return FALSE;
}
}
// Receive socket
if (ptParameters->m_eType == SOCKTYPE_RECEIVE || ptParameters->m_eType == SOCKTYPE_RECEIVE_AND_TRANSMIT)
{
// Set the receive buffer size
nSocketOption = SOCKET_BUFFER_SIZE;
if (setsockopt(*ptParameters->m_pnIDReference, SOL_SOCKET, SO_RCVBUF, (char *)&nSocketOption, sizeof(nSocketOption)) == ERROR)
{
// Error
CreateSocketMessage(ptParameters, "Socket buffer size set failed.");
// Cleanup socket
close(*ptParameters->m_pnIDReference);
*ptParameters->m_pnIDReference = ERROR;
// Not successful
return FALSE;
}
}
and the socket receive that's being called in an infinite loop:
*The buffer size is definitely large enough
int SocketReceive(int nSocketIndex, char *pBuffer, int nBufferLength)
{
int nBytesReceived = 0;
char szError[256];
// Invalid index or socket
if (nSocketIndex < 0 || nSocketIndex >= SOCKET_COUNT || g_pnSocketIDs[nSocketIndex] == 0)
{
sprintf(szError,"SocketReceive: Invalid socket (%d) or ID (%d)", nSocketIndex, g_pnSocketIDs[nSocketIndex]);
perror(szError);
return -1;
}
// Invalid buffer length
if (nBufferLength == 0)
{
perror("SocketReceive: zero buffer length");
return 0;
}
// Send data
nBytesReceived = recv(g_pnSocketIDs[nSocketIndex], pBuffer, nBufferLength, 0);
// Error in receiving
if (nBytesReceived == ERROR)
{
// Create error string
sprintf(szError, "SocketReceive: Data Receive Failure: <%d> ", errno);
// Set error message
perror(szError);
// Return error
return ERROR;
}
// Bytes received
return nBytesReceived;
}
Any clues on why increasing the buffer size to 4,800 results in successful and consistent reading of low speed messages?
Congestion in the network is the primary reason for packet loss in UDP, as every communication network has a flow limit. For example, network congestion is similar to a traffic jam on the road, where exceeding the maximum number of vehicles allowed on a given road may cause traffic to slow or stop during peak hours.
The default send buffer size for UDP sockets is 65535 bytes. The default receive buffer size for UDP sockets is 2147483647 bytes.
On every UDP socket, there's a “socket send buffer” that you put packets into. The Linux kernel deals with those packets and sends them out as quickly as possible. So if you have a network card that's too slow or something, it's possible that it will not be able to send the packets as fast as you put them in!
The basic answer to the question of why a SO_RCVBUF
size of 4799 results in lost low speed messages and a size of 4800 works fine is that with the mixture of the UDP packets coming in, the rate at which they are coming in, the rate at which you are processing incoming packets, and the sizing of the mbuff
and cluster numbers in your vxWorks kernel allow for sufficient network stack throughput that the low speed messages are not being discarded with the larger size.
The SO_SNDBUF
option description in the setsockopt()
man page at URL http://www.vxdev.com/docs/vx55man/vxworks/ref/sockLib.html#setsockopt mentioned in a comment above has this to say about the size specified and the effect on mbuff
usage:
The effect of setting the maximum size of buffers (for both SO_SNDBUF and SO_RCVBUF, described below) is not actually to allocate the mbufs from the mbuf pool. Instead, the effect is to set the high-water mark in the protocol data structure, which is used later to limit the amount of mbuf allocation.
UDP packets are discrete units. If you send 10 packets of size 232 that is not considered to be 2320 bytes of data in contiguous memory. Instead that is 10 memory buffers within the network stack because UDP is discrete packets while TCP is a continuous stream of bytes.
See How do I tune the network buffering in VxWorks 5.4? in the DDS community web site which gives a discussion about the interdependence of the mixture of mbuff
sizes and network clusters.
See How do I resolve a problem with VxWorks buffers? in the DDS community web site.
See this PDF of a slide presentation, A New Tool to study Network Stack Exhaustion in VxWorks from 2004 which discusses using various tools such as mBufShow
and inetStatShow
to see what is happening in the network stack.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With