
Payload split over two TCP packets when using Boost ASIO, when it fits within the MTU

I have a problem with a boost::asio::ip::tcp::iostream. I am trying to send about 20 raw bytes, but the 20-byte payload is split into two TCP packets: 1 byte, then 19 bytes. I have no idea why this is happening. I am writing this for a legacy binary protocol that very much requires the payload to fit in a single TCP packet (groan).

Pasting the whole source of my program would be long and overly complex, so I have reduced the issue to the two functions below (tested; they do reproduce the issue):

#include <iostream>

// BEGIN cygwin nastyness
// The following macros and conditions are to address a Boost compile
// issue on cygwin. https://svn.boost.org/trac/boost/ticket/4816
//
/// 1st issue
#include <boost/asio/detail/pipe_select_interrupter.hpp>

/// 2nd issue
#ifdef __CYGWIN__
#include <termios.h>
#ifdef cfgetospeed
#define __cfgetospeed__impl(tp) cfgetospeed(tp)
#undef cfgetospeed
inline speed_t cfgetospeed(const struct termios *tp)
{
    return __cfgetospeed__impl(tp);
}
#undef __cfgetospeed__impl
#endif /// cfgetospeed is a macro

/// 3rd issue
#undef __CYGWIN__
#include <boost/asio/detail/buffer_sequence_adapter.hpp>
#define __CYGWIN__
#endif
// END cygwin nastyness.

#include <boost/array.hpp>
#include <boost/asio.hpp>
#include <cassert>   // assert() is used in main(); <iostream> is already included above

typedef boost::asio::ip::tcp::iostream networkStream;

void writeTestingData(networkStream* out) {
        *out << "Hello world." << std::flush;
//      *out << (char) 0x1 << (char) 0x2 << (char) 0x3 << std::flush;
}

int main() {
        networkStream out("192.168.1.1", "502");

        assert(out.good());

        writeTestingData(&out);
        out.close();
}

To add to the strangeness, if I send the string "Hello world.", it goes out in one packet. If I send 0x1, 0x2, 0x3 (the raw byte values), I get 0x1 in the first packet and the rest of the data in the next TCP packet. I am using Wireshark to look at the packets; there is only a switch between the dev machine and 192.168.1.1.

xconspirisist asked Jul 27 '11 15:07



4 Answers

Don't worry, you are far from the only one to have this problem, and there is definitely a solution. In fact, you have TWO problems with your legacy protocol, not just one.

Your legacy protocol requires each "application message" to fit in "one and only one TCP packet" (because it incorrectly uses TCP, a stream-oriented protocol, as if it were packet-oriented). So we must make sure that:

  1. no "application message" is split across multiple TCP packets (the problem you are seeing)
  2. no TCP packet contains more than one "application message" (you are not seeing this but it may definitely happen)

The solution :

problem 1

You must feed your socket all of your "message" data at once. This is currently not happening because, as other people have pointed out, the Boost stream API you use puts data into the socket in separate calls when you chain successive "<<" operators, and the underlying TCP/IP stack of your OS does not buffer it long enough to merge those calls (with good reason, for better performance).

Multiple solutions :

  • you pass a char buffer instead of separate chars so that you make only one call to <<
  • you forget about boost, open an OS socket and feed it in one call to send() (on windows, look for the "winsock2" API, or look for "sys/socket.h" on unix/cygwin)
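A minimal sketch of the first option, assuming the message is already available as raw bytes. `sendFrame` and `demoSendFrame` are hypothetical names, not part of Boost. The helper takes a `std::ostream&`, which `boost::asio::ip::tcp::iostream` derives from, so the same call should apply to the asker's `networkStream`; here a `std::ostringstream` stands in for the socket so the sketch runs without a network. Whether the bytes really leave in one TCP segment still depends on the stream's buffering and on the OS stack.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: hand the whole message to the stream in ONE call,
// then flush once, instead of chaining one operator<< per byte.
inline void sendFrame(std::ostream& out, const char* data, std::size_t size) {
    assert(data != nullptr);
    out.write(data, static_cast<std::streamsize>(size));
    out.flush();
}

// Stand-in demo: std::ostringstream plays the role of the network stream.
inline std::string demoSendFrame() {
    const char frame[] = {0x01, 0x02, 0x03};   // raw bytes, not a C string
    std::ostringstream fakeSocket;
    sendFrame(fakeSocket, frame, sizeof frame);
    return fakeSocket.str();
}
```

With the question's code the call would look like `sendFrame(*out, frame, sizeof frame);`.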

problem 2

You MUST activate the TCP_NODELAY option on your socket. This option disables Nagle's algorithm and is well suited to such legacy protocols. It ensures that the OS TCP/IP stack sends your data "without delay" instead of holding it back and merging it with another application message you may send later.

  • if you stick with Boost, look for the TCP_NODELAY option, it is in the doc
  • if you use OS sockets, you'll have to use the setsockopt() function on your socket.
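For the OS-socket route, setting the option could look roughly like the sketch below. It assumes a POSIX system (Linux, or Cygwin's POSIX layer); `enableNoDelay`, `isNoDelayEnabled` and `demoNoDelay` are hypothetical helper names. With Boost.Asio the corresponding option is `boost::asio::ip::tcp::no_delay`; how it is reached from a `tcp::iostream` varies between Boost versions, so check the documentation for yours.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Disable Nagle's algorithm on an already-created TCP socket.
// Returns true on success.
inline bool enableNoDelay(int fd) {
    assert(fd >= 0);
    int flag = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof flag) == 0;
}

// Read the option back so the caller can verify that it took effect.
inline bool isNoDelayEnabled(int fd) {
    int flag = 0;
    socklen_t len = sizeof flag;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0) return false;
    return flag != 0;
}

// Small self-check: create a socket, set the option, read it back.
inline bool demoNoDelay() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return false;
    const bool ok = enableNoDelay(fd) && isNoDelayEnabled(fd);
    close(fd);
    return ok;
}
```

The option is normally set right after creating or connecting the socket, before the first message is sent.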

Conclusion

If you solve those two problems, you should be fine!

The OS socket API, on either Windows or Linux, is a bit tricky to use, but it gives you full control over the socket's behaviour.
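As a rough Unix illustration of the two points combined (TCP_NODELAY plus one `send()` per message), here is a sketch that runs entirely over loopback, so no legacy device is needed. `sendWholeMessage` and `demoLoopbackSend` are hypothetical names, and the in-process listener merely stands in for the remote peer. A single `send()` hands the bytes to the kernel together; it makes one segment likely for a small message but is not a formal guarantee.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstddef>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Hand the complete message to the kernel in a single send() call.
// Returns true only if the kernel accepted every byte in that one call.
inline bool sendWholeMessage(int fd, const std::string& msg) {
    const ssize_t n = send(fd, msg.data(), msg.size(), 0);
    return n == static_cast<ssize_t>(msg.size());
}

// Loopback demo: a listener on 127.0.0.1 with an OS-chosen port stands in
// for the legacy device. Returns the bytes the "device" side received.
inline std::string demoLoopbackSend(const std::string& msg) {
    std::string received;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    int client = socket(AF_INET, SOCK_STREAM, 0);
    int peer = -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       // let the OS pick a free port
    socklen_t len = sizeof addr;
    sockaddr* sa = reinterpret_cast<sockaddr*>(&addr);

    int flag = 1;                            // 1 = disable Nagle's algorithm
    if (listener >= 0 && client >= 0 &&
        bind(listener, sa, sizeof addr) == 0 &&
        listen(listener, 1) == 0 &&
        getsockname(listener, sa, &len) == 0 &&   // learn the chosen port
        setsockopt(client, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof flag) == 0 &&
        connect(client, sa, sizeof addr) == 0 &&
        (peer = accept(listener, nullptr, nullptr)) >= 0 &&
        sendWholeMessage(client, msg)) {
        received.resize(msg.size());
        // MSG_WAITALL: block until the full message length has arrived.
        const ssize_t n = recv(peer, &received[0], received.size(), MSG_WAITALL);
        received.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
    }
    if (peer >= 0) close(peer);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return received;
}
```

In a real client only the `socket()`, `setsockopt()`, `connect()` and `sendWholeMessage()` steps would remain, with the device's address in place of the loopback listener.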

Offirmo answered Oct 11 '22 14:10


Your code:

out << (char) 0x1 << (char) 0x2 << (char) 0x3;

This makes three separate calls to operator<<.

Because of Nagle's algorithm, the TCP stack sends the first available data ((char)0x1) to the peer immediately, during or right after the first operator<< call, since nothing is in flight yet. The remaining data (0x2 and 0x3) is held back until that first byte is acknowledged, so it goes out in the next packet.

To avoid 1-byte TCP segments, call the sending functions with larger chunks of data.
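The difference in call count can be made visible with a small stand-in. `CountingBuf` below is a hypothetical, deliberately unbuffered `std::streambuf` that counts every call reaching it as one "send"; it models the behaviour described in this answer and is not Boost's actual stream buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Unbuffered stand-in for a socket: each call that reaches the buffer
// counts as one "send", whether it carries one byte or many.
class CountingBuf : public std::streambuf {
public:
    std::size_t sends = 0;
    std::string bytes;

protected:
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            ++sends;
            bytes.push_back(traits_type::to_char_type(ch));
        }
        return traits_type::not_eof(ch);
    }
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        ++sends;
        bytes.append(s, static_cast<std::size_t>(n));
        return n;
    }
};

// Three chained inserts, as in the question.
inline std::size_t sendsForChainedInserts() {
    CountingBuf buf;
    std::ostream out(&buf);
    out << (char)0x1 << (char)0x2 << (char)0x3;
    return buf.sends;
}

// The same three bytes handed over in one write() call.
inline std::size_t sendsForSingleWrite() {
    CountingBuf buf;
    std::ostream out(&buf);
    const char msg[] = {0x1, 0x2, 0x3};
    out.write(msg, sizeof msg);
    return buf.sends;
}
```

Comparing `sendsForChainedInserts()` with `sendsForSingleWrite()` shows that the chained version reaches the buffer once per byte, while the single `write()` reaches it once in total.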

SKi answered Oct 11 '22 12:10


I am not sure who would have imposed the requirement that an entire payload fit within one TCP packet. TCP is by nature a stream-oriented protocol, and details such as the number of packets sent and the payload size of each are left to the operating system's TCP stack implementation.

I would double check to see if this is an actual restriction of your protocol or not.

feathj answered Oct 11 '22 13:10


I agree with User1's answer. You probably invoke operator<< several times; the first invocation immediately sends the first byte over the network, then Nagle's algorithm comes into play, so the remaining data is sent in a single packet.

Nevertheless, even if the packetization were not an issue, the very fact that you frequently invoke a socket sending function on small pieces of data is a big problem. Every function called on a socket triggers a heavy kernel-mode transition (a system call); calling send for every byte is simply insane!

You should first format your message in memory and then send it. For your design I'd suggest creating a sort of caching stream that accumulates the data in an internal buffer and sends it all at once to the underlying stream.
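One possible shape for such a caching stream, as a sketch only: `MessageBuffer` and its `commit()` method are hypothetical names, and a `std::ostringstream` again stands in for the network stream so the example runs on its own. Everything streamed into the buffer stays in memory until `commit()` forwards it in one `write()` followed by one `flush()`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical "cache stream": collects everything streamed into it and
// forwards the result to the real stream in one write() + flush().
class MessageBuffer {
public:
    explicit MessageBuffer(std::ostream& target) : target_(target) {}

    template <typename T>
    MessageBuffer& operator<<(const T& value) {
        pending_ << value;          // formatted in memory only, nothing sent yet
        return *this;
    }

    // Send the accumulated bytes in a single call, then start a new message.
    void commit() {
        const std::string bytes = pending_.str();
        target_.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
        target_.flush();
        pending_.str(std::string());
        pending_.clear();
    }

private:
    std::ostream& target_;
    std::ostringstream pending_;
};

// Demo: returns what the stand-in socket received, or an empty string if
// anything had leaked through before commit() was called.
inline std::string demoMessageBuffer() {
    std::ostringstream fakeSocket;  // stands in for the tcp::iostream
    MessageBuffer msg(fakeSocket);
    msg << (char)0x1 << (char)0x2 << (char)0x3;
    const bool nothingSentYet = fakeSocket.str().empty();
    msg.commit();
    return nothingSentYet ? fakeSocket.str() : std::string();
}
```

The explicit `commit()` keeps message boundaries in the caller's hands, which matches a protocol that treats each message as a unit.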

valdo answered Oct 11 '22 14:10