
Can anybody explain what are the TSO/LRO hardware functions in TCP?

Tags:

networking

tcp

Can anybody explain what the TSO/LRO hardware functions in TCP are, and whether these functions are also responsible for the acknowledgment mechanism?

Amir Yosha asked Sep 11 '11 08:09



2 Answers

I know this is an old thread, but I feel the answer isn't complete.

What you first have to understand is that TSO is the tip of a fairly big iceberg when it comes to network performance boosting techniques.

Let's consider the most basic network interface. Your OS sends a whole packet to the NIC (network interface card) using PIO (programmed input/output, i.e. one word, normally 32 bits, at a time), exactly as it should appear on the wire, minus the frame check sequence.

The following speed boosts all apply to the transmission of data.

The first speed boost is to use DMA (direct memory access), which allows the processor to do other things while the hardware copies the packet. The OS still has to copy the packet data into memory and generate the headers and checksums.

The second boost is to have the hardware generate the checksum for the data portion of the packet. The OS still copies the data into its own memory space and places the header in front of it. Since the OS is generating the headers anyway, it may as well compute the header checksums itself. This sounds complicated, but the mechanism is actually quite simple: the hardware is told to start checksumming at position X and to place the result at position Y in the packet buffer.
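To make the "start here, store there" instruction concrete, here is a minimal Python sketch of what the hardware is asked to do. The names csum_start and csum_offset are illustrative, the checksum is the standard Internet checksum (RFC 1071), and the sketch simplifies by zeroing the checksum field first rather than folding in a pseudo-header as a real TCP stack would.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def offload_checksum(packet: bytearray, csum_start: int, csum_offset: int) -> None:
    """Simulated NIC: checksum packet[csum_start:] and store the result
    csum_offset bytes after csum_start (both names are illustrative)."""
    pos = csum_start + csum_offset
    packet[pos:pos + 2] = b"\x00\x00"  # simplification: field is zero while summing
    csum = internet_checksum(bytes(packet[csum_start:]))
    packet[pos:pos + 2] = struct.pack("!H", csum)
```

The OS only supplies the two offsets; it never touches the payload bytes to compute the sum. A receiver can verify the result by checksumming the same range again, which yields zero when the data is intact.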

The third boost is to use scatter/gather. This basically means the OS doesn't copy the data into its own memory; it passes the header and the location of the data portion to the driver and lets the driver collect the data in order to send it. This requires hardware checksumming, because if the OS had to checksum the packet itself it would need to copy it into memory first.
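A rough way to picture scatter/gather is a list of descriptors, each pointing at a fragment that stays where it is. The sketch below is a toy model in Python, not a real driver interface; the descriptor layout and the function names are made up for illustration.

```python
from typing import List, Tuple

# One descriptor: (buffer, offset, length). The buffers themselves are not copied.
Descriptor = Tuple[bytes, int, int]

def build_descriptors(header: bytes, user_data: bytes) -> List[Descriptor]:
    """OS side: describe the packet as two fragments instead of copying the payload."""
    return [(header, 0, len(header)), (user_data, 0, len(user_data))]

def gather(descriptors: List[Descriptor]) -> bytes:
    """Device side: walk the list and stream each fragment onto the 'wire'."""
    wire = bytearray()
    for buf, offset, length in descriptors:
        wire += memoryview(buf)[offset:offset + length]
    return bytes(wire)
```

The only place the bytes are brought together is inside gather, which stands in for the hardware; the OS side only builds the short descriptor list.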

The fourth (and highest level of natively supported boosting in Linux) is TSO. With TSO the OS gives the hardware a header template and a large chunk of data (no more than 64K) for it to split and checksum. This means the OS needs to generate fewer headers, and the overhead of setting up the DMA is greatly reduced. When the packets go on the wire they follow the normal rules for packets and are compatible with ANY switch or router they pass through.
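The splitting step can be sketched in a few lines of Python. This is only a toy model: the 6-byte header (sequence number plus payload length) stands in for a real TCP/IP header template, the checksum step is left out, and the function name and parameters are assumptions made for illustration.

```python
import struct
from typing import List

MAX_TSO_BYTES = 65535  # the "no more than 64K" limit on one chunk

def tso_segment(seq: int, payload: bytes, mss: int) -> List[bytes]:
    """Simulated NIC: split one large chunk into MSS-sized segments,
    stamping each with a header derived from the template values."""
    if len(payload) > MAX_TSO_BYTES:
        raise ValueError("chunk too large for a single TSO request")
    segments = []
    for off in range(0, len(payload), mss):
        chunk = payload[off:off + mss]
        # Toy header: 32-bit sequence number advanced by the offset, 16-bit length.
        header = struct.pack("!IH", (seq + off) & 0xFFFFFFFF, len(chunk))
        segments.append(header + chunk)
    return segments
```

The OS makes one call with one starting sequence number; the per-segment headers are produced on the device side, which is where the saving comes from.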

Reception is a different story. Hardware checksumming is more of a guess than a certainty here, so what SHOULD happen is that the hardware passes the packet and the checksum to the OS separately and lets the OS decide whether the packet is OK.

Scatter/Gather is pretty much redundant for receive.

LRO (large receive offload) is harder, because there's no easy way for the hardware to know what these packets mean. Some NICs do implement LRO in hardware, but on Linux the coalescing is usually done in software (generic receive offload, GRO): the packets are passed to the OS, and the OS then decides whether to concatenate the data and pass one large chunk to the application or to pass many smaller chunks.
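The coalescing decision can be sketched as follows. This is a simplified Python model under stated assumptions, not how any particular kernel does it: a segment is just a (flow id, sequence number, payload) tuple, and a segment is merged only when it directly continues the previous one from the same flow.

```python
from typing import List, Tuple

Segment = Tuple[str, int, bytes]  # (flow id, sequence number, payload)

def coalesce(segments: List[Segment]) -> List[Segment]:
    """Merge each segment into the previous one when it belongs to the same
    flow and starts exactly where the previous payload ended."""
    merged: List[Segment] = []
    for flow, seq, data in segments:
        if merged:
            last_flow, last_seq, last_data = merged[-1]
            if flow == last_flow and seq == last_seq + len(last_data):
                merged[-1] = (last_flow, last_seq, last_data + data)
                continue
        # Different flow, a gap, or out of order: pass it up unmerged.
        merged.append((flow, seq, data))
    return merged
```

Anything that is not a direct continuation is handed up as its own chunk, so the result is either one large chunk or many smaller ones, depending on how the packets arrived.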

A few notes on the network stack.

The software should ALWAYS produce the ACK packets. The only reason it wouldn't is if you had a TOE (TCP Offload Engine) on your NIC. I don't know of any OS which natively supports this, which means you'd need to hack it to make it compatible.

So there's a full and rambling response, hope it helps someone.

Craig answered Sep 26 '22 14:09


A host with TSO-enabled hardware sends TCP data to the NIC without segmenting the data in software. The NIC performs the TCP segmentation (that is, it divides the large chunk of data into segments). NICs supporting LRO receive packets and reassemble them before passing the data on to the local software.

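LRO/TSO are not directly responsible for the ack mechanism (though they do rely on GBN). Note that TSO is safe for traffic that crosses routers and bridges, since the packets it puts on the wire are ordinary packets. LRO is a different matter on a machine that itself routes or bridges: merged packets cannot be forwarded as they arrived, so LRO is normally disabled on interfaces used for forwarding.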

Henry Aloni answered Sep 22 '22 14:09