What happens after a packet is captured?

I've been reading about what happens after packets are captured by NICs, and the more I read, the more I'm confused.

Firstly, I've read that traditionally, after a packet is captured by the NIC, it gets copied to a block of memory in kernel space, and then to user space for whatever application works on the packet data. Then I read about DMA, where the NIC copies the packet directly into memory, bypassing the CPU. So is the NIC -> kernel memory -> user space memory flow still valid? Also, do most NICs (e.g. Myricom) use DMA to improve packet capture rates?

Secondly, does RSS (Receive Side Scaling) work similarly on both Windows and Linux systems? I can only find detailed explanations of how RSS works in MSDN articles, which describe how RSS (and MSI-X) works on Windows Server 2008. But the same concepts of RSS and MSI-X should still apply to Linux systems, right?

Thank you.

Regards, Rayne

483

asked Mar 30 '10 03:03

Rayne

1 Answer

How this process plays out is mostly up to the driver author and the hardware, but for the drivers I've looked at or written and the hardware I've worked with, this is usually the way it works:

1. At initialization, the driver allocates some number of buffers and gives them to the NIC.
2. When the NIC receives a packet, it pulls the next address off its list of buffers, DMAs the data directly into that buffer, and notifies the driver via an interrupt.
3. The driver gets the interrupt and either turns the buffer over to the kernel or allocates a new kernel buffer and copies the data into it. "Zero copy networking" is the former and obviously requires support from the operating system. (more below on this)
4. The driver either allocates a new buffer (in the zero-copy case) or re-uses the old one. In either case, a buffer is given back to the NIC for future packets.
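
The steps above can be sketched as a small simulation. This is only an illustration of the buffer hand-off, not real driver code: the class name `SimulatedNic`, its methods, and the `delivered` list standing in for "the kernel" are all made up for this example, and a slice assignment stands in for the DMA transfer.

```python
from collections import deque


class SimulatedNic:
    """Toy model of a receive ring; every name here is hypothetical."""

    def __init__(self, buffer_count, buffer_size, zero_copy):
        self.buffer_size = buffer_size
        self.zero_copy = zero_copy
        # Step 1: the driver pre-allocates buffers and hands them to the NIC.
        self.rx_ring = deque(bytearray(buffer_size) for _ in range(buffer_count))
        # Stand-in for whatever the kernel network stack receives.
        self.delivered = []

    def receive(self, frame):
        """Step 2: the NIC takes the next buffer and 'DMAs' the frame into it."""
        if not self.rx_ring:
            raise RuntimeError("no receive buffers available, frame dropped")
        if len(frame) > self.buffer_size:
            raise ValueError("frame larger than receive buffer")
        buf = self.rx_ring.popleft()
        buf[:len(frame)] = frame  # stands in for the DMA transfer
        self._interrupt(buf, len(frame))

    def _interrupt(self, buf, length):
        """Steps 3 and 4: the driver's interrupt handler."""
        if self.zero_copy:
            # Hand the filled buffer itself upward, then allocate a
            # replacement buffer for the NIC.
            self.delivered.append(memoryview(buf)[:length])
            self.rx_ring.append(bytearray(self.buffer_size))
        else:
            # Copy the data out, then give the same buffer back to the NIC.
            self.delivered.append(bytes(buf[:length]))
            self.rx_ring.append(buf)
```

In copy mode the same buffers cycle through the ring forever; in zero-copy mode each received frame costs one fresh allocation instead of one copy.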

Zero-copy networking within the kernel isn't so bad. Zero-copy all the way down to userland is much harder. Userland gets data, but network packets are made up of both headers and data. At the least, true zero-copy all the way to userland requires support from your NIC so that it can DMA packets into separate header and data buffers. The headers are recycled once the kernel routes the packet to its destination and verifies the checksum (for TCP, either in hardware if the NIC supports it or in software if not; note that if the kernel has to compute the checksum itself, it may as well copy the data, too: reading the data incurs the cache misses anyway, and copying it elsewhere at the same time can be nearly free with tuned code).
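
To see why a software checksum touches every byte of the payload, here is a minimal sketch of the 16-bit ones' complement checksum described in RFC 1071. The function name `internet_checksum` is just a label for this example; a real TCP checksum also covers a pseudo-header built from the IP addresses, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    # Every 16-bit word of the data is read exactly once.
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

Since the loop has already pulled each word into the CPU, writing it to a destination buffer in the same pass adds little extra memory traffic, which is the point made above about combining checksum and copy.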

Even assuming all the stars align, the data isn't actually in your user buffer when it is received by the system. Until an application asks for the data, the kernel doesn't know where it will end up. Consider the case of a multi-process daemon like Apache. There are many child processes, all listening on the same socket. You can also establish a connection, fork(), and both processes are able to recv() incoming data.

TCP packets on the Internet usually carry 1460 bytes of payload (MTU of 1500 = 20 byte IP header + 20 byte TCP header + 1460 bytes data). 1460 is not a power of 2 and won't match a page size on any system you'll find. This presents problems for reassembly of the data stream. Remember that TCP is stream-oriented. There is no distinction between sender writes, and two 1000 byte writes waiting at the receiver can be consumed entirely by a single 2000 byte read.
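
The loss of write boundaries is easy to observe. The sketch below uses `socket.socketpair()` as a stand-in for a TCP connection (it is a local stream socket, not TCP, but it has the same byte-stream semantics); the helper names `read_exactly` and `two_writes_one_stream` are made up for this example.

```python
import socket


def read_exactly(sock, n):
    """Read n bytes from a stream socket, however they happen to arrive."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed the connection early
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def two_writes_one_stream():
    sender, receiver = socket.socketpair()
    try:
        # Two separate 1000 byte writes on the sending side...
        sender.sendall(b"A" * 1000)
        sender.sendall(b"B" * 1000)
        # ...arrive as one undifferentiated run of 2000 bytes.
        return read_exactly(receiver, 2000)
    finally:
        sender.close()
        receiver.close()


data = two_writes_one_stream()
```

Nothing in `data` marks where the first write ended and the second began. How many `recv()` calls it takes to collect the bytes is up to the system, which is why the helper loops instead of assuming one call is enough.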

Taking this further, consider the user buffers. These are allocated by the application. In order to be used for zero-copy all the way down, the buffer needs to be page-aligned and must not share its memory page with anything else. At recv() time, the kernel could theoretically replace the old page with the one containing the data and "flip" it into place, but this is complicated by the reassembly issue above, since successive packets will be on separate pages. The kernel could limit the data it hands back to each packet's payload, but this would mean a lot of additional system calls and page remapping, and likely lower throughput overall.
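
A little arithmetic shows the mismatch between payload size and page size. This sketch assumes 4096 byte pages and a contiguous user buffer that starts on a page boundary; both the constants and the function name `segments_straddling_pages` are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size; varies by platform
MSS = 1460        # typical TCP payload per packet, from the text above


def segments_straddling_pages(n_segments, mss=MSS, page=PAGE_SIZE):
    """Return indexes of payloads that would cross a page boundary if the
    stream were laid out contiguously in a page-aligned buffer."""
    straddling = []
    for i in range(n_segments):
        first_byte = i * mss
        last_byte = first_byte + mss - 1
        if first_byte // page != last_byte // page:
            straddling.append(i)
    return straddling


crossing = segments_straddling_pages(10)
```

For the first ten segments this gives `[2, 5, 8]`: roughly every third payload would have to land partly on one page and partly on the next, which a whole-page flip cannot do.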

I'm really only scratching the surface on this topic. I worked at a couple of companies in the early 2000s trying to extend the zero-copy concepts down into userland. We even implemented a TCP stack in userland and circumvented the kernel entirely for applications using the stack, but that brought its own set of problems and was never production quality. It's a very hard problem to solve.

134

answered Oct 15 '22 12:10

Steve Madsen

Tags: linux, rss, windows-installer, packet, dma