
Concurrency in the Linux network drivers: probe() VS ndo_open(), ndo_start_xmit() VS NAPI poll()

Could anyone explain if additional synchronization, e.g., locking, is needed in the following two situations in a Linux network driver? I am interested in the kernel 2.6.32 and newer.

1. .probe VS .ndo_open

In a driver for a PCI network card, the net_device instance is usually registered in the .probe() callback. Suppose a driver specifies the .ndo_open callback in its net_device_ops, performs other necessary operations and then calls register_netdev().

Is it possible for that .ndo_open callback to be called by the kernel after register_netdev() but before the end of the .probe callback? I suppose it is, but maybe there is a stronger guarantee, something that ensures that the device can be opened no earlier than the end of .probe?

In other words, if .probe callback accesses, say, the private part of the net_device struct after register_netdev() and ndo_open callback accesses that part too, do I need to use locks or other means to synchronize these accesses?
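To make the scenario concrete, here is a minimal user-space sketch of it, not kernel code. It assumes pthreads as a stand-in for kernel contexts: an atomic flag models the moment register_netdev() makes the device visible, and a mutex models a lock that both paths take (in the kernel, .ndo_open is called with the RTNL lock held, so rtnl_lock() is one candidate for that role). All names (demo_priv, demo_probe, demo_open, demo_lock) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the private part of net_device. */
struct demo_priv {
    int ring_size;  /* fully set up before the device is published */
    int opens;      /* touched by both "probe" and "open" */
};

static struct demo_priv priv;
static atomic_int registered;  /* models "device is visible to the system" */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models .ndo_open: it may run as soon as the device is visible. */
static void *demo_open(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&registered, memory_order_acquire))
        ;  /* spin until the "register_netdev()" step has happened */

    pthread_mutex_lock(&demo_lock);
    assert(priv.ring_size == 256);  /* initialized before publication */
    priv.opens++;
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Models .probe: initialize, publish, then lock for any late access. */
static int demo_probe(void)
{
    pthread_t opener;

    priv.ring_size = 256;
    priv.opens = 0;
    if (pthread_create(&opener, NULL, demo_open, NULL) != 0)
        return -1;

    /* From here on, "open" may run concurrently with the rest of probe. */
    atomic_store_explicit(&registered, 1, memory_order_release);

    pthread_mutex_lock(&demo_lock);
    priv.opens += 10;  /* post-registration access, done under the lock */
    pthread_mutex_unlock(&demo_lock);

    pthread_join(opener, NULL);
    return priv.opens;
}
```

Calling demo_probe() returns 11 regardless of which thread gets the lock first. The sketch suggests two options: finish every access to the private data before the publication step, so that no lock is needed, or take a lock that the open path also holds for anything done afterwards.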

2. .ndo_start_xmit VS NAPI poll

Is there any guarantee that, for a given network device, .ndo_start_xmit callback and NAPI poll callback provided by a driver never execute concurrently?

I know that .ndo_start_xmit is executed with BH disabled at least and poll runs in the softirq, and hence, BH context. But this serializes execution of these callbacks on the local CPU only. Is it possible for .ndo_start_xmit and poll for the same network device to execute simultaneously on different CPUs?

As above, if these callbacks access the same data, do I need to protect that data with a lock or some other mechanism?

References to the kernel code and/or the docs are appreciated.

EDIT:

To check the first situation, I conducted an experiment: I added a 1-minute delay right before the end of the call to register_netdev() in the e1000 driver (kernel: 3.11-rc1), along with debug prints in the .probe and .ndo_open callbacks. Then I loaded e1000.ko and tried to access the network device it services before the delay ended (in fact, NetworkManager did that before me), and checked the system log.

Result: yes, it is possible for .ndo_open to be called even before the end of .probe although the "race window" is usually rather small.

The second situation (.ndo_start_xmit VS NAPI poll) is still unclear to me and any help is appreciated.

Eugene, asked Oct 04 '22 10:10


1 Answer

Regarding the ".ndo_start_xmit VS NAPI poll" question, here is how I think about it: the start-xmit method of a network driver can be invoked in NET_TX_SOFTIRQ context, i.e. from a softirq (or, as the question notes, from the sender's context with BH disabled). The NAPI receive poll method also runs in softirq context, normally NET_RX_SOFTIRQ.

On any single core, the two softirqs exclude each other rather than race. But by design, softirqs can run in parallel on SMP, so nothing guarantees that .ndo_start_xmit and the NAPI poll method, running in two separate softirq contexts on different cores, will never race. In other words, I believe it can happen. To be safe, use a spinlock to protect any data shared between the two paths.
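As a rough illustration of that advice, here is a user-space sketch, not driver code. It assumes two pthreads standing in for the transmit and poll paths running on different CPUs, with a pthread spinlock protecting a hypothetical shared ring counter. In a real driver the equivalent would be spin_lock()/spin_unlock() (or the TX queue lock) around the shared state. All names (demo_ring, demo_start_xmit, demo_poll, demo_run) are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define DEMO_PACKETS 20000

/* Hypothetical state shared by the transmit and poll paths. */
struct demo_ring {
    pthread_spinlock_t lock;
    long in_flight;  /* queued for transmission, not yet completed */
    long completed;  /* reclaimed by the poll path */
};

static struct demo_ring ring;

/* Models .ndo_start_xmit: queues one packet per iteration. */
static void *demo_start_xmit(void *arg)
{
    (void)arg;
    for (long i = 0; i < DEMO_PACKETS; i++) {
        pthread_spin_lock(&ring.lock);
        ring.in_flight++;
        pthread_spin_unlock(&ring.lock);
    }
    return NULL;
}

/* Models the NAPI poll path reclaiming completed transmissions. */
static void *demo_poll(void *arg)
{
    long done = 0;

    (void)arg;
    while (done < DEMO_PACKETS) {
        pthread_spin_lock(&ring.lock);
        if (ring.in_flight > 0) {
            ring.in_flight--;
            ring.completed++;
            done++;
        }
        pthread_spin_unlock(&ring.lock);
    }
    return NULL;
}

/* Runs both paths concurrently and returns the number of completions. */
static long demo_run(void)
{
    pthread_t tx, rx;

    pthread_spin_init(&ring.lock, PTHREAD_PROCESS_PRIVATE);
    ring.in_flight = 0;
    ring.completed = 0;

    pthread_create(&tx, NULL, demo_start_xmit, NULL);
    pthread_create(&rx, NULL, demo_poll, NULL);
    pthread_join(tx, NULL);
    pthread_join(rx, NULL);

    assert(ring.in_flight == 0);  /* every queued packet was reclaimed */
    pthread_spin_destroy(&ring.lock);
    return ring.completed;
}
```

With the lock in place, demo_run() returns DEMO_PACKETS. Without it, the unsynchronized updates of in_flight from two threads would be a data race and the counters could end up inconsistent.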

Also, with modern TCP offload techniques becoming more prevalent, GSO could also be invoked at any point on the transmit path. HTH!

kaiwan, answered Oct 06 '22 00:10