Could anyone explain if additional synchronization, e.g., locking, is needed in the following two situations in a Linux network driver? I am interested in the kernel 2.6.32 and newer.
In a driver for a PCI network card, the net_device instance is usually registered in the .probe() callback. Suppose a driver specifies a .ndo_open callback in the net_device_ops, performs other necessary operations and then calls register_netdev().
Is it possible for that .ndo_open callback to be called by the kernel after register_netdev() but before the end of the .probe callback? I suppose it is, but maybe there is a stronger guarantee, something that ensures that the device can be opened no earlier than .probe ends?
In other words, if the .probe callback accesses, say, the private part of the net_device struct after register_netdev() and the .ndo_open callback accesses that part too, do I need to use locks or other means to synchronize these accesses?
Is there any guarantee that, for a given network device, the .ndo_start_xmit callback and the NAPI poll callback provided by a driver never execute concurrently?
I know that .ndo_start_xmit is executed with BH disabled at least and poll runs in the softirq, and hence, BH context. But this serializes execution of these callbacks on the local CPU only. Is it possible for .ndo_start_xmit and poll for the same network device to execute simultaneously on different CPUs?
As above, if these callbacks access the same data, do I need to protect the data with a lock or something similar?
References to the kernel code and/or the docs are appreciated.
EDIT:
To check the first situation, I conducted an experiment and added a 1-minute delay right before the end of the call to register_netdev() in the e1000 driver (kernel: 3.11-rc1). I also added debug prints there in the .probe and .ndo_open callbacks. Then I loaded e1000.ko, and tried to access the network device it services before the delay ended (in fact, NetworkManager did that before me), then checked the system log.
Result: yes, it is possible for .ndo_open to be called even before the end of .probe, although the "race window" is usually rather small.
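Given that result, one way to avoid the race without a lock is to finish setting up everything .ndo_open reads before the device becomes visible. The following is only a userspace analogy in C with pthreads, not driver code: `demo_priv`, `demo_open` and `demo_probe` are hypothetical names, and `pthread_create()` merely stands in for the point at which register_netdev() makes the device openable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the private part of a net_device. */
struct demo_priv {
    int ring_size;    /* must be valid before "open" can run */
    int opened_with;  /* value the open callback observed */
};

/* Stand-in for .ndo_open: may run as soon as the device is published. */
static void *demo_open(void *arg)
{
    struct demo_priv *priv = arg;

    priv->opened_with = priv->ring_size;
    return NULL;
}

/* Stand-in for .probe: complete all setup first, publish last. */
int demo_probe(struct demo_priv *priv)
{
    pthread_t opener;

    assert(priv != NULL);

    /* 1. Initialize every field the open callback reads. */
    priv->ring_size = 256;
    priv->opened_with = -1;

    /* 2. Analogue of register_netdev(): from here on demo_open may run. */
    if (pthread_create(&opener, NULL, demo_open, priv) != 0)
        return -1;

    /* 3. Anything touched here that demo_open also uses would need a lock. */

    /* The join exists only so the demo can inspect the result. */
    pthread_join(opener, NULL);
    return priv->opened_with;
}
```

Calling `demo_probe()` on a `struct demo_priv` returns the ring size that the "open" side saw. If the assignment in step 1 were moved after step 2, the two sides would race in the same way as in the experiment above.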
The second situation (.ndo_start_xmit vs. NAPI poll) is still unclear to me and any help is appreciated.
Regarding the ".ndo_start_xmit vs. NAPI poll" question, here is how I think about it: the start-xmit method of a network driver can be invoked in NET_TX_SOFTIRQ context, so it runs in softirq context itself (it can also be reached from process context with BH disabled, as the question notes). The NAPI receive poll method also runs in softirq context, but in the NET_RX_SOFTIRQ context.
Now the two softirqs will lock each other out, not race, on any local core. But by design, softirqs can certainly run in parallel on different CPUs in an SMP system; so who is to say that these two methods, running in two separate softirq contexts, will never race? In other words, I guess it could happen. To be safe, use spinlocks to protect shared data.
Also, with modern TCP offload techniques becoming more prevalent, GSO could also be invoked at any point. HTH!
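To illustrate the locking advice, here is a minimal userspace sketch in C with pthreads. It is an analogy, not kernel code: two threads play the transmit path and the poll path on different CPUs, and a `pthread_mutex_t` stands in for the `spinlock_t` a driver would take with `spin_lock()`/`spin_unlock()` (or the `_bh` variants where a path can run with bottom halves enabled). The structure `demo_ring` and the `demo_*` functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_PACKETS 100000L

/* Hypothetical TX ring state touched by both paths. */
struct demo_ring {
    pthread_mutex_t lock;  /* stand-in for a spinlock_t in a driver */
    long queued;           /* packets handed over by the xmit path */
    long completed;        /* packets reclaimed by the poll path */
    long in_flight;        /* queued but not yet reclaimed */
};

/* Stand-in for .ndo_start_xmit running on one CPU. */
static void *demo_start_xmit(void *arg)
{
    struct demo_ring *ring = arg;

    for (long i = 0; i < DEMO_PACKETS; i++) {
        pthread_mutex_lock(&ring->lock);
        ring->queued++;
        ring->in_flight++;
        pthread_mutex_unlock(&ring->lock);
    }
    return NULL;
}

/* Stand-in for the NAPI poll callback running on another CPU. */
static void *demo_poll(void *arg)
{
    struct demo_ring *ring = arg;
    long done = 0;

    while (done < DEMO_PACKETS) {
        pthread_mutex_lock(&ring->lock);
        if (ring->in_flight > 0) {
            ring->in_flight--;
            ring->completed++;
            done++;
        }
        pthread_mutex_unlock(&ring->lock);
    }
    return NULL;
}

/* Runs both paths concurrently; returns packets still in flight, or -1. */
long demo_run(void)
{
    struct demo_ring ring = { .queued = 0, .completed = 0, .in_flight = 0 };
    pthread_t tx, rx;

    pthread_mutex_init(&ring.lock, NULL);
    if (pthread_create(&tx, NULL, demo_start_xmit, &ring) != 0)
        return -1;
    if (pthread_create(&rx, NULL, demo_poll, &ring) != 0) {
        pthread_join(tx, NULL);
        return -1;
    }
    pthread_join(tx, NULL);
    pthread_join(rx, NULL);
    pthread_mutex_destroy(&ring.lock);

    assert(ring.queued == DEMO_PACKETS && ring.completed == DEMO_PACKETS);
    return ring.in_flight;
}
```

With the lock held around every update, `demo_run()` ends with nothing left in flight. Without the lock, the unsynchronized updates to the shared counters would be a data race and the counts could no longer be trusted.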