I'm getting thousands of dropped packets from a Broadcom network card:
eth1 Link encap:Ethernet HWaddr 01:27:B0:14:DA:FE
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:2746252626 errors:0 dropped:1151734 overruns:0 frame:0
TX packets:4109502155 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:427998700000 (408171.3 Mb) TX bytes:3530782240047 (3367216.3 Mb)
Interrupt:40 Memory:d8000000-d8012700
Here is the installed version:
filename: /lib/modules/2.6.27.54-0.2-default/kernel/drivers/net/bnx2.ko
version: 1.8.0
license: GPL
description: Broadcom NetXtreme II BCM5706/5708/5709 Driver
The packets get dropped in bursts of 500 to 5000 packets, several times an hour. The server (running Postgres) is running fine; the drops are just annoying.
After trying lots of different things, I'm asking: how can I find out where the packets came from and why they were dropped?
A dropped packet means that the buffer that is used to store the packet for forwarding/processing is full. The act of looking into the packet's data for information implies that you have the data to look at in the first place (which you don't, because there was no room to store it).
A nice way around this, so you can see what traffic is being dropped, is to look through a dump of your traffic for the duplicate ACKs leaving your server. When a TCP segment goes missing, for whatever reason, your server keeps acknowledging the last in-order data it received, which prompts the sender to retransmit. The duplicate ACKs and the retransmission that follows will give you the conversation context that you're looking for.
I'd actually suggest taking a look at the switch/router that your server is connected to. It will be able to give you a nice idea of the loss and throughput over the interface to your server, letting you diagnose, for example, if your card is too slow for the wire.
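To line up the switch's counters with what the host sees, it can help to timestamp each burst on the server itself. Below is a minimal Python sketch, assuming the kernel exposes the counter at /sys/class/net/eth1/statistics/rx_dropped; the function names, the one-second polling interval and the threshold are illustrative choices, not anything from the setup above.

```python
import time
from pathlib import Path


def read_rx_dropped(iface="eth1"):
    """Read the cumulative rx_dropped counter for one interface from sysfs."""
    path = Path(f"/sys/class/net/{iface}/statistics/rx_dropped")
    return int(path.read_text())


def drop_bursts(samples, threshold=1):
    """Return (timestamp, delta) for every step where the counter grew.

    `samples` is a list of (timestamp, counter_value) pairs in time order.
    Counter wrap-around is not handled in this sketch.
    """
    bursts = []
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        delta = cur - prev
        if delta >= threshold:
            bursts.append((ts, delta))
    return bursts


def watch(iface="eth1", interval=1.0):
    """Poll forever and print a timestamped line whenever drops increase."""
    prev = read_rx_dropped(iface)
    while True:
        time.sleep(interval)
        cur = read_rx_dropped(iface)
        if cur > prev:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {iface} rx_dropped +{cur - prev}")
        prev = cur
```

Calling watch("eth1") in a screen session for a few hours should give a list of burst times that can be compared against a packet capture, cron jobs, or the switch logs.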
EDIT
This blog post cites a tool called dropwatch, which may give you some clues as well.
You may have run into https://www.novell.com/support/kb/doc.php?id=7007165.
quote:
Beginning with kernel 2.6.37, it has been changed the meaning of dropped packet count. Before, dropped packets was most likely due to an error. Now, the rx_dropped counter shows statistics for dropped frames because of:
Softnet backlog full -- (Measured from /proc/net/softnet_stat)
Bad / Unintended VLAN tags
Unknown / Unregistered protocols
IPv6 frames when the server is not configured for IPv6
If any frames meet those conditions, they are dropped before the protocol stack and the rx_dropped counter is incremented.
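The first item in that list can be checked directly. Here is a rough Python sketch that decodes /proc/net/softnet_stat, assuming the usual layout of one row per CPU with hexadecimal columns, where the first three are packets processed, packets dropped because the backlog was full, and time_squeeze. Later columns differ between kernel versions and are ignored here; the function name is made up for this example.

```python
from pathlib import Path


def parse_softnet_stat(text):
    """Decode the first three hex columns of each softnet_stat row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": len(rows),  # row index, normally one row per CPU
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": time_squeeze,
        })
    return rows


if __name__ == "__main__":
    path = Path("/proc/net/softnet_stat")
    if path.exists():
        for row in parse_softnet_stat(path.read_text()):
            print(row)
```

If the dropped column keeps growing while rx_dropped grows, the backlog queue (sized by the net.core.netdev_max_backlog sysctl) is a likely suspect; if it stays at zero, one of the other causes in the list is more probable.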
(For the benefit of those that come to this via a search) I've seen the same problem (also with a bnx2 module, IIRC).
You might try turning off the irqbalance service. In my case, it completely stopped the problem.
Please also note that not so long ago there were plenty of updates to irqbalance (on RHEL 6). Firmware updates should also be checked, both for the main system and for the Ethernet board(s).
We were seeing this only on a very large subnet with a very large amount of broadcast/multicast activity. We weren't seeing it on the same equipment on a less noisy, but still very active, part of the network.
Adjusting the Ethernet ring buffer size for the NIC may also help. I know there were some sysctl changes made for that busy network...