Calculate bandwidth usage per IP with scapy, iftop-style

I'm using scapy to sniff a mirror port and generate a list of the top 10 "talkers", i.e. a list of hosts using the most bandwidth on my network. I'm aware of tools already available such as iftop and ntop, but I need more control over the output.

The following script samples traffic for 30 seconds and then prints a list of the top 10 talkers in the format "source host -> destination host: bytes". That's great, but how can I calculate average bytes per second?

I get the sense that reducing sample_interval to 1 second doesn't give a good sample of the traffic, so it seems I need to average over a longer interval. So I tried this at the end of the script:

bytes_per_second = total_bytes / sample_interval

but the resulting bytes/s figure seems much too low. For example, I ran an rsync between two hosts throttled to 1.5 MB/s, but with the above calculation my script consistently reported the rate between these hosts as around 200 KB/s, much lower than the 1.5 MB/s I'd expect. I can confirm with iftop that 1.5 MB/s is in fact the rate between these two hosts.
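
For reference, here is the arithmetic the formula implies, as a small sketch (bytes_per_second and expected_total are hypothetical helper names; 1 MB is taken as 1024 × 1024 bytes to match the human() helper in the script). At 1.5 MB/s, a 30-second sample should add about 45 MB to that host pair's total, while 200 KB/s corresponds to only about 5.9 MB. A ratio close to 8 is also what a bits-versus-bytes mismatch would produce (iftop, for instance, reports bits per second unless run with -B), so it may be worth confirming which unit each tool is using.

```python
# Sketch: the arithmetic implied by bytes/s = total bytes / sample_interval.
# Units are 1024-based, matching the human() helper in the script.
KB = 1024.0
MB = 1024.0 * KB

def bytes_per_second(total_bytes, sample_interval):
    """Average rate over the whole sampling window."""
    return total_bytes / float(sample_interval)

def expected_total(rate, sample_interval):
    """Bytes a flow at a constant `rate` (bytes/s) adds during the window."""
    return rate * sample_interval

sample_interval = 30
expected = expected_total(1.5 * MB, sample_interval)  # 47185920.0 bytes, i.e. 45 MB
observed = expected_total(200 * KB, sample_interval)  # 6144000.0 bytes, about 5.9 MB
ratio = expected / observed                            # about 7.68

print("%.1f MB expected, %.1f MB observed, ratio %.2f"
      % (expected / MB, observed / MB, ratio))
# prints: 45.0 MB expected, 5.9 MB observed, ratio 7.68
```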

Am I totaling up packet lengths incorrectly with scapy (see the traffic_monitor_callbak function)? Or is this a poor approach altogether? :)

from scapy.all import *
from collections import defaultdict
import socket
from pprint import pprint
from operator import itemgetter

sample_interval = 30  # how long to capture traffic, in seconds

# initialize traffic dict
traffic = defaultdict(list)

# return human readable units given bytes
def human(num):
    for x in ['bytes','KB','MB','GB','TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, x)
        num /= 1024.0

# callback function to process each packet
# get total packets for each source->destination combo
def traffic_monitor_callbak(pkt):
    if IP in pkt:
        src = pkt.sprintf("%IP.src%")
        dst = pkt.sprintf("%IP.dst%")

        size = pkt.sprintf("%IP.len%")

        # initialize
        if (src, dst) not in traffic:
            traffic[(src, dst)] = 0

        else:
            traffic[(src, dst)] += int(size)

sniff(iface="eth1", prn=traffic_monitor_callbak, store=0, timeout=sample_interval)

# sort by total bytes, descending
traffic_sorted = sorted(traffic.iteritems(), key=itemgetter(1), reverse=True)    

# print top 10 talkers
for x in range(0, 10):
    src = traffic_sorted[x][0][0]
    dst = traffic_sorted[x][0][1]
    host_total = traffic_sorted[x][3]

    # get hostname from IP
    try:
        src_hostname = socket.gethostbyaddr(src)
    except:
        src_hostname = src

    try:    
        dst_hostname = socket.gethostbyaddr(dst)
    except:
        dst_hostname = dst


    print "%s: %s (%s) -> %s (%s)" % (human(host_total), src_hostname[0], src, dst_hostname[0], dst)

I'm not sure if this is a programming (scapy/python) question or more of a general networking question, so I'm calling it a network programming question.

asked Jan 13 '14 at 15:01 by Banjer


1 Answer

Hi,

First of all, you have a bug in the code you have posted: instead of host_total = traffic_sorted[x][3], you probably mean host_total = traffic_sorted[x][1].

Second, the posted code never divides host_total by sample_interval, so it prints total bytes rather than bytes per second.
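
Both points can be checked without a live capture. Here is a minimal sketch that assumes each packet has already been reduced to a hypothetical (src, dst, length) tuple; top_talkers is an illustrative name, not part of Scapy. It also uses defaultdict(int), which makes the if/else in traffic_monitor_callbak unnecessary; as written, that branch stores 0 for the first packet of each pair, so that packet's length is never counted.

```python
from collections import defaultdict
from operator import itemgetter

def top_talkers(packets, sample_interval, n=10):
    """Return [((src, dst), bytes_per_second), ...] for the n busiest pairs.

    `packets` is a hypothetical iterable of (src, dst, length) tuples,
    standing in for what the scapy callback extracts from each packet.
    """
    traffic = defaultdict(int)  # missing keys start at 0, so every packet counts
    for src, dst, length in packets:
        traffic[(src, dst)] += length

    # Each item is ((src, dst), total): index [0] is the key, [1] the total.
    ranked = sorted(traffic.items(), key=itemgetter(1), reverse=True)

    # Slicing with [:n] avoids an IndexError when there are fewer than n pairs,
    # and dividing by the interval turns byte totals into bytes per second.
    return [(pair, total / float(sample_interval)) for pair, total in ranked[:n]]

packets = [
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.3", "10.0.0.4", 60),
]
print(top_talkers(packets, sample_interval=10))
# [(('10.0.0.1', '10.0.0.2'), 300.0), (('10.0.0.3', '10.0.0.4'), 6.0)]
```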

Since you also want to add receiver-to-sender traffic to sender-to-receiver traffic, I think the best way is to use an "ordered" tuple as the key for a Counter object. The order itself does not really matter here: lexicographical order would be fine, but you can also use arithmetic order, since IPv4 addresses are 4-octet integers. This seems to work just fine:
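
The effect of the sorted-tuple key can be seen in a minimal, scapy-free sketch. ip_to_int is a hypothetical stand-in for Scapy's atol, built from socket.inet_aton and struct; conversation_key is likewise an illustrative name. It also shows that lexicographic and numeric order can disagree ("10.0.0.10" sorts before "10.0.0.9" as a string) without changing which packets end up grouped together.

```python
import socket
import struct
from collections import Counter

def ip_to_int(addr):
    """Dotted-quad IPv4 string -> 32-bit integer (similar to scapy's atol)."""
    return struct.unpack("!I", socket.inet_aton(addr))[0]

def conversation_key(src, dst):
    """Same key for A->B and B->A: sort the two addresses numerically."""
    return tuple(sorted((ip_to_int(src), ip_to_int(dst))))

traffic = Counter()
for src, dst, length in [
    ("10.0.0.9", "10.0.0.10", 1500),  # data in one direction
    ("10.0.0.10", "10.0.0.9", 52),    # ACK in the other direction
]:
    traffic.update({conversation_key(src, dst): length})

print(len(traffic), sum(traffic.values()))  # 1 1552

# Plain string sorting orders these two the other way round,
# but it would still group both directions under one key.
print(sorted(["10.0.0.9", "10.0.0.10"]))  # ['10.0.0.10', '10.0.0.9']
```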

#! /usr/bin/env python

sample_interval = 10
interface="eth1"

from scapy.all import *
from collections import Counter
import socket


# Counter is a *much* better option for what you're doing here. See
# http://docs.python.org/2/library/collections.html#collections.Counter
traffic = Counter()
# You should probably use a cache for your IP resolutions
hosts = {}

def human(num):
    for x in ['', 'k', 'M', 'G', 'T']:
        if num < 1024.: return "%3.1f %sB" % (num, x)
        num /= 1024.
    # just in case!
    return  "%3.1f PB" % (num)

def traffic_monitor_callback(pkt):
    if IP in pkt:
        pkt = pkt[IP]
        # You don't want to use sprintf here, particularly as you're
        # converting .len after that!
        # Here is the first place where you're happy to use a Counter!
        # We use a tuple(sorted()) because a tuple is hashable (so it
        # can be used as a key in a Counter), and we sort the addresses
        # so that sender-to-receiver traffic is counted together with
        # receiver-to-sender traffic
        traffic.update({tuple(sorted(map(atol, (pkt.src, pkt.dst)))): pkt.len})

sniff(iface=interface, prn=traffic_monitor_callback, store=False,
      timeout=sample_interval)

# ... and now comes the second place where you're happy to use a
# Counter!
# Plus you can use value unpacking in your for statement.
for (h1, h2), total in traffic.most_common(10):
    # Let's factor out some code here
    h1, h2 = map(ltoa, (h1, h2))
    for host in (h1, h2):
        if host not in hosts:
            try:
                rhost = socket.gethostbyaddr(host)
                hosts[host] = rhost[0]
            except socket.error:
                hosts[host] = None
    # Get a nice output
    h1 = "%s (%s)" % (hosts[h1], h1) if hosts[h1] is not None else h1
    h2 = "%s (%s)" % (hosts[h2], h2) if hosts[h2] is not None else h2
    print("%s/s: %s - %s" % (human(float(total)/sample_interval), h1, h2))
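
The hostname cache in the loop above can also be factored into a small helper. This is a sketch with hypothetical names (resolve_cached, label); the resolver parameter exists only so the helper can be exercised without real DNS lookups. Failed lookups are cached as None, so an address without a PTR record is not retried on every row.

```python
import socket

_hostnames = {}  # ip -> hostname, or None when the lookup failed

def resolve_cached(ip, resolver=socket.gethostbyaddr):
    """Reverse-resolve `ip` once; later calls are answered from the cache."""
    if ip not in _hostnames:
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
            _hostnames[ip] = resolver(ip)[0]
        except (socket.herror, socket.gaierror):
            _hostnames[ip] = None  # cache failures too, so they are not retried
    return _hostnames[ip]

def label(ip, resolver=socket.gethostbyaddr):
    """Format as 'name (ip)', or just the ip when it could not be resolved."""
    name = resolve_cached(ip, resolver)
    return "%s (%s)" % (name, ip) if name is not None else ip
```

On Python 3, decorating a lookup function with functools.lru_cache is another way to get the same caching.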

It is possible that Scapy is not fast enough to keep up with the traffic; any packets it misses are simply not counted, which would make the totals too low. To check, capture your traffic to a file for sample_interval seconds (e.g. with tcpdump -w) and then run the script below. By the way, have a look at how the function is applied to the packets; I think it's a good technique to know if you use Scapy often:

#! /usr/bin/env python

sample_interval = 10  # must match how long the capture ran
filename="capture.cap"

from scapy.all import *
from collections import Counter
import socket

traffic = Counter()
hosts = {}

def human(num):
    for x in ['', 'k', 'M', 'G', 'T']:
        if num < 1024.: return "%3.1f %sB" % (num, x)
        num /= 1024.
    return  "%3.1f PB" % (num)

def traffic_monitor_callback(pkt):
    if IP in pkt:
        pkt = pkt[IP]
        traffic.update({tuple(sorted(map(atol, (pkt.src, pkt.dst)))): pkt.len})

# A trick I like: don't use rdpcap(), which loads the whole capture
# into memory; iterate over a PcapReader object instead.
for p in PcapReader(filename):
    traffic_monitor_callback(p)

for (h1, h2), total in traffic.most_common(10):
    h1, h2 = map(ltoa, (h1, h2))
    for host in (h1, h2):
        if host not in hosts:
            try:
                rhost = socket.gethostbyaddr(host)
                hosts[host] = rhost[0]
            except socket.error:
                hosts[host] = None
    h1 = "%s (%s)" % (hosts[h1], h1) if hosts[h1] is not None else h1
    h2 = "%s (%s)" % (hosts[h2], h2) if hosts[h2] is not None else h2
    print("%s/s: %s - %s" % (human(float(total)/sample_interval), h1, h2))
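
Because the script above divides by a hard-coded sample_interval, the rate is only right if that value matches how long tcpdump actually ran. One variant is to take the duration from the packets' own timestamps (each packet Scapy reads from a pcap file carries a .time attribute). Here is a minimal, scapy-free sketch, assuming hypothetical (timestamp, src, dst, length) records; rates_from_capture is an illustrative name.

```python
from collections import Counter

def rates_from_capture(records, n=10):
    """Top-n address pairs by average bytes/s over the capture's own time span.

    `records` is a hypothetical list of (timestamp, src, dst, length) tuples,
    e.g. (float(p.time), p[IP].src, p[IP].dst, p[IP].len) for each IP packet
    read with PcapReader.
    """
    if not records:
        return []
    first = min(r[0] for r in records)
    last = max(r[0] for r in records)
    duration = last - first
    if duration <= 0:
        raise ValueError("need packets spanning a non-zero time interval")

    traffic = Counter()
    for _, src, dst, length in records:
        # Lexicographic order is enough to merge both directions of a pair.
        traffic[tuple(sorted((src, dst)))] += length

    return [(pair, total / float(duration)) for pair, total in traffic.most_common(n)]

records = [
    (0.0, "10.0.0.1", "10.0.0.2", 1000),
    (1.0, "10.0.0.2", "10.0.0.1", 500),
    (2.0, "10.0.0.1", "10.0.0.2", 1500),
    (2.0, "10.0.0.3", "10.0.0.4", 100),
]
print(rates_from_capture(records))
# [(('10.0.0.1', '10.0.0.2'), 1500.0), (('10.0.0.3', '10.0.0.4'), 50.0)]
```

Measuring from the first to the last packet makes the window slightly shorter than the wall-clock capture time, which mainly matters for short captures.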
answered Oct 19 '22 at 20:10 by Pierre