 

Python parsing log file for IP address and Protocol

This is my first question on Stack Overflow, and I am really looking forward to being part of this community. I am new to programming, and Python was the language most people recommended learning first.

Anyway, I have a log file that looks like this:

"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..." 
"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..." 
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]" 
"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..." 
"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."

I want to parse the log file with Python so that the result looks like this:

From IP 135.13.216.191 Protocol Count: (IMF 1) (SMTP 38) (TCP 24) (Total: 63)

I would really like some help choosing an approach: should I use lists and loop through them, or dictionaries/tuples?

Thanks in advance for your help!

Asked Oct 24 '12 at 20:10 by John Smith


2 Answers

You can parse the file using the csv module:

import csv

with open('logfile.txt', newline='') as logfile:
    reader = csv.reader(logfile)
    next(reader)  # skip the header row ("No.","Time",...)
    for row in reader:
        no, time, source, dest, protocol, info = row
        pass  # do stuff with these
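To fill in the "do stuff with these" part, one option is to tally protocols per source IP with `collections.Counter`. The sketch below is self-contained: it assumes the log text is held in a string (`SAMPLE`, copied from the question) and read through `io.StringIO`; with a real file you would pass `open('logfile.txt', newline='')` instead. `count_protocols` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter, defaultdict

# Sample rows copied from the question. With a real file, use
# open('logfile.txt', newline='') in place of io.StringIO(SAMPLE).
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."
"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"
"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."
"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
'''


def count_protocols(logfile):
    """Return {source_ip: Counter({protocol: count})} for a CSV log."""
    counts = defaultdict(Counter)
    reader = csv.reader(logfile)
    next(reader)  # skip the header row
    for no, time, source, dest, protocol, info in reader:
        counts[source][protocol] += 1
    return counts


counts = count_protocols(io.StringIO(SAMPLE))
for source, protocols in counts.items():
    print(source, dict(protocols))
```

A `Counter` starts every missing key at 0, so there is no need to check whether a protocol has been seen before.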

I can't quite tell what you're asking, but I think you want:

import csv
from collections import defaultdict

# A dictionary whose values are by default (a
# dictionary whose values are by default 0)
bySource = defaultdict(lambda: defaultdict(int))

with open('logfile.txt', newline='') as logfile:
    for row in csv.DictReader(logfile):
        bySource[row["Source"]][row["Protocol"]] += 1

for source, protocols in bySource.items():
    protocols['Total'] = sum(protocols.values())

    print("From IP %s Protocol Count: %s" % (
        source,
        ' '.join("(%s: %d)" % item for item in protocols.items())
    ))
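The loop above prints `(TCP: 24)`-style items and adds `Total` to the dict as if it were another protocol. If you want the exact layout from the question, a small formatting helper along these lines might work. `format_line` is a hypothetical name, and listing protocols alphabetically is an assumption based on the example `(IMF 1) (SMTP 38) (TCP 24)`.

```python
def format_line(source, protocols):
    """Build one output line in the layout shown in the question.

    `protocols` maps protocol name -> count. Protocols are sorted
    alphabetically (an assumption based on the question's example).
    """
    parts = ["(%s %d)" % (name, n) for name, n in sorted(protocols.items())]
    parts.append("(Total: %d)" % sum(protocols.values()))
    return "From IP %s Protocol Count: %s" % (source, " ".join(parts))


# Counts taken from the question's example output.
line = format_line("135.13.216.191", {"SMTP": 38, "TCP": 24, "IMF": 1})
print(line)
```

Because the total is computed separately rather than stored under a `Total` key, it always appears last and is never summed as a protocol.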
Answered Oct 11 '22 at 16:10 by Eric


I would begin by reading the file into a list:

contents = []
with open("file_path") as f:
    contents = f.readlines()

Then you can split each line into a list of its own:

# contents[1:] skips the header row; strip() drops the newline and trailing spaces
ips = [l.strip()[1:-1].split('","') for l in contents[1:]]
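One thing to watch with the slicing approach: `readlines()` keeps each line's trailing newline, and the sample lines in the question also end with a space, so slicing off only the first and last character can leave a stray quote on the last field. The sketch below compares the two on one line from the question. `split_line` is a hypothetical helper, and it assumes no field ever contains the sequence `","` (the `csv` module handles that case; this shortcut does not).

```python
def split_line(line):
    """Split one quoted, comma-separated log line into its fields.

    Assumes no field contains the sequence '","'.
    """
    # strip() removes the trailing newline and spaces, [1:-1] drops the
    # outer quotes, and splitting on '","' separates the fields.
    return line.strip()[1:-1].split('","')


# Line 3 from the question, including its trailing space and newline.
raw = '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]" \n'

naive = raw[1:-1].split('","')  # slicing only: the last field keeps '" '
fields = split_line(raw)        # clean fields

print(repr(naive[-1]))
print(fields)
```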

We can then group these rows into a dict keyed by source IP:

sourceIps = {}
for ip in ips:
    try:
        sourceIps[ip[2]].append(ip)  # ip[2] is the Source column
    except KeyError:
        sourceIps[ip[2]] = [ip]  # first row seen for this source IP
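The `try`/`except` grouping can also be written with `collections.defaultdict(list)`, which creates the empty list the first time a key is accessed. A minimal sketch; the `rows` list is illustrative, hand-copied from the question to stand in for the split lines:

```python
from collections import defaultdict

# Rows already split into fields (illustrative data from the question).
rows = [
    ["1", "0.000000", "120.107.103.180", "172.16.112.50", "TELNET", "Telnet Data ..."],
    ["2", "0.000426", "172.16.112.50", "172.16.113.168", "TELNET", "Telnet Data ..."],
    ["5", "0.530634", "172.16.112.50", "172.16.113.168", "TELNET", "Telnet Data ..."],
]

source_ips = defaultdict(list)
for row in rows:
    source_ips[row[2]].append(row)  # index 2 is the Source column

for ip, grouped in source_ips.items():
    print(ip, len(grouped))
```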

And finally print out the result:

for ip, stuff in sourceIps.items():
    print("From {0} ... ".format(ip, ...))
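To complete the output line, one possibility is to count each IP's protocols with `collections.Counter` and use the number of rows as the total. This sketch assumes the grouped dict has the shape built above (`{source_ip: [row, ...]}` with the protocol at index 4); the single-IP `source_ips` literal is illustrative data copied from the question.

```python
from collections import Counter

# Grouped rows in the shape {source_ip: [row, ...]} (illustrative data).
source_ips = {
    "172.16.113.168": [
        ["3", "0.019849", "172.16.113.168", "172.16.112.50", "TCP", "21582 > telnet [ACK]"],
        ["4", "0.530125", "172.16.113.168", "172.16.112.50", "TELNET", "Telnet Data ..."],
    ],
}

lines = []
for ip, rows in source_ips.items():
    protocols = Counter(row[4] for row in rows)  # index 4 is the Protocol column
    counts = " ".join("(%s %d)" % (p, n) for p, n in sorted(protocols.items()))
    lines.append("From IP %s Protocol Count: %s (Total: %d)" % (ip, counts, len(rows)))

for line in lines:
    print(line)
```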
Answered Oct 11 '22 at 17:10 by Will