Having a problem with parsing Snort logs using the pyparsing module.
The problem is with splitting the Snort log (which has multiline entries separated by blank lines) and getting pyparsing to parse each entry as a whole chunk, rather than reading the file line by line and expecting the grammar to work on each line (which, obviously, it does not).
I have tried converting each chunk to a temporary string and stripping out the newlines inside it, but pyparsing still won't parse it correctly. I may be wholly on the wrong track, but I don't think so: a similar approach works perfectly for syslog-type logs, but those are one-line entries and so lend themselves to a basic file iterator and line-by-line processing.
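The chunking step described above can be sketched with a plain generator that collects non-blank lines and yields each entry as one string when it reaches a blank line. This is a minimal sketch, assuming entries are separated by one or more blank (or whitespace-only) lines; the name `iter_entries` is hypothetical. Unlike a loop that only yields when it sees a blank line, it also yields the last entry when the file does not end with a blank line.

```python
from typing import Iterable, Iterator


def iter_entries(lines: Iterable[str]) -> Iterator[str]:
    """Yield each blank-line-separated entry as a single string.

    Hypothetical helper: assumes entries are separated by one or more
    blank (or whitespace-only) lines.
    """
    chunk = []
    for line in lines:
        if line.strip():
            # Keep the line, dropping its trailing newline.
            chunk.append(line.rstrip("\n"))
        elif chunk:
            # A blank line ends the current entry.
            yield " ".join(chunk)
            chunk = []
    if chunk:
        # The file may not end with a blank line; flush the last entry.
        yield " ".join(chunk)


# Usage with a file: with open("snort_file") as f: for entry in iter_entries(f): ...
sample = ["a 1\n", "a 2\n", "\n", "\n", "b 1\n"]
print(list(iter_entries(sample)))  # ['a 1 a 2', 'b 1']
```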
Here's a sample of the log and the code I have so far:
[**] [1:486:4] ICMP Destination Unreachable Communication with Destination Host is Administratively Prohibited [**]
[Classification: Misc activity] [Priority: 3]
08/03-07:30:02.233350 172.143.241.86 -> 63.44.2.33
ICMP TTL:61 TOS:0xC0 ID:49461 IpLen:20 DgmLen:88
Type:3 Code:10 DESTINATION UNREACHABLE: ADMINISTRATIVELY PROHIBITED HOST FILTERED
** ORIGINAL DATAGRAM DUMP:
63.44.2.33:41235 -> 172.143.241.86:4949
TCP TTL:61 TOS:0x0 ID:36212 IpLen:20 DgmLen:60 DF
Seq: 0xF74E606
(32 more bytes of original packet)
** END OF DUMP
[**] ...more like this [**]
And the code:
def snort_parse(logfile):
    header = Suppress("[**] [") + Combine(integer + ":" + integer + ":" + integer) + Suppress("]") + Regex(".*") + Suppress("[**]")
    cls = Optional(Suppress("[Classification:") + Regex(".*") + Suppress("]"))
    pri = Suppress("[Priority:") + integer + Suppress("]")
    date = integer + "/" + integer + "-" + integer + ":" + integer + "." + Suppress(integer)
    src_ip = ip_addr + Suppress("->")
    dest_ip = ip_addr
    extra = Regex(".*")
    bnf = header + cls + pri + date + src_ip + dest_ip + extra
    def logreader(logfile):
        chunk = []
        with open(logfile) as snort_logfile:
            for line in snort_logfile:
                if line != '\n':
                    line = line[:-1]
                    chunk.append(line)
                    continue
                else:
                    print chunk
                    yield " ".join(chunk)
                    chunk = []
    string_to_parse = "".join(logreader(logfile).next())
    fields = bnf.parseString(string_to_parse)
    print fields
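One likely reason the grammar above fails even on a correctly assembled chunk: pyparsing matches the elements of `header` one after another, and `Regex(".*")` takes the rest of the line (or of the whole chunk, once the newlines are removed), including the trailing `[**]`, so nothing is left for `Suppress("[**]")`. Pyparsing does not backtrack into the regex to give those characters back; the same applies to `Regex(".*")` followed by `Suppress("]")` in `cls`. Separately, `date` expects `07:30.` but the timestamp is `07:30:02.233350`. The standard-library sketch below only imitates the two strategies (a greedy match versus pyparsing's `SkipTo`) with `re` and `str.find`; `greedy_then_marker` and `skip_to_then_marker` are hypothetical names, not pyparsing API.

```python
import re

# The first line of the sample entry.
LINE = ("[**] [1:486:4] ICMP Destination Unreachable Communication with "
        "Destination Host is Administratively Prohibited [**]")
MARKER = "[**]"


def greedy_then_marker(text: str, start: int) -> bool:
    """Imitate Regex(".*") followed by a literal MARKER, with no backtracking:
    the marker must appear exactly where the greedy match stopped."""
    end = re.compile(r".*").match(text, start).end()
    return text.startswith(MARKER, end)


def skip_to_then_marker(text: str, start: int) -> bool:
    """Imitate SkipTo(MARKER) followed by the literal MARKER."""
    end = text.find(MARKER, start)  # stop *before* the marker...
    return end != -1 and text.startswith(MARKER, end)  # ...so it can still match


# Position just after "[**] [1:486:4]", where the free-text message begins.
msg_start = LINE.index("]", 5) + 1
print(greedy_then_marker(LINE, msg_start))   # False: ".*" already consumed "[**]"
print(skip_to_then_marker(LINE, msg_start))  # True
```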
Any help, pointers, RTFMs, You're Doing It Wrongs, etc., greatly appreciated.
A working version, which groups each entry's lines with itertools.groupby and uses SkipTo to step over the free-text alert message:
import pyparsing as pyp
import itertools
integer = pyp.Word(pyp.nums)
ip_addr = pyp.Combine(integer+'.'+integer+'.'+integer+'.'+integer)
def snort_parse(logfile):
    # SkipTo jumps over the free-text message, up to and including the closing "[**]"
    header = (pyp.Suppress("[**] [")
              + pyp.Combine(integer + ":" + integer + ":" + integer)
              + pyp.Suppress(pyp.SkipTo("[**]", include=True)))
    cls = (
        pyp.Suppress(pyp.Optional(pyp.Literal("[Classification:")))
        + pyp.Regex("[^]]*") + pyp.Suppress(']'))
    pri = pyp.Suppress("[Priority:") + integer + pyp.Suppress("]")
    date = pyp.Combine(
        integer+"/"+integer+'-'+integer+':'+integer+':'+integer+'.'+integer)
    src_ip = ip_addr + pyp.Suppress("->")
    dest_ip = ip_addr
    bnf = header+cls+pri+date+src_ip+dest_ip
    with open(logfile) as snort_logfile:
        # groupby collects runs of consecutive non-blank lines; each run is one entry
        for has_content, grp in itertools.groupby(
                snort_logfile, key=lambda x: bool(x.strip())):
            if has_content:
                tmpStr = ''.join(grp)
                # searchString scans the whole entry and ignores text the grammar does not cover
                fields = bnf.searchString(tmpStr)
                print(fields)
snort_parse('snort_file')
which prints:
[['1:486:4', 'Misc activity', '3', '08/03-07:30:02.233350', '172.143.241.86', '63.44.2.33']]
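For comparison, the same six fields shown in the output above can be pulled out of one entry with the standard `re` module instead of pyparsing. This is a minimal sketch tuned only to the sample entry in the question: the pattern, the name `parse_entry`, and the treatment of the classification as optional are illustrative assumptions, and other alert lines (for example, ones whose addresses carry ports like the `ip:port` pairs in the datagram dump) would need a broader pattern.

```python
import re
from typing import Optional, Tuple

# Illustrative pattern matched to the sample entry only (an assumption,
# not a full description of Snort's alert format).
ENTRY_RE = re.compile(
    r"\[\*\*\]\s*\[(?P<sid>\d+:\d+:\d+)\].*?\[\*\*\]\s*"          # [**] [n:n:n] message [**]
    r"(?:\[Classification:\s*(?P<cls>[^\]]*)\]\s*)?"               # optional classification
    r"\[Priority:\s*(?P<pri>\d+)\]\s*"                             # priority
    r"(?P<date>\d+/\d+-\d+:\d+:\d+\.\d+)\s+"                       # timestamp
    r"(?P<src>\d+(?:\.\d+){3})\s*->\s*(?P<dst>\d+(?:\.\d+){3})"    # source -> destination
)


def parse_entry(entry: str) -> Optional[Tuple[Optional[str], ...]]:
    """Return (sid, classification, priority, date, src, dst), or None if no match."""
    m = ENTRY_RE.search(entry)
    if m is None:
        return None
    return m.group("sid", "cls", "pri", "date", "src", "dst")


sample = (
    "[**] [1:486:4] ICMP Destination Unreachable Communication with Destination Host "
    "is Administratively Prohibited [**] "
    "[Classification: Misc activity] [Priority: 3] "
    "08/03-07:30:02.233350 172.143.241.86 -> 63.44.2.33 "
    "ICMP TTL:61 TOS:0xC0 ID:49461 IpLen:20 DgmLen:88"
)
print(parse_entry(sample))
# ('1:486:4', 'Misc activity', '3', '08/03-07:30:02.233350', '172.143.241.86', '63.44.2.33')
```

Because `\s` also matches newlines, the pattern works on an entry whether or not its lines were joined first; only the header itself must stay on one line, since `.*?` does not cross newlines.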