
Regular expression to parse a log file and find stacktraces

Tags: java, regex

I'm working with a legacy Java app that has no logging and just prints all information to the console. Most exceptions are also "handled" by just doing a printStackTrace() call.

In a nutshell, I've just redirected the System.out and System.err streams to a log file, and now I need to parse that log file. So far so good, but I'm having problems trying to parse the log file for stack traces.

Some of the code is obfuscated as well, so I need to run the stack traces through a utility app to deobfuscate them. I'm trying to automate all of this.

The closest I've come so far is to get the initial Exception line using this:

.+Exception[^\n]+

And finding the "at ..(..)" lines using:

(\t+\Qat \E.+\s+)+

But I can't figure out how to put them together to get the full stacktrace.

Basically, the log file looks something like the following. There is no fixed structure, and the lines before and after stack traces are completely random:

Modem ERROR (AT
Owner: CoreTalk
) - TIMEOUT
IN []
Try Open: COM3


javax.comm.PortInUseException: Port currently owned by CoreTalk
    at javax.comm.CommPortIdentifier.open(CommPortIdentifier.java:337)
...
    at UniPort.modemService.run(modemService.java:103)
Handling file: C:\Program Files\BackBone Technologies\CoreTalk 2006\InputXML\notify
java.io.FileNotFoundException: C:\Program Files\BackBone Technologies\CoreTalk 2006\InputXML\notify (The system cannot find the file specified)
    at java.io.FileInputStream.open(Native Method)
...
    at com.gobackbone.Store.a.a.handle(Unknown Source)
    at com.jniwrapper.win32.io.FileSystemWatcher.fireFileSystemEvent(FileSystemWatcher.java:223)
...
    at java.lang.Thread.run(Unknown Source)
Load Additional Ports
... Lots of random stuff
IN []

[Fatal Error] .xml:6:114: The entity name must immediately follow the '&' in the entity reference.
org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
...
    at com.gobackbone.Store.a.a.run(Unknown Source)
Riaan Cornelius asked Sep 28 '10 15:09


2 Answers

Looks like you just need to paste them together (and use a newline as glue):

.+Exception[^\n]+\n(\t+\Qat \E.+\s+)+

But I would change your regex a bit:

^.+Exception[^\n]++(\s+at .++)+

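This folds the whitespace between the at... lines into the repeated group (\s+ matches the newline plus the indentation) and uses possessive quantifiers (++) to avoid needless backtracking. Note that for ^ to match at the start of every line, rather than only at the start of the input, the pattern has to be compiled with Pattern.MULTILINE.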
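As a rough sketch of how the second pattern might be used from Java: the class name StackTraceFinder and the method findTraces are made up for illustration, and the sample log is trimmed from the question with the elided ... lines left out. It assumes the whole log fits in memory as one String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class StackTraceFinder {
    // MULTILINE lets ^ match at the start of each line, not only at the start of the input.
    private static final Pattern TRACE =
            Pattern.compile("^.+Exception[^\\n]++(\\s+at .++)+", Pattern.MULTILINE);

    /** Returns every chunk of the log that the pattern recognises as a stack trace. */
    static List<String> findTraces(String log) {
        List<String> traces = new ArrayList<>();
        Matcher m = TRACE.matcher(log);
        while (m.find()) {
            traces.add(m.group());
        }
        return traces;
    }

    public static void main(String[] args) {
        // Sample input adapted from the question (assumption: no literal "..." lines).
        String log = String.join("\n",
                "Try Open: COM3",
                "javax.comm.PortInUseException: Port currently owned by CoreTalk",
                "    at javax.comm.CommPortIdentifier.open(CommPortIdentifier.java:337)",
                "    at UniPort.modemService.run(modemService.java:103)",
                "Load Additional Ports");
        for (String trace : findTraces(log)) {
            System.out.println(trace);
            System.out.println("----");
        }
    }
}
```

Each match could then be handed to the deobfuscation utility. Two caveats: [^\n]++ needs at least one character after Exception, so a trace whose first line is just the class name with no message would be skipped ([^\n]*+ would allow it), and a "Caused by:" section would typically come back as a separate match rather than as part of the same trace.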

Tim Pietzcker answered Nov 08 '22 13:11


We have been using ANTLR to tackle the parsing of log files (in a different application area). It's not trivial, but if this is a critical task for you, it will be better than using regexes.

peter.murray.rust answered Nov 08 '22 13:11