My company's proprietary software generates a log file that is much easier to use if it is parsed. The log parser we all use was written by another employee as a side project, and it has horrible performance.
These log files can grow to 10s of megabytes very quickly, and the parser we currently use has issues if a log file is bigger than 1 megabyte.
So, I want to write a program that can parse this massive amount of text in the shortest amount of time possible. We use Windows exclusively, so running on Windows is a must. Our current implementation runs on a local web server, and I'm convinced that running it as an application would have to be faster.
All suggestions will be helpful. Thanks.
EDIT: My ultimate goal is to parse the text and display it in a much more user friendly manner with colors and such. Can you do this with Perl and Python? I know you can do this with Java and C++. So, it will function like Notepad where you open a log file, but on the screen you display the user-friendly format instead of the raw file.
EDIT: So, I cant choose the best answer, and that was to choose a language that can best display what I'm going for, and then write the parser in that. Also, using ANTLR will probably make this process much easier. I changed the original question, since I guess I didn't ask what I was really looking for. Thanks everyone!
Parsing text using string methodsPython is incredible when it comes to dealing with strings. It is worth internalising all the common string operations. We can use these methods to extract data from a string as you can see in the simple example below.
The basic workflow of a parser generator tool is quite simple: you write a grammar that defines the language, or document, and you run the tool to generate a parser usable from your Python code.
The parser module provides an interface to Python's internal parser and byte-code compiler. The primary purpose for this interface is to allow Python code to edit the parse tree of a Python expression and create executable code from this.
Hmmm, "go with what you know" was a good answer. Perl was designed for this sort of thing (but imo is well suited for simple parsing, but I'd personally avoid it for complex projects).
If it gets even a little complex, why not use a proper syntax and grammar set-up?
Lex & Yacc (or Flex & Bison) spring to mind, but personally I would always reach for Antlr
Define various "words" in terms of patterns (syntax), and rules to combine those words (grammar) and Antlr will spit out a program to parse your input (you can have the program in Java, C, C++ and more (you are worried about parse time, so choose a compiled language, of course)).
I personally find it tedious to hand-craft parsers, and even more tedious to debug them, but AntlrWorks is a lovely IDE which really makes it a piece of cake ...
That bit at the bottom is defining a grammar rule.
If you mess up your grammar rules, you will be informed. This is not the case with hand-crafted parsers, where you just scratch your body part
and wonder about the "strange results"...
Check it out. Even if you think your project is trivial now, it may well grow. And if you have any interest in parsing you do owe it to yourself to at least be familiar with lex/yacc, but especially Antlr(Works)
You should use the language that YOU know... Unless you have so much time available to complete the project that you can also spend the time learning a new language.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With