People complain a lot about XML, but compared to EDI and some of the proprietary file formats I've dealt with in my career, I think XML is bliss. The work I did on importing data files from Automotive Comparative Raters, each with its own creative file format, still gives me nightmares.
That being said, I'm curious how other programmers approach automated parsing of poorly formatted text files. Do you have a language preference? Are there any automation tools that you find invaluable? How do you make your code reusable?
A solution I learned about quite recently is using a standalone lexer. You get to use structured regular expressions, and you avoid the constraints of a full-blown parser generator.
Here are some examples with ocamllex (the lexer generator provided with OCaml):
Obviously lexer generators are also available in other languages if using OCaml is an issue for you.
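For anyone not using OCaml, the same idea can be sketched in Python with nothing but the standard `re` module: give each token rule a named group, join them into one master pattern, and scan the input with `match(text, pos)`. This is a hand-rolled lexer, not a lexer generator, and the token names and the `KEY = VALUE;` record format below are made up purely for illustration. One difference worth knowing: ocamllex applies a longest-match rule, while Python's regex alternation takes the first alternative that matches, so rule order matters here (`DATE` has to come before `NUMBER`).

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    value: str
    line: int


# Hypothetical token rules for a made-up "KEY = VALUE;" record format.
# Python tries alternatives left to right and takes the first one that
# matches, so DATE must precede NUMBER or "01/15/2004" would lex as "01".
TOKEN_SPEC = [
    ("DATE", r"\d{2}/\d{2}/\d{4}"),
    ("NUMBER", r"-?\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("EQUALS", r"="),
    ("SEMI", r";"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
]

# One master regex; each rule becomes a named group so we can tell
# which rule fired via Match.lastgroup.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens, tracking line numbers and failing loudly on junk."""
    pos = 0
    line = 1
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} on line {line}")
        kind = m.lastgroup
        if kind == "NEWLINE":
            line += 1
        elif kind != "SKIP":
            yield Token(kind, m.group(), line)
        pos = m.end()  # every rule consumes at least one character
```

Usage: `list(tokenize("PREMIUM = 1234.50;\nEFFECTIVE = 01/15/2004;"))` gives `IDENT`, `EQUALS`, `NUMBER`, `SEMI` on line 1 and `IDENT`, `EQUALS`, `DATE`, `SEMI` on line 2. Raising on the first unrecognized character, rather than silently skipping it, tends to be what you want with messy vendor files: it tells you exactly where the format deviates from what you assumed.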
Perl / Python, build up functionality slowly, keep the worst ones as test cases, lots of coffee.
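A minimal sketch of the "keep the worst ones as test cases" idea, assuming a hypothetical pipe-delimited `NAME|AGE|STATE` format: each line that once broke the importer goes into a regression table with its expected result, so a later fix for one vendor's quirk can't silently break another. `parse_record`, `REGRESSION_CASES` and `run_regressions` are illustrative names, not from any library.

```python
def parse_record(raw: str) -> dict:
    """Parse a hypothetical 'NAME|AGE|STATE' line, tolerating known quirks."""
    fields = [f.strip() for f in raw.rstrip("\r\n").split("|")]
    # Quirk: some files end each line with a stray delimiter, which
    # produces one extra empty field.
    if len(fields) == 4 and fields[-1] == "":
        fields.pop()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {raw!r}")
    name, age, state = fields
    return {
        "name": name,
        # Quirk: a blank age means "unknown", not zero.
        "age": int(age) if age else None,
        "state": state.upper(),
    }


# Every nasty input that ever broke the parser earns a permanent spot here.
REGRESSION_CASES = [
    ("Smith|42|TX", {"name": "Smith", "age": 42, "state": "TX"}),
    ("  Jones | 37 | ca ", {"name": "Jones", "age": 37, "state": "CA"}),
    ("Lee|42|NY|\r\n", {"name": "Lee", "age": 42, "state": "NY"}),
    ("Brown||oh", {"name": "Brown", "age": None, "state": "OH"}),
]


def run_regressions() -> list:
    """Return (raw, expected, actual) for each case that no longer parses as expected."""
    failures = []
    for raw, expected in REGRESSION_CASES:
        actual = parse_record(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures
```

In practice you'd keep whole sample files rather than single lines and run them under a test framework, but the principle is the same: the table only grows.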