I've been given a job of 'translating' one language into another. The source is too flexible (complex) for a simple line by line approach with regex. Where can I go to learn more about lexical analysis and parsers?
If you know exactly what language you are going to parse, writing a parser by hand is straightforward (although laborious). If the language is not pinned down up front, reworking a hand-written parser every time you discover a new construct can be quite difficult.
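To give a feel for the lexical analysis half of a hand-written approach, here is a minimal sketch of a tokenizer in Python. The token set (numbers, identifiers, a few operators) is a made-up example, not your source language, and the names `TOKEN_SPEC` and `tokenize` are just illustrative. Note that regex is still useful at this level: it recognizes individual tokens, while the parser on top handles the nesting that a line-by-line regex approach can't.

```python
import re

# Hypothetical token set for a tiny expression language (an assumption for
# illustration; replace with the tokens of your real source language).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (kind, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # No token pattern matches here, so report the offending character.
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("total = 3 + 42"))
```

The parser then works on this token list instead of raw characters, which keeps both pieces small.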
Parsing is a frequently used term both in the realm of data quality, and in computing in general. It can mean anything from simply 'breaking up data' to full Natural Language Parsing (NLP), which uses sophisticated artificial intelligence to allow computers to 'understand' human language.
Traditionally, parsing is done by taking a sentence and breaking it down into different parts of speech. The words are placed into distinct grammatical categories, and then the grammatical relationships between the words are identified, allowing the reader to interpret the sentence.
Depending upon how the parse tree is built, parsing techniques are classified into three general categories, namely, universal parsing, top-down parsing, and bottom-up parsing. The most commonly used parsing techniques are top-down parsing and bottom-up parsing.
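As a rough illustration of the top-down category, here is a small recursive-descent parser sketch in Python. It assumes a toy arithmetic grammar (`expr`, `term`, `factor`) that I made up for the example, and it builds the parse tree as nested tuples starting from the top-level rule and working down to the tokens. Bottom-up parsers (the LR family, usually produced by a generator tool) go the other way and are rarely written by hand.

```python
import re


def tokenize(text):
    # Deliberately simplistic: integers and single-character operators only.
    # Characters that match nothing are silently skipped in this sketch.
    return re.findall(r"\d+|[+\-*/()]", text)


class Parser:
    """Top-down (recursive-descent) parser for a toy grammar:

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse_expression(text):
    return Parser(tokenize(text)).parse()


if __name__ == "__main__":
    print(parse_expression("1 + 2 * 3"))
```

Each grammar rule becomes one method, so operator precedence falls out of which rule calls which. For a translation job you would walk the resulting tree and emit the target language from it.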
If you want to get "emotional" about the subject, pick up a copy of "The Dragon Book" (Compilers: Principles, Techniques, and Tools). It is usually the text in a compiler design course. It will definitely meet your need to "learn more about lexical analysis and parsers", as well as a bunch of other fun stuff!
IMH(umble)O, save yourself an arm and/or leg and buy an older edition - it will fill your information desires.
Try ANTLR:
ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.
There's a book for it also.