
When should I use a parser?

Tags: regex, parsing

I have had problems using regexes to divide code into functional components. They can break, or they can take a long time to finish. This experience raises a question:

"When should I use a parser?"

asked Apr 11 '09 at 12:04 by Léo Léopold Hertz 준영





1 Answer

You should use a parser when you care about the lexical or semantic meaning of text, especially when the patterns can vary. A parser is generally overkill when you simply want to match or replace patterns of characters regardless of their functional meaning.

In your case, you seem to be interested in the meaning behind the text (the "functional components" of code), so a parser would be the better choice. Parsers can, however, use regexes internally (for example, to recognize individual tokens), so the two should not be regarded as mutually exclusive.


A parser does not have to be complicated, however. For example, if you are interested in C code blocks, you could simply parse nested groups of { and }. Such a parser would only care about two tokens ('{' and '}') and the blocks of text between them.
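A minimal sketch of such a two-token parser in C follows. The names parse_blocks and Block and the MAX_DEPTH limit are hypothetical choices for illustration. It scans character by character, pushes the offset of each '{' onto a stack, and pops it when a '}' arrives, so every block is paired with its own closing brace. It does not skip braces inside string literals, character literals, or comments, which a real C parser would have to handle.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 64   /* hypothetical limit: deeper nesting is rejected */

/* One matched block: offsets of its '{' and of the '}' that closes it. */
typedef struct {
    size_t open;
    size_t close;
    int depth;         /* 1 = outermost block, 2 = nested one level, ... */
} Block;

/* Two-token parser: recognizes only '{' and '}' and stores every matched
   pair in out, in the order the blocks close. Returns the number of blocks,
   or -1 on unbalanced braces, nesting deeper than MAX_DEPTH, or more than
   max_out blocks. */
int parse_blocks(const char *src, Block *out, int max_out)
{
    size_t stack[MAX_DEPTH];   /* offsets of the '{' tokens still open */
    int top = 0, count = 0;

    assert(src != NULL && out != NULL);
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '{') {
            if (top == MAX_DEPTH)
                return -1;
            stack[top++] = i;
        } else if (src[i] == '}') {
            if (top == 0 || count == max_out)
                return -1;     /* stray '}' or output array full */
            out[count].open = stack[--top];
            out[count].close = i;
            out[count].depth = top + 1;   /* enclosing blocks + this one */
            count++;
        }
    }
    return top == 0 ? count : -1;   /* a leftover '{' was never closed */
}
```

Blocks are reported in the order they close, so on the Foo example that follows it would list the if block, then the else block (both at depth 2), and finally Foo's body (depth 1).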

A simple regex match, however, is not sufficient here, because blocks can nest and a classic regular expression cannot keep track of arbitrarily deep nesting. Take the following code:

void Foo(bool Bar)
{
    if(Bar)
    {
        f();
    }
    else
    {
        g();
    }
}

A parser understands the overall scope of Foo, as well as each inner scope contained within it (the if and else blocks): as it encounters each '{' token, it "understands" what that token means. A simple search, however, does not understand the meaning behind the text and may treat the following as a block, which we know is not correct:

{
    if(Bar)
    {
        f();
    }
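The difference can be reproduced with a short C sketch, assuming a POSIX system that provides <regex.h>. The helper names naive_block and matching_brace are hypothetical. The regex [{][^}]*[}] stops at the first '}' after the first '{', so on Foo it returns exactly the fragment above, whereas counting depth finds the '}' that really closes Foo's body.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Naive "block" search with the POSIX ERE [{][^}]*[}]: it matches from the
   first '{' to the first '}' after it, ignoring nesting. Returns the start
   offset and stores the match length in *len, or returns -1 on no match. */
long naive_block(const char *src, size_t *len)
{
    regex_t re;
    regmatch_t m;
    long start = -1;

    assert(src != NULL && len != NULL);
    if (regcomp(&re, "[{][^}]*[}]", REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, src, 1, &m, 0) == 0) {
        start = (long)m.rm_so;
        *len = (size_t)(m.rm_eo - m.rm_so);
    }
    regfree(&re);
    return start;
}

/* Depth-aware search: returns the offset of the '}' that closes the '{'
   at offset open, or -1 if that brace is never closed. */
long matching_brace(const char *src, size_t open)
{
    int depth = 0;

    assert(src != NULL && src[open] == '{');
    for (size_t i = open; src[i] != '\0'; i++) {
        if (src[i] == '{')
            depth++;
        else if (src[i] == '}' && --depth == 0)
            return (long)i;
    }
    return -1;
}
```

On the Foo source, naive_block reports a 38-character match that starts at Foo's opening brace and ends at the '}' of the if block, while matching_brace on that same '{' returns the final '}' of the function.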
answered Sep 19 '22 at 15:09 by lc.
