Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write a Parser in C#? [closed]

How do I go about writing a Parser (Recursive Descent?) in C#? For now I just want a simple parser that parses arithmetic expressions (and reads variables?). Though later I intend to write an xml and html parser (for learning purposes). I am doing this because of the wide range of stuff in which parsers are useful: Web development, Programming Language Interpreters, Inhouse Tools, Gaming Engines, Map and Tile Editors, etc. So what is the basic theory of writing parsers and how do I implement one in C#? Is C# the right language for parsers (I once wrote a simple arithmetic parser in C++ and it was efficient. Will JIT compilation prove equally good?). Any helpful resources and articles. And best of all, code examples (or links to code examples).

Note: Out of curiosity, has anyone answering this question ever implemented a parser in C#?

like image 527
ApprenticeHacker Avatar asked Sep 11 '11 09:09

ApprenticeHacker


People also ask

How hard is it to write a parser?

A handwritten parser: Writing a parser by hand is a moderately difficult task. Complexity may increase if the language-grammar is complex.

How do you write parser in C++?

My preferred way to make a C++ parser is to have Lex generate a plain C file, and to let YACC generate C++ code. When you then link your application, you may run into some problems because the C++ code by default won't be able to find C functions, unless you've told it that those functions are extern "C".

What is parsing in C?

To parse, in computer science, is where a string of commands – usually a program – is separated into more easily processed components, which are analyzed for correct syntax and then attached to tags that define each component. The computer can then process each program chunk and transform it into machine language.


2 Answers

I have implemented several parsers in C# - hand-written and tool generated.

A very good introductory tutorial on parsing in general is Let's Build a Compiler - it demonstrates how to build a recursive descent parser; and the concepts are easily translated from his language (I think it was Pascal) to C# for any competent developer. This will teach you how a recursive descent parser works, but it is completely impractical to write a full programming language parser by hand.

You should look into some tools to generate the code for you - if you are determined to write a classical recursive descent parser (TinyPG, Coco/R, Irony). Keep in mind that there are other ways to write parsers now, that usually perform better - and have easier definitions (e.g. TDOP parsing or Monadic Parsing).

On the topic of whether C# is up for the task - C# has some of the best text libraries out there. A lot of the parsers today (in other languages) have an obscene amount of code to deal with Unicode etc. I won't comment too much on JITted code because it can get quite religious - however you should be just fine. IronJS is a good example of a parser/runtime on the CLR (even though its written in F#) and its performance is just shy of Google V8.

Side Note: Markup parsers are completely different beasts when compared to language parsers - they are, in the majority of the cases, written by hand - and at the scanner/parser level very simple; they are not usually recursive descent - and especially in the case of XML it is better if you don't write a recursive descent parser (to avoid stack overflows, and because a 'flat' parser can be used in SAX/push mode).

like image 107
Jonathan Dickinson Avatar answered Sep 19 '22 18:09

Jonathan Dickinson


Sprache is a powerful yet lightweight framework for writing parsers in .NET. There is also a Sprache NuGet package. To give you an idea of the framework here is one of the samples that can parse a simple arithmetic expression into an .NET expression tree. Pretty amazing I would say.

using System; using System.Linq.Expressions; using Sprache;  namespace LinqyCalculator {     static class ExpressionParser     {         public static Expression<Func<decimal>> ParseExpression(string text)         {             return Lambda.Parse(text);         }          static Parser<ExpressionType> Operator(string op, ExpressionType opType)         {             return Parse.String(op).Token().Return(opType);         }          static readonly Parser<ExpressionType> Add = Operator("+", ExpressionType.AddChecked);         static readonly Parser<ExpressionType> Subtract = Operator("-", ExpressionType.SubtractChecked);         static readonly Parser<ExpressionType> Multiply = Operator("*", ExpressionType.MultiplyChecked);         static readonly Parser<ExpressionType> Divide = Operator("/", ExpressionType.Divide);          static readonly Parser<Expression> Constant =             (from d in Parse.Decimal.Token()              select (Expression)Expression.Constant(decimal.Parse(d))).Named("number");          static readonly Parser<Expression> Factor =             ((from lparen in Parse.Char('(')               from expr in Parse.Ref(() => Expr)               from rparen in Parse.Char(')')               select expr).Named("expression")              .XOr(Constant)).Token();          static readonly Parser<Expression> Term = Parse.ChainOperator(Multiply.Or(Divide), Factor, Expression.MakeBinary);          static readonly Parser<Expression> Expr = Parse.ChainOperator(Add.Or(Subtract), Term, Expression.MakeBinary);          static readonly Parser<Expression<Func<decimal>>> Lambda =             Expr.End().Select(body => Expression.Lambda<Func<decimal>>(body));     } } 
like image 36
Martin Liversage Avatar answered Sep 18 '22 18:09

Martin Liversage