
C#/.NET Lexer Generators

I'm looking for a decent lexical scanner generator for C#/.NET -- something that supports Unicode character categories, and generates somewhat readable & efficient code. Anyone know of one?


EDIT: I need support for Unicode categories, not just Unicode characters. The Lu (Letter, Uppercase) category alone currently contains 1421 characters, and I need to match many different categories very specifically, so I would rather not hand-write the character sets that would require.
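To make the requirement concrete, here is a minimal sketch of category-based matching using only the .NET base class library (`char.GetUnicodeCategory` and the `\p{...}` regex escapes). It is not a lexer generator, just an illustration of the kind of matching a generated lexer would need to do. The class name `CategoryDemo`, the helper `IsUppercaseLetter`, and the identifier rule are hypothetical, made up for this example.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CategoryDemo
{
    // Hypothetical helper: true when c belongs to the Lu (Letter, Uppercase) category.
    public static bool IsUppercaseLetter(char c) =>
        char.GetUnicodeCategory(c) == UnicodeCategory.UppercaseLetter;

    // Hypothetical token rule: one Lu character followed by any letters (L)
    // or decimal digits (Nd). \p{...} selects a whole Unicode category,
    // so no character set has to be written out by hand.
    public static readonly Regex Identifier =
        new Regex(@"^\p{Lu}[\p{L}\p{Nd}]*$");

    public static void Main()
    {
        Console.WriteLine(IsUppercaseLetter('\u00C9'));        // U+00C9 is in Lu: True
        Console.WriteLine(Identifier.IsMatch("\u00C9lan42"));  // starts with Lu: True
        Console.WriteLine(Identifier.IsMatch("\u00E9lan42"));  // starts with Ll: False
    }
}
```

What I am after is a generator whose specification language lets me write something like `\p{Lu}` directly and emits readable C# for it.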

Also, actual code is a must -- this rules out tools that generate a binary file that is then used with a driver (e.g. GOLD).


EDIT: ANTLR does not support Unicode categories yet. There is an open issue for it, though, so it might fit my needs someday.

Alex Lyman asked Oct 05 '08 16:10



3 Answers

GPLEX seems to support your requirements.

leppie answered Sep 30 '22 09:09


The two solutions that come to mind are ANTLR and GOLD. ANTLR has a GUI-based grammar designer, and an excellent sample project in C# can be found here.

David Robbins answered Sep 30 '22 09:09


I agree with @David Robbins: ANTLR is probably your best bet. However, the generated code needs a separate runtime library, because it relies on string-parsing and other common routines that live in that library. ANTLR generates both a lexer and a parser.

On a side note: ANTLR is great. I wrote a 400+ line grammar that generates over 10k lines of C# code to efficiently parse a language. This included built-in error checking for every possible thing that could go wrong while parsing the language. Try to do that by hand, and you'll never keep up with the bugs.

casademora answered Sep 30 '22 07:09