Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Building a Generic Parser for Converting a Text File to a Data Structure in C#

I have a definition for a SPAN file (http://www.cme-ch.com/span/spanl300.htm) that i'd like to use in constructing a parser to parse the string data into an in memory collection class (or even using lazy evalution with the yield keyword.)

All parsing techniques and libraries i've seen apply to constructing parse trees for implementing languages; i'd simply like to know of any good techniques to parse into a data structure, similar to how XML is parsed into an XMLDocument in the .net framework, but using the rules defined by SPAN.

like image 679
Pierreten Avatar asked Aug 18 '10 19:08

Pierreten


People also ask

What is Generic parser?

generic-parser is a parser library for arguments and config-files that also allows direct Python input, used by the OMC Team at CERN.

What is to parse a file?

Updated: 12/29/2017 by Computer Hope. To parse data or information means to break it down into component parts so that its syntax can be analyzed, categorized, and understood.

What is parsing format?

Parsing is the process of converting formatted text into a data structure. A data structure type can be any suitable representation of the information engraved in the source text. Tree type is a common and standard choice for XML parsing, HTML parsing, JSON parsing, and any programming language parsing.


1 Answers

SPAN appears to be a bunch of record types, each record with a lot of detail.

It should be straightforward to define a classic grammar that covers all of the records (as nonterminals), in terms of any subrecords (as nonterminals) and terminal data types representing the various datatypes defined by SPAN. There might be a lot of nonterminals, but that just makes for a big grammar, but not a complicated one.

Most programming languages have a small set of terminal tokens that can generally appear anywhere. The truth is that grammars define expectations of what can appear next (called "first" and "follow" sets in the LR parser literature), including a very limited set of terminals. A SPAN grammar would not be different; each "parse state" of a parser implies a limited set of terminals that come next, and one organize a parser to take advantage of this. (I've built L(AL)R parsers, and one could easily use the "current" state to determine the subset of terminals that could happen next). So, a SPAN parser could determine just the small set of tokens that might occur next in each state, and use that to pick of the characaters comprising those next tokens (they must form disjoint sets!).

An easy way to implement this is with a recursive descent parser.

So I claim that all that parsing machinery would be just fine for parsing SPAN, with some bit of custom work possibly to pick up the tokens.

Parsing actions for conventional parsers build trees, but its just as easy to populate fields of a data structure.

like image 66
Ira Baxter Avatar answered Sep 28 '22 10:09

Ira Baxter