I'm writing a C/C++/... build system (I understand this is madness ;)), and I'm having trouble designing my parser.
My "recipes" look like this:
global
{
SOURCE_DIRS src
HEADER_DIRS include
SOURCES bitwise.c \
framing.c
HEADERS \
ogg/os_types.h \
ogg/ogg.h
}
lib static ogg_static
{
NAME ogg
}
lib shared ogg_shared
{
NAME ogg
}
(This being based on the super simple libogg source tree)
`#` starts a comment, and `\` is a "newline escape", meaning the line continues on the next line (see the QMake syntax). `{}` delimit scopes, as in C++, and `global` holds settings that apply to every "target". This is all background, and not that relevant... what I really don't know is how to work with my scopes. I will need to be able to have multiple (nested) scopes, and also a form of conditional processing, along the lines of:
win32:DEFINES NO_CRT_SECURE_DEPRECATE
The parsing function will need to know what level of scope it is at, and call itself whenever the scope is increased. There is also the problem of where the braces may appear (`global {`, `global{`, or on a line of its own as in the example).
How could I go about this using standard C++ and the STL? I understand this is a whole lot of work, and that's exactly why I need a good starting point. Thanks!
What I have already is the whole ifstream and internal string/stringstream storage, so I can read word by word.
I would suggest (and this is more or less right out of the compiler textbooks) that you approach the problem in phases. This breaks things down so that the problem is much more manageable in each phase.
Focus first on the lexer phase. Your lexing phase should take the raw text and give you a sequence of tokens, such as words and special characters. The lexer phase can take care of line continuations, and handle whitespace or comments as appropriate. By handling whitespace, the lexer can simplify your parser's task: you can write the lexer so that global{, global {, and even
global
{
will all yield two tokens: one representing global and one representing {.
Also note that the lexer can tack line and column numbers onto the tokens for use later if you hit errors.
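For instance, a lexer along these lines (all names and token kinds here are illustrative, not from your code) can fold comments and line continuations away before the parser ever sees them. This sketch still emits newline tokens, since in your format the end of a line terminates a settings list; the parser is then free to skip them in places where they don't matter, such as before a `{`:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for this recipe format.
enum class TokKind { Word, LBrace, RBrace, Colon, Newline };

struct Token {
    TokKind kind;
    std::string text;
    int line;  // 1-based, kept for error reporting later
};

// A minimal lexer sketch: splits raw recipe text into tokens,
// stripping '#' comments and folding "\<newline>" line continuations.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    int line = 1;
    size_t i = 0;
    auto push = [&](TokKind k, std::string t) { out.push_back({k, std::move(t), line}); };
    while (i < src.size()) {
        char c = src[i];
        if (c == '#') {                       // comment: skip to end of line
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (c == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            i += 2; ++line;                   // line continuation: swallow the newline
        } else if (c == '\n') {
            push(TokKind::Newline, "\n");
            ++line; ++i;
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;                              // other whitespace just separates tokens
        } else if (c == '{') { push(TokKind::LBrace, "{"); ++i; }
        else if (c == '}')   { push(TokKind::RBrace, "}"); ++i; }
        else if (c == ':')   { push(TokKind::Colon, ":"); ++i; }
        else {                                // a bare word runs until whitespace or a special char
            size_t start = i;
            while (i < src.size() && !std::isspace(static_cast<unsigned char>(src[i]))
                   && src[i] != '{' && src[i] != '}' && src[i] != ':' && src[i] != '#')
                ++i;
            push(TokKind::Word, src.substr(start, i - start));
        }
    }
    return out;
}
```

Note that `:` gets its own token kind so that a later phase can handle conditionals like `win32:DEFINES ...` without any string surgery.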
Once you've got a nice stream of tokens flowing, work on your parsing phase. The parser should take that sequence of tokens and build an abstract syntax tree, which models the syntactic structures of your document. At this point, you shouldn't be worrying about ifstream and operator>>, since the lexer should have done all that reading for you.
You've indicated an interest in calling the parsing function recursively once you see a scope. That's certainly one way to go. As you'll see, the design decision you'll have to repeatedly make is whether you literally want to call the same parse function recursively
(allowing for constructions like global { global { ... } } which you may want to disallow syntactically), or whether you want to define a slightly (or even significantly) different set of syntax rules that apply inside a scope.
Once you find yourself having to vary the rules, the key is to factor out, into functions, as much as you can reuse between the different variants of the syntax. If you keep heading in this direction – using separate functions that represent the different chunks of syntax you want to deal with and having them call each other (possibly recursively) where needed – you'll ultimately end up with what we call a recursive descent parser. The Wikipedia entry has a good simple example of one; see http://en.wikipedia.org/wiki/Recursive_descent_parser .
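To make that concrete, here is a small recursive-descent sketch over a pre-lexed token stream. The `Scope` AST and the token conventions (plain `std::string` tokens, with `"\n"`, `"{"` and `"}"` as separators) are assumptions of mine to keep the example short, not anything from your code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// A hypothetical AST node: a scope has a header such as
// {"lib", "static", "ogg_static"}, a list of settings lines, and nested scopes.
struct Scope {
    std::vector<std::string> header;
    std::vector<std::vector<std::string>> settings;  // one entry per key/value line
    std::vector<Scope> children;
};

// Recursive descent over a pre-lexed stream of string tokens, where the
// lexer has already reduced the input to words plus "\n", "{" and "}".
struct Parser {
    const std::vector<std::string>& toks;
    size_t pos = 0;

    explicit Parser(const std::vector<std::string>& t) : toks(t) {}

    // The whole file is treated as the body of an implicit root scope.
    Scope parseFile() {
        Scope root;
        parseBody(root);
        return root;
    }

    // Reads statements until '}' or end of input. When a '{' follows a run
    // of words, those words become the header of a nested scope and the
    // function calls itself: that is the recursive step.
    void parseBody(Scope& into) {
        std::vector<std::string> words;
        while (pos < toks.size() && toks[pos] != "}") {
            const std::string& t = toks[pos];
            if (t == "\n") {
                ++pos;
                while (pos < toks.size() && toks[pos] == "\n") ++pos;
                // Tolerate the brace on the next line ("global" then "{"):
                // if the next real token is '{', keep words as the header.
                if (pos < toks.size() && toks[pos] == "{") continue;
                if (!words.empty()) into.settings.push_back(std::move(words));
                words.clear();
            } else if (t == "{") {
                ++pos;
                Scope child;
                child.header = std::move(words);
                words.clear();
                parseBody(child);                              // recurse into the scope
                if (pos < toks.size() && toks[pos] == "}") ++pos;
                into.children.push_back(std::move(child));
            } else {
                words.push_back(t);
                ++pos;
            }
        }
        if (!words.empty()) into.settings.push_back(std::move(words));
    }
};
```

Because `parseBody` calls itself unconditionally on `{`, this sketch happily accepts `global { global { ... } }`; if you want to disallow that, you would split it into separate top-level and in-scope parsing functions, which is exactly the design decision discussed above.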
If you find yourself really wanting to delve deeper into the theory and practice of lexers and parsers, I do recommend you get a good solid compiler textbook to help you out. The Stack Overflow topic mentioned in the comments above will get you started: Learning to write a compiler
boost::spirit is a good recursive descent parser framework that uses C++ templates as an embedded language for describing the parser and lexer. It works well with native C++ compilers, but won't compile under Managed C++.
Codeproject has a tutorial article that may help.