Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What was Tim Sweeney thinking? (How does this C++ parser work?)

Tags:

Tim Sweeney of Epic MegaGames is the lead developer for Unreal and a programming language geek. Many years ago posted the following screen shot to VoodooExtreme:

Tim Sweeney's screenshot

As a C++ programmer and Sweeney fan, I was captivated by this. It shows generic C++ code that implements some kind of scripting language where that language itself seems to be generic in the sense that it can define its own grammar.

Mr. Sweeney never explained himself. :-)

It's rare to see this level of template programming, but you do see it from time to time when people want to push the compiler to generate great code or because they want to create generic code (for example, Modern C++ Design).

Tim seems to be using it to create a grammar in Parser.cpp - you can see what look like prioritized binary operators. If that is the case, then why does Test.ae look like it's also defining a grammar?

Obviously this is a puzzle that needs to be solved. Victory goes to the answer with a working version of this code, or the most plausible explanation, or to Tim Sweeney himself if he posts an answer. :-)

like image 538
Frank Krueger Avatar asked May 07 '10 21:05

Frank Krueger


Video Answer


2 Answers

I asked Mr. Sweeney via email and received this answer:

The C++ code is using template classes I wrote that implement parser combinators. The idea is to start with some basic parsers, such as literals where PLit<'A'> parses a literal character 'A', PAny<> parses any character but fails if at the end of fail, PEof fails unless we're not at the end of the file, etc. And then we combine them into arbitrary trees using combinators like PAnd which parses a then b, and succeeds only if both succeed -- else it fails with the parse point unmoved. And if it succeeds, the result is a struct containing two fields, one for the result of a, and one for the result of b. And so on.

The implementation is messy in C++ for a number of reasons, including that templates don't support arbitrary variadic parameters, and without first-class lambdas we can't easily process the results inline with the parser.

Here are some snippets of the template code, from which you can probably figure out the details of the framework.

// Parses a literal item. UBOOL LiteralEvaluate (UClass* C,class FParseInBase& In,class FParseOutBase& Out) {  FParseInMark M(In);  NNode* e = In.GetNextSource();  if (e && e->IsA(C))  {   Out.Callback(e);   return 1;  }  M.Restore(In);  return 0; } // Optional item of the specified type. template <class U> struct Optional {  static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)  {   U::Evaluate(In,Out);   return 1;  } }; // Ignore items by absorbing them; retains boolean logic but not callback. template <class T> struct Ignore {  static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)  {   return T::Evaluate(In,GIgnore);  } }; // Zero or more items of the specified type. template <class T> struct ZeroOrMore {  static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)  {   while (T::Evaluate(In,Out));   return 1;  } }; // One or more items of the specified type. template <class T> struct OneOrMore {  static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)  {   for( INT i=0; T::Evaluate(In,Out); i++ );   return i>0;  } }; // Always fails. struct RFalse {  static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)  {   return 0;  } }; // Always succeeds. struct RTrue {  static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)  {   return 1;  } }; // Parses the first matching items of the specified subtypes of T. template <class A,class B=RFalse,class C=RFalse,class D=RFalse > struct Or {  static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)  {   return A::Evaluate(In,Out) || B::Evaluate(In,Out) || C::Evaluate(In,Out) || D::Evaluate(In,Out);  } }; // Parses all the specified items. template <class A,class B=RTrue,class C=RTrue,class D=RTrue> struct And {  static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)  {   FParseInMark Mark(In);   Conjunction<NNode> Q;   if( A::Evaluate(In,Q) && B::Evaluate(In,Q) && C::Evaluate(In,Q) && D::Evaluate(In,Q) )   {    Q.Forward(Out);    return 1;   }   Mark.Restore(In);   return 0;  } }; // A separated list. template <class A,class B> class SeparatedList : public Or<And<A,B,SeparatedList>,A> {}; // Integer comparison. template <INT A,INT B> struct IsAtLeast {  static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)  {   return A>=B;  } }; 

That Test.ae was an experimental scripting language I was implementing in 1999-2001 -- that color scheme was trendy back then, I swear. :-)

The code shown defines metadata for the language constructs. The language went a long way down the Smalltalk "everything is an object" path, dealing with first-class metaclasses and related issues, but I ultimately abandoned it when I became familiar with advanced type systems work in Haskell, Cayenne, Coq, and other languages.

Nowadays -

I'm not a fan of implementing parsers or compilers in C++, since the code tends to be bloated by 70-80% compared to a similar implementation in a modern functional language like Haskell. Read up on Haskell parser combinators for more detail -- the resulting simplicity and directness is exemplary and it's done in a rigorous, type-safe way.

like image 88
Frank Krueger Avatar answered Nov 02 '22 03:11

Frank Krueger


Can't tell for sure, but the C++ code kinda sorta looks like Spirit, a C++ parser generator that makes extensive use of templates. Test.ae looks like it's metaprogramming (defining language details in the language itself), which is harder to do in C++ (templates are a start, but is error prone and ugly) than it would be in some other target language (e.g., UnrealScript, which is what I assume test.ae is written in).

So - it looks like Parser.cpp defines the base grammar for UnrealScript (using Spirit), and Test.ae is defining extensions to UnrealScript.

like image 32
Eric Brown Avatar answered Nov 02 '22 03:11

Eric Brown