
Does GNU/Flex C++ work at all?

Quoting the book flex & bison (O'Reilly, John Levine, 2009):

"Bison can create parsers in C++. Although flex appears to be able to create C++ scanners, the C++ code doesn't work.[21] Fortunately, C scanners created by flex compile under C++ and it is not hard to use a flex C scanner with a bison C++ parser". (Footnote [21]: "This is confirmed by the guy who wrote it. It will probably be fixed eventually, but it turned out to be surprisingly hard to design a good C++ interface for flex scanners.")

Before I commit to the effort of writing a rather complex Flex scanner, I (and I think many of us) would like to know whether anything about this has changed since 2009. Is anyone out there successfully writing Flex C++ scanners? If so, is it worth the effort, or is a C scanner with a C++ parser still the safest course?

asked Apr 16 '16 16:04 by user3513432


1 Answer

It's entirely possible, and it works well once it is set up. Unfortunately, documentation on pure C++ Flex/Bison lexers and parsers is not easy to find or follow.

I can show you the bare bones of a parser I wrote, but it's just one example of how you could do it.

Mind that some of this code was arrived at by trial and error, since documentation is scarce, so there could be superfluous operations or things that are not exactly correct, but it works.

ypp file

%skeleton "lalr1.cc"
%require "3.0.2"

%defines
%define api.namespace {script}
%define parser_class_name {Parser}

%define api.token.constructor
%define api.value.type variant
%define parse.assert true

%code requires {

  namespace script
  {
    class Compiler;
    class Lexer;
  }
}

%lex-param { script::Lexer &lexer }
%lex-param { script::Compiler &compiler }
%parse-param { script::Lexer &lexer }
%parse-param { script::Compiler &compiler }

%locations
%initial-action
{
  @$.begin.filename = @$.end.filename = &compiler.file;
};

%define parse.trace
%define parse.error verbose

%code top {
  #include "Compiler.h"
  #include "MyLexer.h"
  #include "MyParser.hpp"

  static script::Parser::symbol_type yylex(script::Lexer &scanner, script::Compiler &compiler) {
    return scanner.get_next_token();
  }

  using namespace script;
}

// tokens and grammar

void script::Parser::error(const location_type& l, const std::string& m)
{
  compiler.error(l,m);
}

Because the semantic values are variants, you can use C++ types throughout the grammar, for example:

%type<std::list<Statement*>> statement_list for_statement
...
statement_list:
  { $$ = std::list<Statement*>(); }
  | statement_list statement { $1.push_back($2); $$ = $1; }
;
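
One thing worth noting about the action above: with variant semantic values, $1 is an ordinary lvalue, so $$ = $1 copies the whole list on every reduction. Writing $$ = std::move($1) should avoid that. The effect can be sketched in plain C++, outside of Bison. The Statement struct and the append_statement helper below are hypothetical stand-ins, not part of the grammar shown here.

```cpp
#include <cassert>
#include <list>
#include <utility>

namespace sketch {

// Hypothetical stand-in for the AST node type used in the grammar.
struct Statement { int id; };

// Mirrors the action for `statement_list statement`:
// `prev` plays the role of $1, `stmt` of $2, and the return value of $$.
std::list<Statement*> append_statement(std::list<Statement*>& prev, Statement* stmt) {
    assert(stmt != nullptr);
    prev.push_back(stmt);
    // Hand the nodes over to the result instead of copying the whole list.
    return std::move(prev);
}

}  // namespace sketch
```

After the call, the list passed in should be treated as consumed, just as $1 should not be read again after it has been moved into $$.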

l file

%{
  #include "MyParser.hpp"
  #include "MyLexer.h"
  #include "Compiler.h"
  #include <string>

  typedef script::Parser::token token;

  // yyterminate() must return from the scanner, so the macro needs the return keyword.
  #define yyterminate() return script::Parser::make_END(loc);

  static script::location loc;

  using namespace script;
%}

%x sstring
%x scomment

%option nodefault
%option noyywrap
%option c++
%option yyclass="Lexer"
%option prefix="My"


%{
  # define YY_USER_ACTION  loc.columns((int)yyleng);
%}


%%

%{
  loc.step();
%}

Then you'll need a header file that defines your Lexer class. It inherits from yyFlexLexer, which is how C++ Flex works. Something like:

#if ! defined(yyFlexLexerOnce)
#undef yyFlexLexer
#define yyFlexLexer MyFlexLexer  // must match %option prefix="My" in the .l file
#include <FlexLexer.h>
#endif

#undef YY_DECL
#define YY_DECL script::Parser::symbol_type script::Lexer::get_next_token()

#include "MyParser.hpp"

namespace script
{
  class Compiler;

  class Lexer : public yyFlexLexer
  {
  public:

    Lexer(Compiler &compiler, std::istream *in) : yyFlexLexer(in), compiler(compiler) {}

    virtual script::Parser::symbol_type get_next_token();
    virtual ~Lexer() { }

  private:

    Compiler &compiler;
  };

}
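
To see the shape of this interface without running flex or bison, here is a hand-written stand-in. It is only a sketch under stated assumptions: Token replaces script::Parser::symbol_type, the Compiler here just collects error strings, and the scanning rules (identifiers and numbers only) are invented for illustration. What it keeps from the real setup is the structure: a Lexer object that borrows a stream and a Compiler, a get_next_token() member, and a free yylex() that forwards to it.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for script::Parser::symbol_type.
struct Token {
    enum Kind { END, NUMBER, IDENT };
    Kind kind;
    std::string text;
};

// Hypothetical stand-in for script::Compiler: it only collects diagnostics.
struct Compiler {
    std::vector<std::string> errors;
    void error(const std::string& m) { errors.push_back(m); }
};

// Stand-in for the flex-derived Lexer: all scanning state lives in the object.
class Lexer {
public:
    Lexer(Compiler& compiler, std::istream* in) : compiler(compiler), in(in) {
        assert(in != nullptr);
    }
    virtual ~Lexer() {}

    // Plays the role of the function that YY_DECL names in the real header.
    virtual Token get_next_token() {
        const int eof = std::istream::traits_type::eof();
        for (;;) {
            int c = in->get();
            if (c == eof) return Token{Token::END, ""};
            if (std::isspace(c)) continue;
            std::string text(1, static_cast<char>(c));
            if (std::isdigit(c)) {
                while (std::isdigit(in->peek())) text += static_cast<char>(in->get());
                return Token{Token::NUMBER, text};
            }
            if (std::isalpha(c)) {
                while (std::isalnum(in->peek())) text += static_cast<char>(in->get());
                return Token{Token::IDENT, text};
            }
            // Anything else is reported through the compiler and skipped.
            compiler.error("unexpected character: " + text);
        }
    }

private:
    Compiler& compiler;
    std::istream* in;
};

// Same forwarding shape as the static yylex() in the .ypp file.
static Token yylex(Lexer& lexer, Compiler&) { return lexer.get_next_token(); }

// Drives the scanner the way a generated parser would: pull tokens until END.
std::vector<Token> scan_all(const std::string& text, Compiler& compiler) {
    std::istringstream ss(text);
    Lexer lexer(compiler, &ss);
    std::vector<Token> out;
    for (Token t = yylex(lexer, compiler); t.kind != Token::END; t = yylex(lexer, compiler))
        out.push_back(t);
    return out;
}

}  // namespace sketch
```

Calling sketch::scan_all("foo 42", compiler) yields an IDENT followed by a NUMBER. In the real setup the loop in scan_all is what parser.parse() does internally.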

The last step is defining your Compiler class, which gets called from the Bison grammar rules (that's what the %parse-param directives in the ypp file are for). Something like:

#include <memory>
#include <string>

#include "parser/MyParser.hpp"
#include "parser/MyLexer.h"
#include "parser/location.hh"

#include "Symbols.h"

namespace script
{
  class Compiler
  {

  public:
    Compiler();

    std::string file;

    void error(const location& l, const std::string& m);
    void error(const std::string& m);

    vm::Script* compile(const std::string& text);

    bool parseString(const std::string& text);

    void setRoot(ASTRoot* root);
    Node* getRoot() { return root.get(); }

  private:
    // Assumed declaration: the snippet uses root.get() but does not show the member.
    std::unique_ptr<ASTRoot> root;
  };
}

Now you can run the parser entirely from C++ code, e.g.:

// Requires <sstream> for std::istringstream.
bool Compiler::parseString(const std::string &text)
{
  constexpr bool shouldGenerateTrace = false;

  std::istringstream ss(text);

  // Construct the lexer in place; there is no need to copy a temporary.
  script::Lexer lexer(*this, &ss);
  script::Parser parser(lexer, *this);
  parser.set_debug_level(shouldGenerateTrace);
  return parser.parse() == 0;
}

The only thing you must take care of is that flex produces a C++ lexer: either keep %option c++ in the .l file, as above, or invoke flex on the .l file with the -+ option (long form --c++).

With some care I've also been able to have multiple independent, reentrant lexers/parsers in the same project.
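
The answer does not spell out what that care involves. One thing that presumably matters is where scanner state lives: the .l file above keeps its position in a file-level static script::location loc, which every Lexer instance in that translation unit would share. A minimal sketch of the alternative, keeping the position as a member, could look like this. Position and CountingLexer are hypothetical, simplified stand-ins rather than the generated classes.

```cpp
#include <cstddef>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical, simplified stand-in for script::location.
struct Position {
    unsigned line = 1;
    unsigned column = 1;
};

// Hypothetical scanner that tracks its position per instance
// instead of in a file-level static variable.
class CountingLexer {
public:
    explicit CountingLexer(std::string text) : text_(std::move(text)) {}

    // Consumes one character and updates this instance's position only.
    // Returns false once the input is exhausted.
    bool advance() {
        if (offset_ >= text_.size()) return false;
        if (text_[offset_++] == '\n') {
            ++pos_.line;
            pos_.column = 1;
        } else {
            ++pos_.column;
        }
        return true;
    }

    const Position& position() const { return pos_; }

private:
    std::string text_;
    std::size_t offset_ = 0;
    Position pos_;
};

}  // namespace sketch
```

Two CountingLexer objects can be advanced in any interleaving without disturbing each other's line and column, which is the property a shared static location would break.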

answered Oct 22 '22 04:10 by Jack