Yes. That's right. I want to be able to paste an expression like:
"a && b || c"
directly into source code as a string:
const std::string expression_text("a && b || c");
Create a lazily evaluated structure with it:
Expr expr(magical_function(expression_text));
then later on evaluate substituting in known values:
evaluate(expr, a, b, c);
I'd want to expand this little DSL later so does something a little more complicated using some non-C++ syntax so I can't simply hardcode my expression the simple way. The use case is that I'll be able to copy and paste the same logic from another module used in a different development area for another language rather than have to adapt it each time to follow C++ syntax.
If someone can get me started on at least how to do the above simple concept of 1 expression and 2 boolean operators that would be really appreciated.
Note: I posted this question due to feedback from another question I posted: How to parse DSL input to high performance expression template. Here I actually wanted an answer to a slightly different problem, but the comments provoked this specific question that I thought was worth posting as the potential answers are really worth documenting.
Disclaimer: I know nothing about metaparse, and very little about proto. The following code is my attempt (mostly via trial and error) to modify this example to do something similar to what you want.
The code can be easily divided in several parts:
typedef token < lit_c < 'a' > > arg1_token; typedef token < lit_c < 'b' > > arg2_token; typedef token < lit_c < 'c' > > arg3_token;
token<Parser>
: Parser
to parse the input and then consumes (and discards) all whitespaces afterwards. The result of the parsing is the result of Parser
. lit_c<char>
: char
and the result of the parsing is that same char. In the grammar this result is overridden by the use of always
.typedef token < keyword < _S ( "true" ), bool_<true> > > true_token; typedef token < keyword < _S ( "false" ), bool_<false> > > false_token;
keyword<metaparse_string,result_type=undefined>
: metaparse_string
(_S("true")
returns metaparse::string<'t','r','u','e'>
which is what metaparse uses internally to do its magic) and the result of the parsing is result_type
.typedef token < keyword < _S ( "&&" ) > > and_token; typedef token < keyword < _S ( "||" ) > > or_token; typedef token < lit_c < '!' > > not_token;
In the case of and_token
and or_token
the result is undefined and in the grammar below it is ignored.
struct paren_exp;
First paren_exp
is forward-declared.
typedef one_of< paren_exp, transform<true_token, build_value>, transform<false_token, build_value>, always<arg1_token, arg<0> >, always<arg2_token, arg<1> >, always<arg3_token, arg<2> > > value_exp;
one_of<Parsers...>
: transform<Parser,SemanticAction>
: Parser
. The result type is the result type of Parser
transformed by SemanticAction
.always<Parser,NewResultType>
:
matches Parser
, returns NewResultType
.
The equivalent spirit rule would be:
value_exp = paren_exp [ _val=_1 ] | true_token [ _val=build_value(_1) ] | false_token [ _val=build_value(_1) ] | argN_token [ _val=phx::construct<arg<N>>() ];
typedef one_of< transform<last_of<not_token, value_exp>, build_not>, value_exp > not_exp;
last_of<Parsers...>
:
last_of matches every one of the Parsers
in sequence and its result type is the result type of the last parser.
The equivalent spirit rule would be:
not_exp = (omit[not_token] >> value_exp) [ _val=build_not(_1) ] | value_exp [ _val=_1 ];
typedef foldl_start_with_parser< last_of<and_token, not_exp>, not_exp, build_and > and_exp; // and_exp = not_exp >> *(omit[and_token] >> not_exp); typedef foldl_start_with_parser< last_of<or_token, and_exp>, and_exp, build_or > or_exp; // or_exp = and_exp >> *(omit[or_token] >> and_exp);
foldl_start_with_parser<RepeatingParser,InitialParser,SemanticAction>
:
this parser combinator matches InitialParser
and then RepeatingParser
multiple times until it fails. The result type is the result of mpl::fold<RepeatingParserSequence, InitialParserResult, SemanticAction>
, where RepeatingParserSequence
is a sequence of the result types of every application of RepeatingParser
. If RepeatingParser
never succeeds the result type is simply InitialParserResult
.
I believe (xd) that the equivalent spirit rule would be:
or_exp = and_exp[_a=_1] >> *( omit[or_token] >> and_exp [ _val = build_or(_1,_a), _a = _val ]);
struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; // paren_exp = '(' >> or_exp >> ')';
middle_of<Parsers...>
: Parsers
and the result type is the result of the parser that is in the middle.typedef last_of<repeated<space>, or_exp> expression; //expression = omit[*space] >> or_exp;
repeated<Parser>
: Parser
multiple times. The result is a sequence of the result types of every application of the parser, if the parser fails on its first try the result is an empty sequence. This rule simply removes any leading whitespace.typedef build_parser<entire_input<expression> > function_parser;
This line creates a metafunction that accepts an input string and returns the result of parsing.
Let's look at an example walkthrough of the building of an expression. This is done in two steps: first the grammar constructs a tree that depends on build_or
, build_and
, build_value
, build_not
and arg<N>
. Once you get that type, you can get the proto expression using the proto_type
typedef.
"a || !b"
We start on or_expr
:
or_expr
: We try its InitialParser which is and_expr
. and_expr
: We try its InitialParser which is not_expr
. not_expr
: not_token fails so we try value_expr
. value_expr
: arg1_token succeeds. The return type is arg<0>
and we go back to not_expr
.not_expr
: the return type is not modified at this step. We go back to and_expr
. and_expr
: We try its RepeatingParser, it fails. and_expr succeeds and its return type is the return type of its InitialParser: arg<0>
. We go back to or_expr
. or_expr
: We try its RepeatingParser, or_token matches, we try and_expr
. and_expr
: We try its InitialParser not_expr
.not_expr
: not_token succeeds, we try value_expr
. value_expr
: arg2_token succeeds. The return type is arg<1>
and we go back to not_expr
. not_expr
: the return type is modified by transform using build_not: build_not::apply< arg<1> >. We go back to and_expr
. and_expr
: We try its RepeatingParser, it fails. and_expr succeeds and returns build_not::apply< arg<1> >. We go back to or_expr
. or_expr
: RepeatingParser has succeeded, foldlp uses build_or on build_not::apply< arg<1> >
and arg<0>
, obtaining build_or::apply< build_not::apply< arg<1> >, arg<0> >
. Once we have this tree constructed we get its proto_type
:
build_or::apply< build_not::apply< arg<1> >, arg<0> >::proto_type; proto::logical_or< arg<0>::proto_type, build_not::apply< arg<1> >::proto_type >::type; proto::logical_or< proto::terminal< placeholder<0> >::type, build_not::apply< arg<1> >::proto_type >::type; proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< arg<1>::proto_type >::type >::type; proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< proto::terminal< placeholder<1> >::type >::type >::type;
#include <iostream> #include <vector> #include <boost/metaparse/repeated.hpp> #include <boost/metaparse/sequence.hpp> #include <boost/metaparse/lit_c.hpp> #include <boost/metaparse/last_of.hpp> #include <boost/metaparse/middle_of.hpp> #include <boost/metaparse/space.hpp> #include <boost/metaparse/foldl_start_with_parser.hpp> #include <boost/metaparse/one_of.hpp> #include <boost/metaparse/token.hpp> #include <boost/metaparse/entire_input.hpp> #include <boost/metaparse/string.hpp> #include <boost/metaparse/transform.hpp> #include <boost/metaparse/always.hpp> #include <boost/metaparse/build_parser.hpp> #include <boost/metaparse/keyword.hpp> #include <boost/mpl/apply_wrap.hpp> #include <boost/mpl/front.hpp> #include <boost/mpl/back.hpp> #include <boost/mpl/bool.hpp> #include <boost/proto/proto.hpp> #include <boost/fusion/include/at.hpp> #include <boost/fusion/include/make_vector.hpp> using boost::metaparse::sequence; using boost::metaparse::lit_c; using boost::metaparse::last_of; using boost::metaparse::middle_of; using boost::metaparse::space; using boost::metaparse::repeated; using boost::metaparse::build_parser; using boost::metaparse::foldl_start_with_parser; using boost::metaparse::one_of; using boost::metaparse::token; using boost::metaparse::entire_input; using boost::metaparse::transform; using boost::metaparse::always; using boost::metaparse::keyword; using boost::mpl::apply_wrap1; using boost::mpl::front; using boost::mpl::back; using boost::mpl::bool_; struct build_or { typedef build_or type; template <class C, class State> struct apply { typedef apply type; typedef typename boost::proto::logical_or<typename State::proto_type, typename C::proto_type >::type proto_type; }; }; struct build_and { typedef build_and type; template <class C, class State> struct apply { typedef apply type; typedef typename boost::proto::logical_and<typename State::proto_type, typename C::proto_type >::type proto_type; }; }; template<bool I> struct value //helper struct that will be used during the evaluation in the proto context {}; struct build_value { typedef build_value type; template <class V> struct apply { typedef apply type; typedef typename boost::proto::terminal<value<V::type::value> >::type proto_type; }; }; struct build_not { typedef build_not type; template <class V> struct apply { typedef apply type; typedef typename boost::proto::logical_not<typename V::proto_type >::type proto_type; }; }; template<int I> struct placeholder //helper struct that will be used during the evaluation in the proto context {}; template<int I> struct arg { typedef arg type; typedef typename boost::proto::terminal<placeholder<I> >::type proto_type; }; #ifdef _S #error _S already defined #endif #define _S BOOST_METAPARSE_STRING typedef token < keyword < _S ( "&&" ) > > and_token; typedef token < keyword < _S ( "||" ) > > or_token; typedef token < lit_c < '!' > > not_token; typedef token < keyword < _S ( "true" ), bool_<true> > > true_token; typedef token < keyword < _S ( "false" ), bool_<false> > > false_token; typedef token < lit_c < 'a' > > arg1_token; typedef token < lit_c < 'b' > > arg2_token; typedef token < lit_c < 'c' > > arg3_token; struct paren_exp; typedef one_of< paren_exp, transform<true_token, build_value>, transform<false_token, build_value>, always<arg1_token, arg<0> >, always<arg2_token, arg<1> >, always<arg3_token, arg<2> > > value_exp; //value_exp = paren_exp | true_token | false_token | arg1_token | arg2_token | arg3_token; typedef one_of< transform<last_of<not_token, value_exp>, build_not>, value_exp> not_exp; //not_exp = (omit[not_token] >> value_exp) | value_exp; typedef foldl_start_with_parser < last_of<and_token, not_exp>, not_exp, build_and > and_exp; // and_exp = not_exp >> *(and_token >> not_exp); typedef foldl_start_with_parser < last_of<or_token, and_exp>, and_exp, build_or > or_exp; // or_exp = and_exp >> *(or_token >> and_exp); struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; //paren_exp = lit('(') >> or_exp >> lit('('); typedef last_of<repeated<space>, or_exp> expression; //expression = omit[*space] >> or_exp; typedef build_parser<entire_input<expression> > function_parser; template <typename Args> struct calculator_context : boost::proto::callable_context< calculator_context<Args> const > { calculator_context ( const Args& args ) : args_ ( args ) {} // Values to replace the placeholders const Args& args_; // Define the result type of the calculator. // (This makes the calculator_context "callable".) typedef bool result_type; // Handle the placeholders: template<int I> bool operator() ( boost::proto::tag::terminal, placeholder<I> ) const { return boost::fusion::at_c<I> ( args_ ); } template<bool I> bool operator() ( boost::proto::tag::terminal, value<I> ) const { return I; } }; template <typename Args> calculator_context<Args> make_context ( const Args& args ) { return calculator_context<Args> ( args ); } template <typename Expr, typename ... Args> int evaluate ( const Expr& expr, const Args& ... args ) { return boost::proto::eval ( expr, make_context ( boost::fusion::make_vector ( args... ) ) ); } #ifdef LAMBDA #error LAMBDA already defined #endif #define LAMBDA(exp) apply_wrap1<function_parser, _S(exp)>::type::proto_type{} int main() { using std::cout; using std::endl; cout << evaluate ( LAMBDA ( "true&&false" ) ) << endl; cout << evaluate ( LAMBDA ( "true&&a" ), false ) << endl; cout << evaluate ( LAMBDA ( "true&&a" ), true ) << endl; cout << evaluate ( LAMBDA ( "a&&b" ), true, false ) << endl; cout << evaluate ( LAMBDA ( "a&&(b||c)" ), true, false, true ) << endl; cout << evaluate ( LAMBDA ( "!a&&(false||(b&&!c||false))" ), false, true, false ) << endl; } /*int main(int argc , char** argv) { using std::cout; using std::endl; bool a=false, b=false, c=false; if(argc==4) { a=(argv[1][0]=='1'); b=(argv[2][0]=='1'); c=(argv[3][0]=='1'); } LAMBDA("a && b || c") expr; cout << evaluate(expr, true, true, false) << endl; cout << evaluate(expr, a, b, c) << endl; return 0; }*/
For long, compile-time parsing has meant using template meta programming - which seems line noise to most beginner and even intermediate C++ programmers.
However, with C++11 we got constexpr and in C++14, a lot of restrictions to constexpr were removed. C++17 even makes a few of the standard library things constexpr.
While trying to learn advanced modern C++ - I decided to write a compile time HTML parser - The idea was to create a fast HTML templating engine.
The entire code can be found here: https://github.com/rep-movsd/see-phit
I will briefly explain the things I learned when getting this to work.
Handling dynamic data structures
I needed to parse a const char* and convert it into a multiway tree - but dynamic allocation is a no-no in constexpr land.
The solution? Use an array of nodes, with indices pointing to children and siblings - essentially how you might do it in FORTRAN!
The caveat is that your list of nodes has to be of a fixed size initially. Keeping it very large seemed to make gcc slow down compilation by a huge amount. If you end up going past the end of an array, the compiler will throw an error. I wrote a small std::array like wrapper that is fully constexpr.
Parsing
Pretty much standard code that you write fro runtime parsing will work at compile time! Loops, recursion, conditionals - everything works perfectly.
One issue was - how to represent strings? Using the above mentioned approach - an array of char - is very memory hungry, tedious way of doing things. Fortunately in my case, all I ever needed was merely substrings of the original const char* input. So I wrote a small constexpr string_view like class that just holds pointers to the start and end of the relevant parsed token. Creating new literal strings is just a matter of making these views into a const char* literal.
Error reporting
The basic way to handle errors in constexpr code is to call a function that is not constexpr - the compiler stops and prints the offending line, which can easily contain an error string.
However I wanted more - I wanted the parser to display the row and the column too. I struggled for a while and ended up thinking it was impossible. But I went back to it and tried every possible thing I could think of. Finally I found a way that makes gcc print 2 numbers and an error description message. It essentially involves creating a template with two integer parameters (row and col) whose values come from the constexpr parser.
Performance
I could not clearly find any pattern as to what kind of constexpr code tends to slow the compiler down, but the default performance is not too shabby at all. I am able to parse a 1000 node HTML file in about 1.5 seconds on gcc.
clang is a bit faster.
I intend to write a more detailed description of how the code works in the wiki of the github repo - stay tuned.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With