I can't get the string value of a token

I am trying to implement a lexer for a small programming language with Boost Spirit.

I need to get the value of a token, but I get a bad_get exception:

terminate called after throwing an instance of 'boost::bad_get'
  what():  boost::bad_get: failed value get using boost::get
Aborted

I get this exception when doing:

std::string contents = "void";

base_iterator_type first = contents.begin();
base_iterator_type last = contents.end();

SimpleLexer<lexer_type> lexer;

lexer_type::iterator_type iter = lexer.begin(first, last);
lexer_type::iterator_type end = lexer.end();

std::cout << "Value = " << boost::get<std::string>(iter->value()) << std::endl;

My lexer is defined like that :

namespace lex = boost::spirit::lex;

typedef std::string::iterator base_iterator_type;
typedef boost::spirit::lex::lexertl::token<base_iterator_type, boost::mpl::vector<unsigned int, std::string>> Tok;
typedef lex::lexertl::actor_lexer<Tok> lexer_type;

template<typename L>
class SimpleLexer : public lex::lexer<L> {
    private:

    public:
        SimpleLexer() {
            keyword_for = "for";
            keyword_while = "while";
            keyword_if = "if";
            keyword_else = "else";
            keyword_false = "false";
            keyword_true = "true";
            keyword_from = "from";
            keyword_to = "to";
            keyword_foreach = "foreach";

            word = "[a-zA-Z]+";
            integer = "[0-9]+";
            litteral = "...";

            left_parenth = '('; 
            right_parenth = ')'; 
            left_brace = '{'; 
            right_brace = '}'; 

            stop = ';';
            comma = ',';

            swap = "<>";
            assign = '=';
            addition = '+';
            subtraction = '-';
            multiplication = '*';
            division = '/';
            modulo = '%';

            equals = "==";
            not_equals = "!=";
            greater = '>';
            less = '<';
            greater_equals = ">=";
            less_equals = "<=";

            whitespaces = "[ \\t\\n]+";
            comments = "\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/";

            //Add keywords
            this->self += keyword_for | keyword_while | keyword_true | keyword_false | keyword_if | keyword_else | keyword_from | keyword_to | keyword_foreach;
            this->self += integer | litteral | word;

            this->self += equals | not_equals | greater_equals | less_equals | greater | less ;
            this->self += left_parenth | right_parenth | left_brace | right_brace;
            this->self += comma | stop;
            this->self += assign | swap | addition | subtraction | multiplication | division | modulo;

            //Ignore whitespaces and comments
            this->self += whitespaces [lex::_pass = lex::pass_flags::pass_ignore];
            this->self += comments [lex::_pass = lex::pass_flags::pass_ignore]; 
        }

        lex::token_def<std::string> word, litteral, integer;

        lex::token_def<lex::omit> left_parenth, right_parenth, left_brace, right_brace;

        lex::token_def<lex::omit> stop, comma;

        lex::token_def<lex::omit> assign, swap, addition, subtraction, multiplication, division, modulo;
        lex::token_def<lex::omit> equals, not_equals, greater, less, greater_equals, less_equals;

        //Keywords
        lex::token_def<lex::omit> keyword_if, keyword_else, keyword_for, keyword_while, keyword_from, keyword_to, keyword_foreach;
        lex::token_def<lex::omit> keyword_true, keyword_false;

        //Ignored tokens
        lex::token_def<lex::omit> whitespaces;
        lex::token_def<lex::omit> comments;
};

Is there another way to get the value of a token?

Baptiste Wicht asked Oct 14 '11 08:10



1 Answer

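As long as no attribute conversion has taken place yet, you can use the 'default' token data, which is an iterator_range of the source iterator type:

boost::iterator_range<base_iterator_type> const& range = boost::get<boost::iterator_range<base_iterator_type> >(iter->value());
std::string tokenvalue(range.begin(), range.end());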

After studying the test cases in the Boost repository, I found out a number of things:

  • this is by design
  • there is an easier way
  • the easier way is automatic in Lex semantic actions (e.g. using _1) and when the lexer tokens are consumed by a Qi parser: the assignment automatically converts the token to the Qi attribute type
  • this does (indeed) have the 'lazy, one-time evaluation' semantics mentioned in the docs

The catch is that the token data is a variant, which starts out holding the raw input iterator range. Only after a forced assignment is the converted attribute cached in the variant. You can witness the transition:

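lexer_type::iterator_type iter = lexer.begin(first, last);
lexer_type::iterator_type end = lexer.end();

// Freshly lexed: the variant still holds the raw iterator_range (index 0)
assert(0 == iter->value().which());
std::cout << "Value = " << boost::get<boost::iterator_range<base_iterator_type> >(iter->value()) << std::endl;

// Force the conversion; the std::string is now cached in the variant
std::string s;
boost::spirit::traits::assign_to(*iter, s);
// Index 2, because the variant is <iterator_range, unsigned int, std::string>
assert(2 == iter->value().which());
std::cout << "Value = " << s << std::endl;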

As you can see, the attribute assignment is forced here, directly using the assign_to trait implementation.

Full working demonstration:

#include <boost/spirit/include/lex_lexertl.hpp>

#include <cassert>
#include <iostream>
#include <string>

namespace lex = boost::spirit::lex;

typedef std::string::iterator base_iterator_type;
typedef boost::spirit::lex::lexertl::token<base_iterator_type, boost::mpl::vector<int, std::string>> Tok;
typedef lex::lexertl::actor_lexer<Tok> lexer_type;

template<typename L>
class SimpleLexer : public lex::lexer<L> {
    private:

    public:
        SimpleLexer() {
            word = "[a-zA-Z]+";
            integer = "[0-9]+";
            literal = "...";

            this->self += integer | literal | word;
        }

        lex::token_def<std::string> word, literal;
        lex::token_def<int> integer;
};

int main(int argc, const char* argv[]) {
    SimpleLexer<lexer_type> lexer;

    std::string contents = "void";

    base_iterator_type first = contents.begin();
    base_iterator_type last = contents.end();

    lexer_type::iterator_type iter = lexer.begin(first, last);
    lexer_type::iterator_type end = lexer.end();

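    // Before conversion: the variant holds the raw iterator_range (index 0)
    assert(0 == iter->value().which());
    std::cout << "Value = " << boost::get<boost::iterator_range<base_iterator_type> >(iter->value()) << std::endl;

    // Force the conversion; std::string is index 2 in <iterator_range, int, std::string>
    std::string s;
    boost::spirit::traits::assign_to(*iter, s);
    assert(2 == iter->value().which());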
    std::cout << "Value = " << s << std::endl;

    return 0;
}
sehe answered Sep 26 '22 01:09