I'm looking at writing a lexer using boost::spirit::lex, but all the examples I can find seem to assume that you've read the entire file into RAM first. I'd like to write a lexer that doesn't require the whole string to be in RAM, is that possible? Or do I need to use something else?
I tried using istream_iterator, but boost gives me a compile error unless I use const char* as the iterator types.
e.g. All the examples I can find basically do this:
lex_functor_type< lex::lexertl::lexer<> > lex_functor;
// assumes entire file is in memory
char const* first = str.c_str();
char const* last = &first[str.size()];
bool r = lex::tokenize(first, last, lex_functor,
boost::bind(lex_callback_functor(), _1, ... ));
Also, is it possible to determine line/column numbers from lex tokens somehow?
Thanks!
Spirit Lex works with any iterator as long as it conforms to the requirements of standard forward iterators. That means you can feed the lexer (invoke lex::tokenize()
) with any conforming iterator. For instance, if you want to use a std::istream
, you could wrap it into a boost::spirit::istream_iterator
:
bool tokenize(std::istream& is, ...)
{
lex_functor_type< lex::lexertl::lexer<> > lex_functor;
boost::spirit::istream_iterator first(is);
boost::spirit::istream_iterator last;
return lex::tokenize(first, last, lex_functor,
boost::bind (lex_callback_functor(), _1, ... ));
}
and it would work.
For the second part of your question (related to the line/column number of the input): yes it is possible to track the input position using the lexer. It's not trivial, though. You need to create your own token type which stores the line/column information and use this instead of the predefined token type. Many people have been asking for this, so I might just go ahead and create an example.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With