I'm writing a compiler front end for a project and I'm trying to understand the best way to tokenize the source code. I can't choose between two approaches:
1) the tokenizer reads all tokens up front:
bool Parser::ReadAllTokens()
{
    Token token;
    while( m_Lexer->ReadToken( &token ) )
    {
        m_Tokens->push_back( token );
        token.Reset(); // reset the token values before reading the next one
    }
    return !m_Tokens->empty();
}
and then the parsing phase begins, operating on the m_Tokens list. With this approach the methods getNextToken(), peekNextToken() and ungetToken() are relatively easy to implement on top of an iterator (a sketch of such an implementation follows the example below), and the parsing code stays clear, not interrupted by lexing concerns, e.g.:
getNextToken();
useToken();
getNextToken();
peekNextToken();
if( peeked is something )
    ungetToken();
...
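For reference, here is a minimal sketch of how those helpers could be implemented over the collected token list. The Token type is a stand-in for the one produced by the lexer, and the index-based cursor is an assumption, not part of the original code:

#include <cstddef>
#include <vector>

// Hypothetical Token type; the real one comes from the lexer.
struct Token { /* kind, text, position, ... */ };

class TokenStream
{
public:
    explicit TokenStream( std::vector<Token> tokens )
        : m_Tokens( std::move( tokens ) ), m_Pos( 0 ) {}

    // Return the current token and advance the cursor.
    const Token& getNextToken() { return m_Tokens[ m_Pos++ ]; }

    // Look at the upcoming token without consuming it.
    const Token& peekNextToken() const { return m_Tokens[ m_Pos ]; }

    // Step back one token (undo the last getNextToken()).
    void ungetToken() { if( m_Pos > 0 ) --m_Pos; }

    bool atEnd() const { return m_Pos >= m_Tokens.size(); }

private:
    std::vector<Token> m_Tokens;
    std::size_t        m_Pos;
};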
2) the parsing phase begins, and each token is created and used on demand, only when the parser needs it (but the code seems less clear).
Which method is best, and why? And what about efficiency? Thanks in advance for the answers.
Traditionally, compiler construction classes teach you to read tokens one by one, as you parse. The reason is that, back in the day, memory was scarce: you had kilobytes at your disposal, not gigabytes as you do today.
Having said that, I still don't recommend reading all tokens in advance and then parsing from your list of tokens. Input is of arbitrary size, and if you hog too much memory, the system will become slow. Since it looks like you only need one token of lookahead, I'd read one token at a time from the input stream. The operating system will buffer and cache the input stream for you, so it will be fast enough for most purposes.
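To illustrate, a parser can keep a single lookahead token and pull the next one from the lexer only when the current one is consumed. This is just a sketch: Lexer::ReadToken( Token* ) is taken from the question, everything else is assumed:

// Single-token-lookahead wrapper around the lexer (sketch only).
class TokenCursor
{
public:
    explicit TokenCursor( Lexer* lexer ) : m_Lexer( lexer )
    {
        m_HasToken = m_Lexer->ReadToken( &m_Current ); // prime the lookahead
    }

    // Look at the next token without consuming it.
    const Token& peek() const { return m_Current; }

    // Consume the current token and fetch the next one from the lexer.
    Token next()
    {
        Token consumed = m_Current;
        m_HasToken = m_Lexer->ReadToken( &m_Current );
        return consumed;
    }

    bool atEnd() const { return !m_HasToken; }

private:
    Lexer* m_Lexer;
    Token  m_Current;
    bool   m_HasToken;
};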
It would be better to use something like Boost.Spirit to tokenise. Why reinvent the wheel?
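For example, here is a minimal sketch using Spirit X3 (assuming a reasonably recent Boost). It parses a comma-separated list of integers directly from the input string, with no hand-written tokenizer at all; the grammar and input are purely illustrative:

#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace x3 = boost::spirit::x3;

int main()
{
    std::string input = "1, 2, 3, 42";
    std::vector<int> values;

    auto first = input.begin();
    bool ok = x3::phrase_parse( first, input.end(),
                                x3::int_ % ',',   // integers separated by commas
                                x3::space,        // skip whitespace between tokens
                                values );

    if( ok && first == input.end() )
        std::cout << "parsed " << values.size() << " integers\n";
    else
        std::cout << "parse failed\n";
}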