I'm trying to design a compiler, and am at the lexical analysis stage. Say I read a simple "Hello World!" program from a file and extract tokens from it. What is the best way to store these tokens? In a single data structure, or in two or more data structures depending on the type of token?
Actually, you don't want to store all the tokens, period.
What you want to do is produce the tokens one at a time and hand each one to the parser. Once the parser has inspected a token, the token isn't needed anymore; the parser may copy data out of it to build a node in an AST. You can arguably get by with a single global token, although that isn't considered good practice, and it won't work at all if your language has a preprocessor that operates over token streams.
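To make that concrete, here's a minimal sketch of a one-token-at-a-time lexer in Python. The token kinds and regular expressions are illustrative placeholders, not from any particular language:

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str   # e.g. "IDENT", "STRING" -- names are made up for this sketch
    text: str   # the matched lexeme
    pos: int    # offset into the source, handy for error messages

# Illustrative token spec for a tiny language; yours will differ.
TOKEN_SPEC = [
    ("STRING", r'"[^"]*"'),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))

def tokens(source: str) -> Iterator[Token]:
    """Yield tokens one at a time; nothing is retained after the parser consumes one."""
    for m in MASTER_RE.finditer(source):
        if m.lastgroup != "SKIP":
            yield Token(m.lastgroup, m.group(), m.start())

# The parser would pull tokens on demand; printing them stands in for that here:
for tok in tokens('print("Hello World!");'):
    print(tok)
```

Because `tokens` is a generator, only one token exists at a time: the parser pulls the next one when it's ready, and nothing accumulates.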
Perhaps the question you intended to ask is: how are the AST nodes stored long term? The answer is that they can be dynamically allocated from the heap and tied together by parent/child links in the AST. That's enough to track them all reliably.
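Continuing the sketch above, an AST node can be as simple as this (the `kind` names are again invented for illustration; in Python, "allocated from the heap" just means creating objects):

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                        # e.g. "Call", "StringLit" -- illustrative names
    value: str | None = None         # data copied out of a token, if any
    children: list[Node] = field(default_factory=list)
    parent: Node | None = None

    def add(self, child: Node) -> Node:
        """Attach a child and wire up the parent link."""
        child.parent = self
        self.children.append(child)
        return child

# Building the tree for print("Hello World!"):
root = Node("Call", "print")
root.add(Node("StringLit", '"Hello World!"'))
```

The parent/child links alone keep every node reachable from the root, which is all the bookkeeping most compilers need.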
You might consider indexing the AST nodes by type. For most compilation tasks this is unnecessary, but for some tools it is useful, because it lets the tool find all nodes of a given type in a very large tree quickly. YMMV.
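If you do want such an index, one cheap approach (a sketch building on the `Node` class above, not the only way to do it) is a single walk that buckets nodes by kind, after which lookups are constant time:

```python
from collections import defaultdict

def index_by_kind(root: Node) -> dict[str, list[Node]]:
    """One pass over the tree; afterwards, lookups by node type need no re-walking."""
    index: dict[str, list[Node]] = defaultdict(list)
    stack = [root]
    while stack:
        node = stack.pop()
        index[node.kind].append(node)
        stack.extend(node.children)
    return index

# e.g. find every string literal in a large tree in one dictionary lookup:
literals = index_by_kind(root)["StringLit"]
```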