I'm trying to take a string that represents a full algebraic expression, such as x = 15 * 6 / 3, and tokenize it into its individual components. So the first would be x, then =, then 15, then *, 6, / and finally 3.
The problem I am having is actually parsing through the string and looking at the individual characters. I can't think of a way to do this without a massive number of if statements. Surely there has to be a better way than specifically defining each individual case and testing for it.
For each type of token, you'll want to figure out how to identify where the token starts and where it ends.
Let's take your example: x=15*6/3. Let's assume that you cannot rely on there being spaces between the tokens. If you could, it would be trivial: a new token starts whenever you reach a space.
You can break down the character types into letters, digits, and symbols. Let's call the token types Variable, Operator, and Number.
A letter indicates a Variable token has started. It continues until you read a non-letter.
A symbol indicates the start of an Operator token. I only see single symbols, but you can have groups of symbols correspond to different Operator tokens.
A digit indicates the start of a Number token. (Let's assume integers for now.) The Number token continues until you read a non-digit.
Basically, that's how a simple symbolic parser works. Now, if you add in negative numbers (where the '-' symbol can have multiple meanings), or parentheses, or function names (like sin(x)), then things get more complicated, but it amounts to the same set of rules, now just with more choices.
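The rules above could be sketched roughly like this in Java. This is only a minimal illustration, not a complete lexer: the names SimpleTokenizer, Token, and Type are made up for the example, whitespace is skipped, and every character that is not a letter, digit, or whitespace is assumed to be a one-character Operator.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleTokenizer {
    // Token types from the description above (hypothetical names).
    enum Type { VARIABLE, OPERATOR, NUMBER }

    // A token is just its type plus the text it covers.
    static final class Token {
        final Type type;
        final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + ":" + text;
        }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (Character.isLetter(c)) {
                // A letter starts a Variable; it runs until a non-letter.
                int start = i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Type.VARIABLE, input.substring(start, i)));
            } else if (Character.isDigit(c)) {
                // A digit starts a Number (integers only); it runs until a non-digit.
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Type.NUMBER, input.substring(start, i)));
            } else {
                // Assumption: anything else is a single-character Operator.
                tokens.add(new Token(Type.OPERATOR, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("x=15*6/3")) {
            System.out.println(t);
        }
    }
}
```

Note that the branching is on the class of the character (letter, digit, other) rather than on each individual character, which is what keeps the number of if statements small.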
Alternatively, you could combine a regular expression for each token type with the | regular expression operator into one big regular expression with capture groups to identify which one matched, or use a lexer library, such as the one in antlr or javacc.
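As a rough sketch of the regular-expression approach, something like the following might work with java.util.regex. The class name RegexTokenizer, the group names, and the operator set (= + - * /) are assumptions chosen for the example, and integers are the only numbers handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizer {
    // One alternative per token type, joined with |; each has a named
    // capture group so we can tell which alternative matched.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<variable>[A-Za-z]+)|(?<number>\\d+)|(?<operator>[=+\\-*/])");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() skips anything the pattern does not match, so whitespace
        // (and, in this sketch, unknown characters) is silently ignored.
        while (m.find()) {
            if (m.group("variable") != null) {
                tokens.add("Variable:" + m.group());
            } else if (m.group("number") != null) {
                tokens.add("Number:" + m.group());
            } else {
                tokens.add("Operator:" + m.group());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x = 15 * 6 / 3"));
    }
}
```

A real tokenizer would probably want to report unrecognized characters instead of skipping them; that is left out here to keep the example short.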