
Tokenizing an algebraic expression in string format

Tags:

java

I'm trying to take a string that represents a full algebraic expression, such as x = 15 * 6 / 3, and tokenize it into its individual components. So the first token would be x, then =, then 15, then *, 6, / and finally 3.

The problem I am having is actually parsing through the string and looking at the individual characters. I can't think of a way to do this without a massive number of if statements. Surely there has to be a better way than specifically defining each individual case and testing for it.

Slicktopher asked Feb 17 '23 05:02



2 Answers

For each type of token, you'll want to figure out how to identify:

  • when you're starting to read a particular token
  • if you're continuing to read the same token, or if you've started a different one

Let's take your example: x=15*6/3. If you could rely on there being spaces between tokens, it would be trivial: a new token starts whenever you reach a space. Let's assume that you cannot rely on that.

You can break down the character types into letters, digits, and symbols. Let's call the token types Variable, Operator, and Number.

A letter indicates a Variable token has started. It continues until you read a non-letter.

A symbol indicates the start of an Operator token. Your example only uses single-character operators, but groups of symbols (such as <= or ==) can also correspond to their own Operator tokens.

A digit indicates the start of a Number token. (Let's assume integers for now.) The Number token continues until you read a non-digit.

Basically, that's how a simple tokenizer (the first stage of a symbolic parser) works. Now, if you add in negative numbers (where the '-' symbol can have multiple meanings), or parentheses, or function names (like sin(x)), then things get more complicated, but it amounts to the same set of rules, just with more choices.
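As a rough illustration of the rules above, here is a minimal hand-written scanner. It is only a sketch under the same assumptions (integers only, variables made of letters, single-character operators); the class name SimpleTokenizer and the method tokenize are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleTokenizer {
    // Splits an expression into Variable, Number, and Operator tokens.
    // Assumes integers only and single-character operators.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens, it is never a token itself
            } else if (Character.isLetter(c)) {
                // Variable: keep reading until the first non-letter
                int start = i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else if (Character.isDigit(c)) {
                // Number: keep reading until the first non-digit
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                // Operator: any other single character
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x=15*6/3")); // prints [x, =, 15, *, 6, /, 3]
    }
}
```

Note that the branching is per character class, not per character, so the number of if statements stays small no matter how long the expression is.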

John answered Feb 18 '23 19:02



  1. Create a regular expression for each possible element: integer, variable, operator, parentheses.
  2. Combine them using the | regular expression operator into one big regular expression, with capture groups to identify which one matched.
  3. In a loop, match the head of the remaining string and break off the matched part as a token. The type of the token depends on which sub-expression matched, as described in step 2.
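The three steps above could be sketched with java.util.regex roughly as follows. This is an illustration, not a complete lexer: the class RegexTokenizer, the nested Token type, the group names, and the exact operator set are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizer {
    // Hypothetical token holder: the type is the name of the group that matched.
    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + ":" + text; }
    }

    private static final String[] TYPES = {"number", "variable", "operator", "paren"};

    // Steps 1 and 2: one sub-expression per token type, joined with |.
    // The leading \s* skips any whitespace in front of a token.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<number>\\d+)|(?<variable>[A-Za-z]+)|(?<operator>[-+*/=^])|(?<paren>[()]))");

    public static List<Token> tokenize(String input) {
        String s = input.trim();
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        int pos = 0;
        // Step 3: repeatedly match at the head of the remaining input.
        while (pos < s.length()) {
            m.region(pos, s.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("No token matches near index " + pos);
            }
            for (String type : TYPES) {
                if (m.group(type) != null) { // this sub-expression is the one that matched
                    tokens.add(new Token(type, m.group(type)));
                    break;
                }
            }
            pos = m.end(); // break off the matched part
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x = 15 * 6 / 3"));
        // prints [variable:x, operator:=, number:15, operator:*, number:6, operator:/, number:3]
    }
}
```

Adding a new token type then means adding one named group to the pattern and one entry to TYPES, without writing another branch in the loop.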

Alternatively, use a lexer library, such as the one in ANTLR or JavaCC.

necromancer answered Feb 18 '23 19:02
